Automatically assigned DDC number:
Manually assigned DDC number: 00435
Title: Replication For Efficiency And Fault Tolerance In A Dsm System
Subject: Anne-Marie Kermarrec Replication For Efficiency And Fault Tolerance In A Dsm System
Description: Distributed Shared Memory (DSM) systems implemented on a network of workstations (NOW) have become a convenient alternative to shared memory architectures for executing long-running parallel applications. However, such architectures are susceptible to failures. This paper presents the design and implementation of a recoverable DSM (RDSM) based on a backward error recovery (BER) mechanism. Our RDSM's design focuses on exploiting replication of data for both fault tolerance and efficiency. This RDSM has been implemented on a NOW, and the performance evaluation shows the benefits of exploiting both types of replication to design an efficient, scalable and low-cost recoverable DSM. Key Words: Distributed Shared Memory, Replication, Fault Tolerance, Network of Workstations. 1 INTRODUCTION Networks of workstations (NOW) are an attractive and much cheaper alternative to shared memory parallel architectures for executing long-running parallel applications. A DSM implemented o...
Contributor: The Pennsylvania State University CiteSeer Archives
<?xml version="1.0" encoding="UTF-8"?>
<rec ID="SELF" Type="SELF" CiteSeer_Book="SELF" CiteSeer_Volume="SELF" Title="Replication For Efficiency And Fault Tolerance In A Dsm System" />