Automatically assigned DDC number:

Manually assigned DDC number: 00435

Title: Replication For Efficiency And Fault Tolerance In A DSM System


Subject: Anne-Marie Kermarrec Replication For Efficiency And Fault Tolerance In A DSM System

Description: Distributed Shared Memory (DSM) systems implemented on a network of workstations (NOW) have become a convenient alternative to shared memory architectures for executing long-running parallel applications. However, such architectures are prone to failures. This paper presents the design and implementation of a recoverable DSM (RDSM) based on a backward error recovery (BER) mechanism. The RDSM's design focuses on exploiting replication of data for both fault tolerance and efficiency. This RDSM has been implemented on a NOW, and a performance evaluation shows the benefits of exploiting both types of replication to design an efficient, scalable and low-cost recoverable DSM. Key Words: Distributed Shared Memory, Replication, Fault Tolerance, Network of Workstations. 1 INTRODUCTION Networks of workstations (NOW) are an attractive and much cheaper alternative [1] to shared memory parallel architectures for executing long-running parallel applications. A DSM [2] implemented o...

Contributor: The Pennsylvania State University CiteSeer Archives

Publisher: unknown

Date: 1998-04-03

Pubyear: unknown

Format: ps



Language: en

Rights: unrestricted


<?xml version="1.0" encoding="UTF-8"?>


<rec ID="SELF" Type="SELF" CiteSeer_Book="SELF" CiteSeer_Volume="SELF" Title="Replication For Efficiency And Fault Tolerance In A Dsm System" />