Searched for keywords.author.keyword:"software fault tolerance" OR acmdlCCS:"software fault tolerance"
Searched The ACM Full-Text Collection: 476,316 records
572 results found

Result 1 – 20 of 572

1 published by ACM
Throughput optimal total order broadcast for cluster environments
Rachid Guerraoui, Ron R. Levy, Bastian Pochon, Vivien Quéma
July 2010 ACM Transactions on Computer Systems (TOCS): Volume 28 Issue 2, July 2010
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 9,   Downloads (12 Months): 25,   Downloads (Overall): 586

Full text available: PDF
Total order broadcast is a fundamental communication primitive that plays a central role in bringing cheap software-based high availability to a wide range of services. This article studies the practical performance of such a primitive on a cluster of homogeneous machines. We present LCR, the first throughput optimal uniform total ...
Keywords: total order broadcast, replication, cluster computing, software fault-tolerance

2 published by ACM
Process backup in producer-consumer systems
David L. Russell
November 1977 ACM SIGOPS Operating Systems Review: Volume 11 Issue 5, November 1977
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 0,   Downloads (12 Months): 9,   Downloads (Overall): 295

Full text available: PDF
System state restoration after detection of an error is discussed for producer-consumer systems, with emphasis on the control of the domino effect. Recovery primitives MARK, RESTORE, and PURGE are proposed that, in conjunction with the use of SEND-RECEIVE interprocess communication primitives, allow bounds to be placed on the amount of ...
Keywords: Domino effect, Message facilities, Interprocess communication, Software fault tolerance, State restoration, Asynchronous programming, Error recovery
Also published in:
November 1977  SOSP '77: Proceedings of the sixth ACM symposium on Operating systems principles

3 published by ACM
Assessing Dependability with Software Fault Injection: A Survey
Roberto Natella, Domenico Cotroneo, Henrique S. Madeira
February 2016 ACM Computing Surveys (CSUR): Volume 48 Issue 3, February 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 21,   Downloads (12 Months): 305,   Downloads (Overall): 629

Full text available: PDF
With the rise of software complexity, software-related accidents represent a significant threat for computer-based systems. Software Fault Injection is a method to anticipate worst-case scenarios caused by faulty software through the deliberate injection of software faults. This survey provides a comprehensive overview of the state of the art on Software ...
Keywords: software fault tolerance, dependability assessment, Software faults

4 published by ACM
On exceptions as first-class objects in Ada 95
Thomas Wolf
September 2001 ACM SIGAda Ada Letters - Exception handling for a 21st century programming language proceedings: Volume XXI Issue 3, September 2001
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 4,   Downloads (Overall): 96

Full text available: PDF
This short position paper argues that it might be beneficial to try to bring the exception model of Ada 95 more in-line with the object-oriented model of programming. In particular, it is felt that exceptions --- being such an important concept for the development of fault-tolerant software --- have deserved ...
Keywords: software fault tolerance, object-oriented exception handling

5 published by ACM
An instruction-level fine-grained recovery approach for soft errors
Jianjun Xu, Qingping Tan, Lanfang Tan, Huiping Zhou
March 2013 SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied Computing
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1,   Downloads (12 Months): 21,   Downloads (Overall): 116

Full text available: PDF
With the continuous progress of integrated circuits, the threat that soft errors pose to the dependability of computing has become a growing design concern. For mitigating the effects of soft errors, software-based fault tolerance techniques are attractive because of their low cost and flexibility. But current research mostly focuses on error detection, and ...
Keywords: error recovery, software fault-tolerance, soft error

6 published by ACM
Extending Ada to support multi-core based monitoring and fault tolerance
You Li, Lu Yang, Lei Bu, Linzhang Wang, Jianhua Zhao, Xuandong Li
October 2010 SIGAda '10: Proceedings of the ACM SIGAda annual international conference on SIGAda
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 9,   Downloads (Overall): 124

Full text available: PDF
Monitoring-Oriented Programming (MOP) and Software Fault Tolerance (SFT) are two important approaches to guaranteeing the reliability of software systems, especially those that run online over the long term. However, introducing a monitoring or fault tolerance module brings high overhead. With the prevalence of multi-core platforms, we can find the ...
Keywords: monitoring-oriented programming, multi-core, software fault tolerance
Also published in:
November 2010  ACM SIGAda Ada Letters - SIGAda 2010: Volume 30 Issue 3, December 2010

7 published by ACM
About conversations for concurrent OO languages
September 1994 ACM SIGPLAN Notices: Volume 29 Issue 9, Sept. 1994
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 63

Full text available: PDF
Keywords: concurrent object-oriented languages, software diversity, software fault-tolerance, backward error recovery

8 published by ACM
Modeling software design diversity: a review
Bev Littlewood, Peter Popov, Lorenzo Strigini
June 2001 ACM Computing Surveys (CSUR): Volume 33 Issue 2, June 2001
Publisher: ACM
Bibliometrics:
Citation Count: 20
Downloads (6 Weeks): 5,   Downloads (12 Months): 33,   Downloads (Overall): 3,299

Full text available: PDF
Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. While there is clear evidence that the approach can be expected to deliver some increase in reliability compared to a single version, there is no agreement about the ...
Keywords: multiple version programming, control systems, software fault tolerance, functional diversity, N-version software, protection systems, safety
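Design diversity of the N-version kind is usually paired with a voter that compares the outputs of the independently built versions. The Python sketch below is an illustrative assumption, not a model from the review: `n_version_vote` and the three toy square-root "versions" are hypothetical, voting is exact-match on hashable results, and a strict majority of all versions is required.

```python
import math
from collections import Counter
from typing import Any, Callable, Hashable, Optional, Sequence


def n_version_vote(versions: Sequence[Callable[[Any], Hashable]],
                   x: Any) -> Optional[Hashable]:
    """Run every version on the same input; return the strict-majority output, or None."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a crashed version casts no vote
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require a strict majority of all versions, not just of those that answered.
    return value if count > len(versions) // 2 else None


# Hypothetical versions of an integer square root; one is deliberately faulty.
def isqrt_a(n):
    return math.isqrt(n)


def isqrt_b(n):
    return int(n ** 0.5)


def isqrt_c(n):
    return n // 2  # faulty version


print(n_version_vote([isqrt_a, isqrt_b, isqrt_c], 49))  # 7
```

The voter masks a single faulty version, but it cannot help when a majority of versions fail on the same input in the same way; that correlated-failure case is one reason the reliability gain of design diversity is difficult to model.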

9
Recovery blocks in action: A system supporting high reliability
T. Anderson, R. Kerr
October 1976 ICSE '76: Proceedings of the 2nd international conference on Software engineering
Publisher: IEEE Computer Society Press
Bibliometrics:
Citation Count: 21
Downloads (6 Weeks): 2,   Downloads (12 Months): 18,   Downloads (Overall): 628

Full text available: PDF
The need for reliable complex systems motivates the development of techniques by which acceptable service can be maintained, even in the presence of residual errors. Recovery blocks allow a software designer to include tests on the acceptability of the various phases of a system's operation, and to specify alternative actions ...
Keywords: Software fault-tolerance, Error recovery, Reliability, Error detection, Recovery block, Recovery cache
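The recovery-block structure that the abstract outlines (an acceptance test on the result of a phase, with alternative actions tried after the state is restored) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Anderson and Kerr system: `recovery_block`, `RecoveryBlockFailure`, and the sorting example are hypothetical, and a deep copy of a state dictionary stands in for the recovery cache named in the keywords.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails its acceptance test."""


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], Any]],
                   acceptance_test: Callable[[dict, Any], bool]) -> Any:
    # Checkpoint on entry; a deep copy stands in for a recovery cache.
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass  # an exception is treated like a failed acceptance test
        # Roll back any side effects before trying the next alternate.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("no alternate passed the acceptance test")


# Hypothetical example: a faulty primary sort backed by a simpler alternate.
def primary(state):
    state["log"].append("primary")
    return state["data"][::-1]  # deliberately wrong: reverses instead of sorting


def alternate(state):
    state["log"].append("alternate")
    return sorted(state["data"])


def is_sorted(state, result):
    # Acceptance test: same length as the input and non-decreasing.
    return len(result) == len(state["data"]) and all(
        a <= b for a, b in zip(result, result[1:]))


s = {"data": [3, 1, 2], "log": []}
print(recovery_block(s, [primary, alternate], is_sorted))  # [1, 2, 3]
print(s["log"])  # ['alternate']: the primary's side effect was rolled back
```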

10
The application of compile-time reflection to software fault tolerance using Ada 95
P. Rogers, A. J. Wellings
June 2005 Ada-Europe'05: Proceedings of the 10th Ada-Europe international conference on Reliable Software Technologies
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. In contrast, explicit support - especially at the language level - allows application-specific tailoring. However, current techniques that extend languages to support software fault tolerance lead to interwoven code addressing functional and non-functional ...
Keywords: atomic actions, recovery blocks, software fault tolerance, backward error recovery, reflection, Ada, conversations

11 published by ACM
Crystal-growth-inspired algorithms for computational grids
June 2009 BADS '09: Proceedings of the 2009 workshop on Bio-inspired algorithms for distributed systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 6,   Downloads (Overall): 112

Full text available: PDF
Biological systems surpass man-made systems in many important ways. Most notably, systems found in nature are typically self-adaptive and self-managing, capable of surviving drastic changes in their environments, such as internal failures and malicious attacks on their components. Large distributed software systems have requirements common to those of some biological ...
Keywords: computational grid, nature-inspired software, fault tolerance, privacy, self-assembly, software architectural style

12 published by ACM
A hierarchical structure for fault tolerant reactive programs
Andrea Clematis, Vittoria Gianuzzi
March 1993 SAC '93: Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: states of the art and practice
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 3,   Downloads (Overall): 189

Full text available: PDF
Keywords: transaction based systems, software fault tolerance, backward error recovery, concurrent programming

13 published by ACM
A survey of linguistic structures for application-level fault tolerance
Vincenzo De Florio, Chris Blondia
May 2008 ACM Computing Surveys (CSUR): Volume 40 Issue 2, April 2008
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 1,   Downloads (12 Months): 25,   Downloads (Overall): 2,369

Full text available: PDF
Structures for the expression of fault-tolerance provisions in application software comprise the central topic of this article. Structuring techniques answer questions as to how to incorporate fault tolerance in the application layer of a computer program and how to manage the fault-tolerant code. As such, they provide the means to ...
Keywords: Language support for software-implemented fault tolerance, reconfiguration and error recovery, separation of design concerns, software fault tolerance

14
Global Virtual Time and distributed synchronization
Jeffrey S. Steinman, Craig A. Lee, Linda F. Wilson, David M. Nicol
July 1995 PADS '95: Proceedings of the ninth workshop on Parallel and distributed simulation
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 0,   Downloads (12 Months): 24,   Downloads (Overall): 621

Full text available: PDF
Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic simulations. It is defined as the earliest time tag within the set of unprocessed pending events in distributed simulation. A number of techniques for determining GVT have been proposed in recent years, each having their own intrinsic properties. However, ...
Keywords: flow control, optimistic simulations, SPEEDES framework, efficiency, message passing, parallel programming, real-time systems, scalability, unprocessed pending events, GVT computation, SPEEDES GVT, Synchronous Parallel Environment for Emulation and Discrete-Event Simulation framework, digital simulation, distributed simulation, distributed synchronization, event processing, fundamental synchronization concept, global reduction operations, portability, synchronisation, global virtual time, interactive support, real time use, software fault tolerance
Also published in:
July 1995  ACM SIGSIM Simulation Digest: Volume 25 Issue 1, July 1995
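The definition of GVT given in the abstract (the earliest time tag among unprocessed pending events) translates directly into a minimum over pending time tags. The Python sketch below assumes a consistent global snapshot is already available (obtaining one cheaply is the actual problem that distributed GVT algorithms address) and counts messages still in transit as pending events; `LogicalProcess` and `compute_gvt` are hypothetical names, not part of SPEEDES.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class LogicalProcess:
    name: str
    # Time tags of events scheduled for this process but not yet processed.
    pending: List[float] = field(default_factory=list)


def compute_gvt(lps: Iterable[LogicalProcess],
                in_transit: Iterable[float] = ()) -> float:
    """GVT as the earliest time tag among all unprocessed pending events.

    Assumes a consistent global snapshot; messages sent but not yet
    received are counted as pending events.
    """
    # Each process contributes its local minimum (inf if nothing is pending) ...
    local_minima = [min(lp.pending, default=math.inf) for lp in lps]
    # ... and a global min-reduction combines them with in-transit messages.
    return min([*local_minima, *in_transit], default=math.inf)


lps = [LogicalProcess("a", [12.0, 15.5]),
       LogicalProcess("b", []),
       LogicalProcess("c", [9.25])]
print(compute_gvt(lps))                    # 9.25
print(compute_gvt(lps, in_transit=[4.0]))  # 4.0
```

Since no rollback can reach a simulation time earlier than GVT, saved state and events older than GVT can be reclaimed, which is a main reason optimistic simulators recompute it periodically.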

15 published by ACM
Recovery domains: an organizing principle for recoverable operating systems
March 2009 ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Publisher: ACM
Bibliometrics:
Citation Count: 16
Downloads (6 Weeks): 3,   Downloads (12 Months): 21,   Downloads (Overall): 715

Full text available: PDF
We describe a strategy for enabling existing commodity operating systems to recover from unexpected run-time errors in nearly any part of the kernel, including core kernel components. Our approach is dynamic and request-oriented; it isolates the effects of a fault to the requests that caused the fault rather than to ...
Keywords: automatic fault recovery, akeso, recovery domains
Also published in:
February 2009  ACM SIGPLAN Notices - ASPLOS 2009: Volume 44 Issue 3, March 2009
March 2009  ACM SIGARCH Computer Architecture News - ASPLOS 2009: Volume 37 Issue 1, March 2009

16 published by ACM
Affinity-aware checkpoint restart
Ajay Saini, Arash Rezaei, Frank Mueller, Paul Hargrove, Eric Roman
December 2014 Middleware '14: Proceedings of the 15th International Middleware Conference
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6,   Downloads (12 Months): 38,   Downloads (Overall): 113

Full text available: PDF
Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may ...
Keywords: multi-core, fault tolerance, NUMA, efficiency, system software, checkpoint and restart

17 published by ACM
July 2004 PODC '04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 3,   Downloads (12 Months): 7,   Downloads (Overall): 142

Full text available: PDF
Keywords: byzantine fault tolerance, dynamic membership

18 published by ACM
The effects of metadata corruption on NFS
October 2007 StorageSS '07: Proceedings of the 2007 ACM workshop on Storage security and survivability
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 4,   Downloads (12 Months): 6,   Downloads (Overall): 262

Full text available: PDF
Distributed file systems need to be robust in the face of failures. In this work, we study the failure handling and recovery mechanisms of a widely used distributed file system, Linux NFS. We study the behavior of NFS under corruption of important metadata through fault injection. We find that the ...
Keywords: NFS, inconsistency, silent failure, fault tolerance, metadata corruption, reliability, retry

19 published by ACM
Distributed middleware reliability and fault tolerance support in system S
Rohit Wagle, Henrique Andrade, Kirsten Hildrum, Chitra Venkatramani, Michael Spicer
July 2011 DEBS '11: Proceedings of the 5th ACM international conference on Distributed event-based system
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2,   Downloads (12 Months): 18,   Downloads (Overall): 452

Full text available: PDF
We describe a fault-tolerance technique for implementing operations in a large-scale distributed system that ensures that all the components will eventually have a consistent view of the system even in the face of component failures. To achieve this, we break the distributed operation into a series of smaller operations, each ...
Keywords: fault tolerance, recovery, reliability, stream processing, distributed systems, middleware

20 published by ACM
On the trade-off between network connectivity, round complexity, and communication complexity of reliable message transmission
Ashwinkumar Badanidiyuru, Arpita Patra, Ashish Choudhury, Kannan Srinathan, C. Pandu Rangan
November 2012 Journal of the ACM (JACM): Volume 59 Issue 5, October 2012
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0,   Downloads (12 Months): 26,   Downloads (Overall): 614

Full text available: PDF
Perfectly reliable message transmission (PRMT) is one of the fundamental problems in distributed computing. It allows a sender to reliably transmit a message to a receiver in an unreliable network, even in the presence of a computationally unbounded adversary. In this article, we study the inherent trade-off between the three ...
Keywords: message transmission, Distributed computing, computationally unbounded

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.