1
Throughput optimal total order broadcast for cluster environments
July 2010
ACM Transactions on Computer Systems (TOCS): Volume 28 Issue 2, July 2010
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 9, Downloads (12 Months): 25, Downloads (Overall): 586
Full text available:
PDF
Total order broadcast is a fundamental communication primitive that plays a central role in bringing cheap software-based high availability to a wide range of services. This article studies the practical performance of such a primitive on a cluster of homogeneous machines. We present LCR, the first throughput optimal uniform total ...
Keywords:
total order broadcast, replication, cluster computing, software fault-tolerance
CCS:
Software fault tolerance
Full Text:
... Design - Distributed systems. General Terms: Algorithms, Performance, Reliability. Additional Key Words and Phrases: software fault-tolerance, replication, total order broadcast, cluster computing. ACM Reference Format: Guerraoui, R., Levy, ...
2
Process backup in producer-consumer systems
David L. Russell
November 1977
ACM SIGOPS Operating Systems Review: Volume 11 Issue 5, November 1977
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 0, Downloads (12 Months): 9, Downloads (Overall): 295
Full text available:
PDF
System state restoration after detection of an error is discussed for producer-consumer systems, with emphasis on the control of the domino effect. Recovery primitives MARK, RESTORE, and PURGE are proposed that, in conjunction with the use of SEND-RECEIVE interprocess communication primitives, allow bounds to be placed on the amount of ...
Keywords:
Domino effect, Message facilities, Interprocess communication, Software fault tolerance, State restoration, Asynchronous programming, Error recovery
Also published in:
November 1977
SOSP '77: Proceedings of the sixth ACM symposium on Operating systems principles
CCS:
Software fault tolerance
References:
B. Randell. System structure for software fault tolerance. IEEE Trans. on Software Engineering SE-1, 2 (June 1975), 220-232.
Full Text:
... Berlin, 1974, pp. 171-187. 2. B. Randell. System structure for software fault tolerance. IEEE Trans. on Software Engineering SE-1, 2 (June ...
3
Assessing Dependability with Software Fault Injection: A Survey
Roberto Natella,
Domenico Cotroneo,
Henrique S. Madeira
February 2016
ACM Computing Surveys (CSUR): Volume 48 Issue 3, February 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 21, Downloads (12 Months): 305, Downloads (Overall): 629
Full text available:
PDF
With the rise of software complexity, software-related accidents represent a significant threat for computer-based systems. Software Fault Injection is a method to anticipate worst-case scenarios caused by faulty software through the deliberate injection of software faults. This survey provides a comprehensive overview of the state of the art on Software ...
Keywords:
software fault tolerance, dependability assessment, Software faults
References:
J.-C. Laprie, J. Arlat, C. Beounes, and K. Kanoun. 1990. Definition and analysis of hardware-and software-fault-tolerant architectures. IEEE Computer 23, 7 (1990), 39--51.
M. R. Lyu. 1995. Software Fault Tolerance. John Wiley & Sons.
Full Text:
... view of system safety [RTCA 1992; ISO 2011]. An early study on software fault tolerance [Hudak et al. 1993] compared several techniques by injecting both ... Fig. 4. Model for comparing software fault tolerance techniques [Hudak et al. 1993]. In turn, these probabilities were used ...
... that practical and easy-to-use tools are essential for the future of software fault tolerance. 10.2.1. The Quest for New Software Fault Tolerance Methods. Several software fault tolerance approaches have been proposed, and most of them are ... be supported by SFI at the early stages of software development. 10.2.2. Software Fault Tolerance Challenges Posed by the Forthcoming Hardware. The need to reduce energy ...
... Beounes, and K. Kanoun. 1990. Definition and analysis of hardware-and software-fault-tolerant architectures. IEEE Computer 23, 7 (1990), 39–51. N. Laranjeiro, M. Vieira, ...
... susceptibility. In Proc. USENIX Annual Technical Conf. M. R. Lyu. 1995. Software Fault Tolerance. John Wiley & Sons. H. Madeira, D. Costa, and M. ...
4
On exceptions as first-class objects in Ada 95
Thomas Wolf
September 2001
ACM SIGAda Ada Letters - Exception handling for a 21st century programming language proceedings: Volume XXI Issue 3, September 2001
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0, Downloads (12 Months): 4, Downloads (Overall): 96
Full text available:
PDF
This short position paper argues that it might be beneficial to try to bring the exception model of Ada 95 more in-line with the object-oriented model of programming. In particular, it is felt that exceptions --- being such an important concept for the development of fault-tolerant software --- have deserved ...
Keywords:
software fault tolerance, object-oriented exception handling
5
An instruction-level fine-grained recovery approach for soft errors
Jianjun Xu,
Qingping Tan,
Lanfang Tan,
Huiping Zhou
March 2013
SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied Computing
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1, Downloads (12 Months): 21, Downloads (Overall): 116
Full text available:
PDF
With the continuous progress of integrated circuits, the threat that soft errors pose to computing dependability has become a growing design concern. For mitigating the effects of soft errors, software-based fault tolerance techniques are attractive because of their low cost and flexibility. However, current research mostly focuses on error detection, and ...
Keywords:
error recovery, software fault-tolerance, soft error
Full Text:
... Reliability, Testing, and Fault Tolerance. General Terms: Algorithms, Design, Performance, Reliability. Keywords: soft error, software fault-tolerance, error recovery. 1. INTRODUCTION. Permission to make digital or hard copies ...
6
Extending Ada to support multi-core based monitoring and fault tolerance
You Li,
Lu Yang,
Lei Bu,
Linzhang Wang,
Jianhua Zhao,
Xuandong Li
October 2010
SIGAda '10: Proceedings of the ACM SIGAda annual international conference on SIGAda
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2, Downloads (12 Months): 9, Downloads (Overall): 124
Full text available:
PDF
Monitoring-Oriented Programming (MOP) and Software Fault Tolerance (SFT) are two important approaches to guaranteeing the reliability of software systems, especially those that run online for a long time. However, introducing monitoring or fault tolerance modules brings high overhead. With the prevalence of multi-core platforms, we can find the ...
Keywords:
monitoring-oriented programming, multi-core, software fault tolerance
Also published in:
November 2010
ACM SIGAda Ada Letters - SIGAda 2010: Volume 30 Issue 3, December 2010
References:
A. Avizienis. The methodology of n-version programming. In M. R. Lyu, editor, SOFTWARE FAULT TOLERANCE. John Wiley & Sons Ltd, 1994.
P. G. Bishop. Software fault tolerance by design diversity. In SOFTWARE FAULT TOLERANCE, pages 211--229. John Wiley & Sons Ltd, 1994.
J. C. Laprie, et al. Architectural issues in software fault tolerance. In M. R. Lyu, editor, SOFTWARE FAULT TOLERANCE, pages 47--80. John Wiley & Sons Ltd, 1994.
B. Randell and J. Xu. The evolution of the recovery block concept. In M. R. Lyu, editor, SOFTWARE FAULT TOLERANCE, pages 1--22. John Wiley & Sons Ltd, 1994.
Full Text:
... Fairfax, Virginia, USA. Copyright 2010 ACM 978-1-4503-0027-8/10/10 ...$10.00. General Terms: Design, Reliability. Keywords: multi-core, monitoring-oriented programming, software fault tolerance. 1. INTRODUCTION. With the development of the software industry, the reliability of ... the reliability of software systems. Monitoring-Oriented Programming (MOP) and Software Fault Tolerance (SFT), which can give high confidence for long-running on-line ...
... used to compensate for or mask software failures. Current main approaches to software fault tolerance include multi-version techniques and single-version techniques. The multi-version techniques include Recovery ...
... FAULT TOLERANCE. John Wiley & Sons Ltd, 1994. [2] P. G. Bishop. Software fault tolerance by design diversity. In SOFTWARE FAULT TOLERANCE, pages 211–229. John Wiley & ...
7
About conversations for concurrent OO languages
September 1994
ACM SIGPLAN Notices: Volume 29 Issue 9, Sept. 1994
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0, Downloads (12 Months): 2, Downloads (Overall): 63
Full text available:
PDF
Keywords:
concurrent object-oriented languages, software diversity, software fault-tolerance, backward error recovery
References:
[RX93] B. Randell, J. Xu. "Object-Oriented Software Fault Tolerance: Framework, Reuse and Design Diversity", PDCS2 ESPRIT basic research project, Year Report, 1993.
[R75] B. Randell. "System Structure for Software Fault-Tolerance", IEEE Trans. Soft. Eng. SE-1, 2, 1975, pp. 220-232.
Full Text:
... Newcastle upon Tyne, NE1 7RU, UK. Email: alexander.romanovsky@newcastle.ac.uk. Keywords: software fault-tolerance, software diversity, backward error recovery, concurrent object-oriented languages. The ...
... 1991. [RX93] B. Randell, J. Xu. "Object-Oriented Software Fault Tolerance: Framework, Reuse and Design Diversity", PDCS2 ESPRIT basic research ... Report, 1993. [R75] B. Randell. "System Structure for Software Fault-Tolerance", IEEE Trans. Soft. Eng. SE-1, 2, 1975, pp ...
8
Modeling software design diversity: a review
Bev Littlewood,
Peter Popov,
Lorenzo Strigini
June 2001
ACM Computing Surveys (CSUR): Volume 33 Issue 2, June 2001
Publisher: ACM
Bibliometrics:
Citation Count: 20
Downloads (6 Weeks): 5, Downloads (12 Months): 33, Downloads (Overall): 3,299
Full text available:
PDF
Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. While there is clear evidence that the approach can be expected to deliver some increase in reliability compared to a single version, there is no agreement about the ...
Keywords:
multiple version programming, control systems, software fault tolerance, functional diversity, N-version software, protection systems, safety
References:
AMMANN, P. E. AND KNIGHT, J. C. 1988. Data diversity: An approach to software fault tolerance. IEEE Trans. Comput. C-37, 4, 418-425.
ANDERSON, T., BARRETT, P. A., HALLIWELL, D. N. AND MOULDING, M. R. 1985. An evaluation of software fault tolerance in a practical system. In Proceedings of the 15th IEEE International Symposium on Fault-Tolerant Computing (FTCS-15). (Ann Arbor, MI.)
KERSKEN, M. AND SAGLIETTI, F. Eds. 1992. Software fault tolerance: Achievement and assessment strategies. Research Reports ESPRIT, Springer-Verlag, New York.
LAPRIE, J. C., ARLAT, J., BEOUNES, C. AND KANOUN, K. 1990. Definition and analysis of hardware-and-software fault-tolerant architectures. IEEE Comput. 23, 7, 39-51.
LYU, M. R. Ed. 1995. Software Fault Tolerance. Wiley, New York, 337.
MIGNEAULT, G. E. 1982. The Cost of Software Fault Tolerance Technical Report. NASA Langley Research Center, Hampton, Va.
Full Text:
... it is this issue of dependence of failures that makes modeling of software fault tolerance particularly difficult. Of course, simply replicating a component (hardware or software) ...
... several safety-critical systems have been implemented using software fault tolerance based on design diversity, and there have been no reports ...
... J., BEOUNES, C. AND KANOUN, K. 1990. Definition and analysis of hardware-and-software fault-tolerant architectures. IEEE Comput. 23, 7, 39–51. LARYD, A. 1994. Operating experience of ... Reliab. Eng. Syst. Safety. 66, 93–95. LYU, M. R. Ed. 1995. Software Fault Tolerance. Wiley, New York, 337. LYU, M. R. Ed. 1996. Handbook ...
9
Recovery blocks in action: A system supporting high reliability
T. Anderson,
R. Kerr
October 1976
ICSE '76: Proceedings of the 2nd international conference on Software engineering
Publisher: IEEE Computer Society Press
Bibliometrics:
Citation Count: 21
Downloads (6 Weeks): 2, Downloads (12 Months): 18, Downloads (Overall): 628
Full text available:
PDF
The need for reliable complex systems motivates the development of techniques by which acceptable service can be maintained, even in the presence of residual errors. Recovery blocks allow a software designer to include tests on the acceptability of the various phases of a system's operation, and to specify alternative actions ...
Keywords:
Software fault-tolerance, Error recovery, Reliability, Error detection, Recovery block, Recovery cache
References:
B. Randell (1975). System Structure for Software Fault Tolerance. IEEE Trans. on Software Engineering 1, 2, pp. 220-232.
Full Text:
... Phrases: Error detection, error recovery, recovery block, recovery cache, reliability, software fault-tolerance. Abstract: The need for reliable complex systems motivates the ... blocks can be used in systems which aim to provide software fault tolerance have been reported (Randell (1975)), as has a proof-guided methodology ...
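The recovery-block idea summarized in the Anderson and Kerr abstract (acceptance tests plus alternative actions) can be illustrated with a small Python sketch. It is not the authors' implementation, which relied on a recovery cache; here the recovery point is assumed to be a deep copy of the state, and `recovery_block`, `faulty_sort`, `insertion_sort` and `is_sorted_permutation` are hypothetical names chosen for the example.

```python
import copy
from collections import Counter
from typing import Any, Callable, List, Sequence


class RecoveryBlockError(Exception):
    """Raised when no alternate produces a result that passes the acceptance test."""


def recovery_block(
    state: Any,
    alternates: Sequence[Callable[[Any], Any]],
    acceptance_test: Callable[[Any, Any], bool],
) -> Any:
    """Try each alternate in order and return the first acceptable result.

    The recovery point is modelled (an assumption of this sketch) as a deep copy
    of `state`: each alternate works on its own copy, so partial updates made by
    a rejected alternate are discarded and the caller's `state` is never modified.
    """
    for alternate in alternates:
        working_copy = copy.deepcopy(state)  # establish the recovery point
        try:
            result = alternate(working_copy)
        except Exception:
            continue  # an exception is treated like a failed acceptance test
        if acceptance_test(state, result):
            return result
    raise RecoveryBlockError("all alternates failed the acceptance test")


def faulty_sort(xs: List[int]) -> List[int]:
    """Hypothetical primary with a residual fault: it drops the largest element."""
    xs.sort()  # mutates only the working copy
    return xs[:-1]


def insertion_sort(xs: List[int]) -> List[int]:
    """Hypothetical, independently written alternate."""
    out: List[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Acceptance test: result is ordered and has the same elements as the input."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(result) == Counter(original)


if __name__ == "__main__":
    data = [3, 1, 2]
    print(recovery_block(data, [faulty_sort, insertion_sort], is_sorted_permutation))
    print(data)  # the original input is untouched
```

Run as a script, this prints `[1, 2, 3]` and then `[3, 1, 2]`: the primary's output fails the acceptance test, its partial work on the copy is thrown away, and the alternate's result is accepted. A deep copy is the simplest way to model a recovery point, but it is costly for large states, which is the motivation for mechanisms such as a recovery cache.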
10
The application of compile-time reflection to software fault tolerance using Ada 95
P. Rogers,
A. J. Wellings
June 2005
Ada-Europe'05: Proceedings of the 10th Ada-Europe international conference on Reliable Software Technologies
Publisher: Springer-Verlag
Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. In contrast, explicit support - especially at the language level - allows application-specific tailoring. However, current techniques that extend languages to support software fault tolerance lead to interwoven code addressing functional and non-functional ...
Keywords:
atomic actions, recovery blocks, software fault tolerance, backward error recovery, reflection, Ada, conversations
Abstract:
Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. ... application-specific tailoring. However, current techniques that extend languages to support software fault tolerance lead to interwoven code addressing functional and non-functional requirements. Reflection ... language design space. To explore this potential we compare common software fault tolerance scenarios implemented in both standard and reflective Ada. Specifically, in ...
References:
M. Lyu, Ed. Software Fault Tolerance, in Trends In Software, vol. 3, Chichester: John Wiley & Sons, 1995.
P. Rogers, "Software Fault Tolerance, Reflection, and the Ada Programming Language (YCST 2003/10)," in Department of Computer Science: University of York, 2003.
P. Rogers and A. J. Wellings, "An Incremental Recovery Cache Supporting Software Fault Tolerance Mechanisms," Journal of Computer Systems: Science and Engineering, vol. 15, no. 1, pp. 33-48, 2000.
Full Text:
LNCS 3555 - The Application of Compile-Time Reflection to Software Fault Tolerance Using Ada 95. T. Vardanega and A. Wellings (Eds.): Ada-Europe ... Springer-Verlag Berlin Heidelberg 2005. The Application of Compile-Time Reflection to Software Fault Tolerance Using Ada 95. P. Rogers¹ and A. J. Wellings² ¹ Ada ... of York, York, UK. Abstract. Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. ... application-specific tailoring. However, current techniques that extend languages to support software fault tolerance lead to interwoven code addressing functional and non-functional requirements. Reflection ... language design space. To explore this potential we compare common software fault tolerance scenarios implemented in both standard and reflective Ada. Specifically, in ... in terms of expressive power, portability, and performance. Keywords: Reflection, software fault tolerance, Ada, backward error recovery, recovery blocks, atomic actions, conversations. ... combination of potential specification errors and overall complexity define the problem as one ... define the problem as one of handling unanticipated software faults. “Software fault tolerance” is the use of software mechanisms to deal with ... mechanisms to deal with these unanticipated software faults [5, Preface]. Software fault tolerance is expensive and adds to the overall complexity of the ... system (which may even reduce reliability as a result). Nevertheless, software fault tolerance must be explicitly considered for safety-critical applications because software faults ...
... and it has, therefore, been a focus of research in software fault tolerance. However, most of these research efforts focus on handling ... efforts focus on handling hardware faults and those that address software fault tolerance use languages that are limited in one respect or another. ... Furthermore, such a language has not been used to address software fault tolerance with reflective programming even though Ada is especially appropriate for ... in out Class ); procedure Translate_Handled_Statements ( This : in out Class; Input ...
... languages. We have implemented a number of scenarios using common software fault tolerance facilities to determine the potential advantages offered by reflection for ... of concerns, and performance. 3.1 Expressive Power. Lacking a widely accepted definition, we ...
... is not the goal. We wish to separate the code meeting the functional ...
... performance penalties but a poor metaclass translation may very well generate source code ...
... certification is a typical requirement for systems that might employ software fault tolerance techniques. Our implementations used the full Ada language, including tasks ... Massachusetts: MIT Press, 1991. [4] N. Leveson, “Software Safety: Why, What and How,” ... 18, no. 2, pp. 125-163, 1986. [5] M. Lyu, Ed. Software Fault Tolerance, in Trends In Software, vol. 3, Chichester: John Wiley ... vol. 22, no. 12, pp. 147-155, 1987. [7] P. Rogers, “Software Fault Tolerance, Reflection, and the Ada Programming Language (YCST 2003/10),” in ... Rogers and A. J. Wellings, “An Incremental Recovery Cache Supporting Software Fault Tolerance Mechanisms,” Journal of Computer Systems: Science and Engineering, vol. 15, ...
11
Crystal-growth-inspired algorithms for computational grids
June 2009
BADS '09: Proceedings of the 2009 workshop on Bio-inspired algorithms for distributed systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1, Downloads (12 Months): 6, Downloads (Overall): 112
Full text available:
PDF
Biological systems surpass man-made systems in many important ways. Most notably, systems found in nature are typically self-adaptive and self-managing, capable of surviving drastic changes in their environments, such as internal failures and malicious attacks on their components. Large distributed software systems have requirements common to those of some biological ...
Keywords:
computational grid, nature-inspired software, fault tolerance, privacy, self-assembly, software architectural style
12
A hierarchical structure for fault tolerant reactive programs
Andrea Clematis,
Vittoria Gianuzzi
March 1993
SAC '93: Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: states of the art and practice
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0, Downloads (12 Months): 3, Downloads (Overall): 189
Full text available:
PDF
Keywords:
transaction based systems, software fault tolerance, backward error recovery, concurrent programming
References:
Randell B., "System structure for software fault tolerance", IEEE Trans. Software Eng., Vol. SE-1, pp. 220-232, 1975.
Full Text:
... Geneva. ² Dipartimento di Informatica e Scienze dell’Informazione - Università - Geneva. Keywords: Software Fault Tolerance, ..., Concurrent Programming, Backward Error Recovery, Transaction Based Systems. Abstract: A new approach to software fault tolerance in concurrent programs modeled as reactive systems is proposed. It is ... using a transaction based approach [5] while the use of software fault tolerance looks like the adequate solution for process reliability. Unfortunately we have ... of different methodologies to improve the reliability of programs through software fault tolerance, the application of these methodologies to reactive programs is ... reactive program and of its fault tolerance requirements. Then different methodologies for software fault tolerance are shortly revised and the problems in their application to reactive program ...
... and starvation-free algorithms have been proposed to solve the choice problem [4]. Software Fault Tolerance and Reactive Systems. In the previous sections, a reactive system has ... without any consideration for the use of methodologies, such as software fault tolerance, aimed at improving its reliability. Proposed software fault tolerance methodologies are generally based on redundancy and code diversity, and can be ...
... very complex task. Problem 4: Finally, ensuring reliable process interaction through software fault tolerance is only half of the story. As mentioned in the introduction ... to be solved is how to combine the use of software fault tolerance, which improves process reliability, with transactions, which ensure data reliability. A hierarchical ... to establish what parts of the system require the use of software fault tolerance. The action manager level is a simple structure, the correctness ...
... provides a simple and structured scheme for the application of software fault tolerance mechanisms. When the computation is performed on a distributed computer system, software fault tolerance can be profitably integrated with system fault tolerance to enforce ...
13
A survey of linguistic structures for application-level fault tolerance
Vincenzo De Florio,
Chris Blondia
May 2008
ACM Computing Surveys (CSUR): Volume 40 Issue 2, April 2008
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 1, Downloads (12 Months): 25, Downloads (Overall): 2,369
Full text available:
PDF
Structures for the expression of fault-tolerance provisions in application software comprise the central topic of this article. Structuring techniques answer questions as to how to incorporate fault tolerance in the application layer of a computer program and how to manage the fault-tolerant code. As such, they provide the means to ...
Keywords:
Language support for software-implemented fault tolerance, reconfiguration and error recovery, separation of design concerns, software fault tolerance
References:
Ammann, P. E. and Knight, J. C. 1988. Data diversity: An approach to software fault tolerance. IEEE Trans. Comput. 37, 4, 418--425.
Anderson, T., Barrett, P., Halliwell, D., and Moulding, M. 1985. Software fault tolerance: An evaluation. IEEE Trans. Softw. Eng. 11, 2, 1502--1510.
Avizienis, A. 1995. The methodology of N-version programming. In Software Fault Tolerance, M. Lyu, ed. John Wiley and Sons, New York, Chapter 2, 23--46.
Cristian, F. 1995. Exception handling. In Software Fault Tolerance, M. Lyu, ed. Wiley, 81--107.
Huang, Y. and Kintala, C. M. 1995. Software fault tolerance in the application layer. In Software Fault Tolerance, M. Lyu, ed. John Wiley and Sons, New York, Chapter 10, 231--248.
Huang, Y., Kintala, C. M., Bernstein, L., and Wang, Y. 1996. Components for software fault tolerance and rejuvenation. AT&T Tech. J., 29--37.
Lyu, M. 1995. Software Fault Tolerance. John Wiley and Sons, New York.
Randell, B. 1975. System structure for software fault tolerance. IEEE Trans. Softw. Eng. 1, 220--232.
Randell, B. and Xu, J. 1995. The evolution of the recovery block concept. In Software Fault Tolerance, M. Lyu, ed. John Wiley and Sons, New York, Chapter 1, 1--21.
Taylor, D. J., Morgan, D. E., and Black, J. P. 1980. Redundancy in data structures: Improving software fault tolerance. IEEE Trans. Softw. Eng. 6, 6 (Nov.), 585--594.
Full Text:
... Phrases: Language support for software-implemented fault tolerance, separation of design concerns, software fault tolerance, reconfiguration and error recovery. ACM Reference Format: De Florio, V. and ...
... the current lack of a simple and coherent system structure for software fault-tolerance engineering (providing the designer with effective support towards fulfilling goals such ...
... through mechanisms either residing in or cooperating with the application layer. 2.2. Software Fault-Tolerance in the Application Layer. The need for software fault tolerance provisions, located in the application layer, is supported by studies showing ... reliability is that of incorporating in the application software provisions for software fault tolerance [Randell 1975]. Another argument that justifies the addition of software fault tolerance means in the application layer is given by the widespread adoption ...
... and complex application software. We next give three consequent obstacles to software fault-tolerance design. Amalgamating these two types of application code greatly complicates the ...
... recognized. Two of the previously mentioned approaches derive from well-established research in software fault-tolerance; Lyu [1998b, 1996, 1995] refers to them as single-version and ... Lyu [1998b, 1996, 1995] refers to them as single-version and multiple-version software fault tolerance. They are dealt with in Section 3.1. A third approach, ... 1T/1H/1S, is called a simplex in the cited paper. 3.1.1. Single-Version Software Fault-Tolerance. Single-version software fault-tolerance (SV) is basically the embedding into the user application of ...
... the class of applications that may be tackled with it. 3.1.2. Multiple-Version Software Fault-Tolerance. This section describes multiple-version software fault-tolerance (MV), an approach which ...
... blocks as well. In other words, the adoption of multiple-version software fault-tolerance provisions always implies a penalty on maintainability and portability. Limited NVP support ...
... Functional models that appear particularly interesting as system structures for software fault-tolerance are those based on the concept of attribute grammars [Paakki 1995]. ...
... AND KNIGHT, J. C. 1988. Data diversity: An approach to software fault tolerance. IEEE Trans. Comput. 37, 4, 418–425. ANCONA, M., DODERO, G., GIANNUZZI, ... 93–109. ANDERSON, T., BARRETT, P., HALLIWELL, D., AND MOULDING, M. 1985. Software fault tolerance: An evaluation. IEEE Trans. Softw. Eng. 11, 2, 1502–1510. ANDREWS, G. ... 2007). AVIŽIENIS, A. 1995. The methodology of N-version programming. In Software Fault Tolerance, M. Lyu, ed. John Wiley and Sons, New York, Chapter ... CRISTIAN, F. 1995. Exception handling. In Software Fault Tolerance, M. Lyu, ed. Wiley, 81–107. DE FLORIO, V. 1998. The ...
... Not. 23, 4 (Jul.). HUANG, Y. AND KINTALA, C. M. 1995. Software fault tolerance in the application layer. In Software Fault Tolerance, M. Lyu, ed. ... C. M., BERNSTEIN, L., AND WANG, Y. 1996. Components for software fault tolerance and rejuvenation. AT&T Tech. J., 29–37. HUANG, Y., KINTALA, C., KOLETTIS, N., ...
... Software Reliability Engineering. IEEE Computer Society Press and McGraw-Hill. LYU, M. 1995. Software Fault Tolerance. John Wiley and Sons, New York. MAES, P. 1987. Concepts ... Prentice-Hall, Upper Saddle River, NJ. RANDELL, B. 1975. System structure for software fault tolerance. IEEE Trans. Softw. Eng. 1, 220–232. RANDELL, B. AND XU, ... J. 1995. The evolution of the recovery block concept. In Software Fault Tolerance, M. Lyu, ed. John Wiley and Sons, New York, Chapter ...
14
Global Virtual Time and distributed synchronization
July 1995
PADS '95: Proceedings of the ninth workshop on Parallel and distributed simulation
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 0, Downloads (12 Months): 24, Downloads (Overall): 621
Full text available:
PDF
Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic simulations. It is defined as the earliest time tag within the set of unprocessed pending events in distributed simulation. A number of techniques for determining GVT have been proposed in recent years, each having their own intrinsic properties. However, ...
Keywords:
flow control, optimistic simulations, SPEEDES framework, efficiency, message passing, parallel programming, real-time systems, scalability, unprocessed pending events, GVT computation, SPEEDES GVT, Synchronous Parallel Environment for Emulation and Discrete-Event Simulation framework, digital simulation, distributed simulation, distributed synchronization, event processing, fundamental synchronization concept, global reduction operations, portability, synchronisation, global virtual time, interactive support, real time use, software fault tolerance
Also published in:
July 1995
ACM SIGSIM Simulation Digest: Volume 25 Issue 1, July 1995
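The abstract's definition of GVT, the earliest time tag among all unprocessed pending events, can be made concrete as a global minimum. The Python sketch below is not the SPEEDES GVT algorithm the paper discusses. It assumes that a consistent snapshot of every process and of every message still in transit is already available, which is the hard part that real GVT algorithms solve. Messages in transit are included because an event that has been sent but not yet received is also unprocessed. `LogicalProcess` and `compute_gvt` are hypothetical names.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class LogicalProcess:
    """A simulation process holding its unprocessed event timestamps in a min-heap."""
    name: str
    pending: List[float] = field(default_factory=list)

    def schedule(self, timestamp: float) -> None:
        heapq.heappush(self.pending, timestamp)

    def process_next(self) -> float:
        """Remove and return the earliest unprocessed event time."""
        return heapq.heappop(self.pending)

    def earliest_pending(self) -> float:
        return self.pending[0] if self.pending else math.inf


def compute_gvt(processes: Iterable[LogicalProcess], in_transit: Iterable[float]) -> float:
    """Earliest time tag among all unprocessed events, including messages in flight.

    Assumes the caller supplies a consistent snapshot of every process and of
    every message that has been sent but not yet received.
    """
    local_min = min((lp.earliest_pending() for lp in processes), default=math.inf)
    transit_min = min(in_transit, default=math.inf)
    return min(local_min, transit_min)


if __name__ == "__main__":
    a, b = LogicalProcess("A"), LogicalProcess("B")
    a.schedule(12.0)
    a.schedule(30.0)
    b.schedule(18.5)
    in_flight = [15.0]  # hypothetical event sent to B but not yet received
    print(compute_gvt([a, b], in_flight))  # 12.0
    a.process_next()  # A processes its event at time 12.0
    print(compute_gvt([a, b], in_flight))  # 15.0
```

After process A handles its event at 12.0, the GVT becomes 15.0. If the in-flight message were ignored, the result would be 18.5, an overestimate that could let the simulation commit state a straggler message still needs to roll back.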
15
Recovery domains: an organizing principle for recoverable operating systems
March 2009
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Publisher: ACM
Bibliometrics:
Citation Count: 16
Downloads (6 Weeks): 3, Downloads (12 Months): 21, Downloads (Overall): 715
Full text available:
PDF
We describe a strategy for enabling existing commodity operating systems to recover from unexpected run-time errors in nearly any part of the kernel, including core kernel components. Our approach is dynamic and request-oriented; it isolates the effects of a fault to the requests that caused the fault rather than to ...
Keywords:
automatic fault recovery, akeso, recovery domains
Also published in:
February 2009
ACM SIGPLAN Notices - ASPLOS 2009: Volume 44 Issue 3, March 2009
ACM SIGARCH Computer Architecture News - ASPLOS 2009: Volume 37 Issue 1, March 2009
CCS:
Software fault tolerance
Primary CCS:
Software fault tolerance
16
Affinity-aware checkpoint restart
Ajay Saini,
Arash Rezaei,
Frank Mueller,
Paul Hargrove,
Eric Roman
December 2014
Middleware '14: Proceedings of the 15th International Middleware Conference
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6, Downloads (12 Months): 38, Downloads (Overall): 113
Full text available:
PDF
Current checkpointing techniques employed to overcome faults for HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness of the checkpoint/restart (C/R) mechanism, i.e., application tasks originally pinned to cores may ...
Keywords:
multi-core, fault tolerance, NUMA, efficiency, system software, checkpoint and restart
CCS:
Software fault tolerance
Primary CCS:
Software fault tolerance
17
July 2004
PODC '04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 3, Downloads (12 Months): 7, Downloads (Overall): 142
Full text available:
PDF
Keywords:
byzantine fault tolerance, dynamic membership
CCS:
Software fault tolerance
18
The effects of metadata corruption on NFS
October 2007
StorageSS '07: Proceedings of the 2007 ACM workshop on Storage security and survivability
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 4, Downloads (12 Months): 6, Downloads (Overall): 262
Full text available:
PDF
Distributed file systems need to be robust in the face of failures. In this work, we study the failure handling and recovery mechanisms of a widely used distributed file system, Linux NFS. We study the behavior of NFS under corruption of important metadata through fault injection. We find that the ...
Keywords:
NFS, inconsistency, silent failure, fault tolerance, metadata corruption, reliability, retry
CCS:
Software fault tolerance
Primary CCS:
Software fault tolerance
19
Distributed middleware reliability and fault tolerance support in System S
Rohit Wagle,
Henrique Andrade,
Kirsten Hildrum,
Chitra Venkatramani,
Michael Spicer
July 2011
DEBS '11: Proceedings of the 5th ACM international conference on Distributed event-based system
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2, Downloads (12 Months): 18, Downloads (Overall): 452
Full text available:
PDF
We describe a fault-tolerance technique for implementing operations in a large-scale distributed system that ensures that all the components will eventually have a consistent view of the system even in the face of component failures. To achieve this, we break the distributed operation into a series of smaller operations, each ...
Keywords:
fault tolerance, recovery, reliability, stream processing, distributed systems, middleware
CCS:
Software fault tolerance
Primary CCS:
Software fault tolerance
20
On the trade-off between network connectivity, round complexity, and communication complexity of reliable message transmission
Ashwinkumar Badanidiyuru,
Arpita Patra,
Ashish Choudhury,
Kannan Srinathan,
C. Pandu Rangan
November 2012
Journal of the ACM (JACM): Volume 59 Issue 5, October 2012
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0, Downloads (12 Months): 26, Downloads (Overall): 614
Full text available:
PDF
Perfectly reliable message transmission (PRMT) is one of the fundamental problems in distributed computing. It allows a sender to reliably transmit a message to a receiver in an unreliable network, even in the presence of a computationally unbounded adversary. In this article, we study the inherent trade-off between the three ...
Keywords:
message transmission, Distributed computing, computationally unbounded
CCS:
Software fault tolerance
Primary CCS:
Software fault tolerance