Efficient online memory error assessment and circumvention for Linux with RAMpage

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Horst Schirmeier, Dortmund University of Technology (Author)
  • Ingo Korb, Dortmund University of Technology (Author)
  • Olaf Spinczyk, Dortmund University of Technology (Author)
  • Michael Engel, Dortmund University of Technology (Author)

Abstract

Memory errors are a major source of reliability problems in computer systems. Undetected errors may result in program termination or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. To reduce the impact of memory errors, we designed RAMpage, a purely software-based infrastructure to assess and circumvent permanent memory errors in a running commodity x86-64 Linux-based system. We briefly describe the design and implementation of RAMpage and present new results from an extensive qualitative and quantitative evaluation. These results show the efficiency of our approach: RAMpage is able to provide a smooth graceful degradation in the presence of permanent memory errors while requiring only a small overhead in terms of CPU time, energy, and memory space.

Details

Original language: English
Pages (from-to): 227-247
Number of pages: 21
Journal: International Journal of Critical Computer-Based Systems
Volume: 4
Issue number: 3
Publication status: Published - 2013
Peer-reviewed: Yes
Externally published: Yes

External IDs

ORCID /0000-0002-1427-9343/work/167216819

Keywords

  • DRAM chips, Memory errors, Operating systems, Reliable operation, Silent data corruption, Software-based fault tolerance