RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination, or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for commodity x86-64-based Linux servers, which is capable of efficiently detecting memory errors and which provides graceful degradation by withdrawing affected memory pages from further use. We describe the design and implementation of RAMpage and present results of an extensive qualitative as well as quantitative evaluation.
Details
Original language | English |
---|---|
Title of host publication | 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing |
Publisher | IEEE |
Pages | 89-98 |
Number of pages | 10 |
ISBN (print) | 978-0-7695-4590-5 |
Publication status | Published - 14 Dec 2011 |
Peer-reviewed | Yes |
Conference
Title | 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing |
---|---|
Duration | 12 - 14 December 2011 |
Location | Pasadena, CA, USA |
External IDs
Scopus | 84857724913 |
---|---|
ORCID | /0000-0002-1427-9343/work/167216806 |
Keywords
- Random access memory, Kernel, Testing, Memory management, Linux, Degradation, Servers