RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers
Publication: Contribution to book/conference proceedings/anthology/expert report › Paper in conference proceedings › Contributed › Peer-reviewed
Abstract
Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination, or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for commodity x86-64-based Linux servers, which is capable of efficiently detecting memory errors and which provides graceful degradation by withdrawing affected memory pages from further use. We describe the design and implementation of RAMpage and present results of an extensive qualitative as well as quantitative evaluation.
Details
Original language | English |
---|---|
Proceedings title | 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing |
Publisher | IEEE |
Pages | 89-98 |
Number of pages | 10 |
ISBN (Print) | 978-0-7695-4590-5 |
Publication status | Published - 14 Dec 2011 |
Peer-reviewed | Yes |
Conference
Title | 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing |
---|---|
Dates | 12-14 December 2011 |
Location | Pasadena, CA, USA |
External IDs
Scopus | 84857724913 |
---|---|
ORCID | /0000-0002-1427-9343/work/167216806 |
Keywords
- Random access memory, Kernel, Testing, Memory management, Linux, Degradation, Servers