Hybrid Hardware/Software Detection of Multi-Bit Upsets in Memory
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Bit flips in main memory can be caused by a multitude of environmental effects, such as heat or radiation, as well as by malicious actors exploiting Rowhammer-style hardware vulnerabilities. The industry-standard countermeasure is SEC-DED ECC memory, which can reliably correct single-bit and detect double-bit flips in a data word. However, larger multi-bit upsets (MBUs) regularly occur in real-world systems and, as shown by an analysis in this paper, have a high probability of being miscorrected. Software-implemented hardware fault tolerance (SIHFT) mechanisms can flexibly handle MBUs, but incur significant runtime costs. In this paper, we propose to combine hardware ECC as a low-cost detector and SIHFT as a handler for miscorrected MBUs that recategorizes them as uncorrectable. A preliminary evaluation on the basis of differential checksums shows a 98.5% reduction in miscorrected silent data corruptions with a very moderate execution-time overhead.
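The miscorrection problem and the hybrid idea can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes a tiny extended Hamming(8,4) code in place of the wider SEC-DED codes used in real ECC memory, and a plain XOR checksum as a stand-in for the differential checksums mentioned in the abstract. The names `encode`, `decode`, `flip`, and `CheckedMemory` are hypothetical. In this toy code every 3-bit upset is reported as "corrected" while the data is wrong; wider codes are expected to miscorrect only a fraction of such upsets.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def encode(nibble):
    """Encode 4 data bits as an extended Hamming(8,4) SEC-DED codeword."""
    code = [0] * 8  # index 0: overall parity, indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = ((nibble >> i) & 1 for i in range(4))
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = reduce(xor, code[1:])
    return code


def decode(code):
    """Return (status, nibble) the way a SEC-DED decoder would report it."""
    c = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword.
    syndrome = reduce(xor, (pos for pos in range(1, 8) if c[pos]), 0)
    odd_parity = reduce(xor, c)
    if odd_parity:
        # Odd number of flips: the decoder assumes exactly one and "fixes" it.
        c[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        status = "uncorrectable"  # even number of flips, detected only
    else:
        status = "ok"
    nibble = c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3
    return status, nibble


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    return [bit ^ (i in positions) for i, bit in enumerate(code)]


class CheckedMemory:
    """Hypothetical ECC-protected words guarded by a software XOR checksum."""

    def __init__(self, values):
        self.words = [encode(v) for v in values]
        self.checksum = reduce(xor, values, 0)

    def write(self, index, value):
        # Differential update: fold out the old value, fold in the new one.
        # Assumes the old word still reads back correctly.
        _, old = decode(self.words[index])
        self.checksum ^= old ^ value
        self.words[index] = encode(value)

    def read(self, index):
        status, value = decode(self.words[index])
        if status == "corrected":
            # Only an ECC correction event triggers the costly software check.
            current = reduce(xor, (decode(w)[1] for w in self.words), 0)
            if current != self.checksum:
                status = "uncorrectable"  # recategorize the miscorrection
        return status, value


mem = CheckedMemory([0x3, 0xA, 0x5])
mem.write(1, 0xC)
mem.words[1] = flip(mem.words[1], (1, 2, 4))  # inject a 3-bit upset
print("ECC alone:     ", decode(mem.words[1]))
print("ECC + checksum:", mem.read(1))

miscorrected = sum(
    decode(flip(encode(0xC), trio))[0] == "corrected"
    for trio in combinations(range(8), 3)
)
print("3-bit upsets reported as corrected:", miscorrected)
```

In this sketch the software check runs only when the decoder reports a correction, which mirrors the abstract's use of hardware ECC as a low-cost detector; a genuine single-bit correction passes the check and keeps its "corrected" status.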
Details
Original language | English |
---|---|
Title | 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) |
Publisher | IEEE |
Pages | 94-97 |
Number of pages | 4 |
ISBN (electronic) | 9798350395723 |
ISBN (print) | 979-8-3503-9573-0 |
Publication status | Published - 27 June 2024 |
Peer-review status | Yes |
Conference
Title | 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks |
---|---|
Short title | DSN 2024 |
Event number | 54 |
Duration | 24 - 27 June 2024 |
Location | Pullman Hotel |
City | Brisbane |
Country | Australia |
External IDs
ORCID | /0000-0002-1427-9343/work/166764857 |
---|---|
Scopus | 85203820132 |
Keywords
- Fault tolerance, Fault tolerant systems, Hardware, Heating systems, Memory management, Runtime, Software