Hybrid Hardware/Software Detection of Multi-Bit Upsets in Memory

Publication: Contribution to book/conference proceedings/anthology/report > Contribution to conference proceedings > Contributed > Peer-reviewed

Contributors

Abstract

Bit flips in main memory can be caused by a multitude of environmental effects, such as heat or radiation, as well as by malicious actors exploiting Rowhammer-style hardware vulnerabilities. The industry-standard countermeasure is SEC-DED ECC memory, which can reliably correct single-bit and detect double-bit flips in a data word. However, larger multi-bit upsets (MBUs) regularly occur in real-world systems and, as shown by an analysis in this paper, have a high probability of being miscorrected. Software-implemented hardware fault tolerance (SIHFT) mechanisms can flexibly handle MBUs, but incur significant runtime costs. In this paper, we propose to combine hardware ECC as a low-cost detector and SIHFT as a handler for miscorrected MBUs that recategorizes them as uncorrectable. A preliminary evaluation on the basis of differential checksums shows a 98.5 % reduction in miscorrected silent data corruptions with a very moderate execution-time overhead.
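
The miscorrection problem and the hybrid idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes a tiny extended Hamming (8,4) code in place of real SEC-DED ECC, and a plain XOR checksum as a stand-in for the differential checksums used in the evaluation. All function names (`encode`, `decode`, `flip`, `hybrid_read`) are hypothetical.

```python
from functools import reduce
from operator import xor

def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword."""
    c = [0] * 8
    # Data bits sit at the non-power-of-two positions 3, 5, 6, 7.
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = reduce(xor, c[1:])  # overall parity bit over positions 1..7
    return c

def decode(codeword):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(codeword)
    # Syndrome: XOR of the positions of all set bits (0 for a valid codeword).
    syndrome = reduce(xor, (pos for pos in range(1, 8) if c[pos]), 0)
    odd_parity = reduce(xor, c)  # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # The decoder assumes exactly one flip and "repairs" that position.
        # With three flips this assumption is wrong: a miscorrection.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"  # even number of flips, detected only
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)

def flip(codeword, positions):
    """Return a copy of the codeword with the given bit positions flipped."""
    c = list(codeword)
    for pos in positions:
        c[pos] ^= 1
    return c

def hybrid_read(codewords, checksum):
    """ECC acts as a cheap detector; software verifies only on ECC events."""
    results = [decode(cw) for cw in codewords]
    values = [value for _, value in results]
    statuses = {status for status, _ in results}
    if "uncorrectable" in statuses:
        return "uncorrectable", values
    if "corrected" in statuses and reduce(xor, values, 0) != checksum:
        # ECC claimed success but the software checksum disagrees:
        # recategorize the miscorrection as uncorrectable.
        return "uncorrectable", values
    return ("corrected" if "corrected" in statuses else "ok"), values

data = [0b1011, 0b0110, 0b0001]
checksum = reduce(xor, data, 0)  # assumed software-side checksum
memory = [encode(d) for d in data]

single = [memory[0], flip(memory[1], [5]), memory[2]]
triple = [flip(memory[0], [3, 5, 6]), memory[1], memory[2]]

print(decode(triple[0]))              # ECC alone reports a "correction"
print(hybrid_read(single, checksum))  # genuine single-bit correction
print(hybrid_read(triple, checksum))  # miscorrection caught by the checksum
```

In this toy code, flipping positions 3, 5 and 6 of the first word produces a zero syndrome with odd overall parity, so the decoder "repairs" the parity bit and returns 12 instead of 11 while reporting success. The checksum comparison in `hybrid_read` runs only when ECC signals an event, which mirrors the low-cost-detector role that the abstract assigns to the hardware.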

Details

Original language: English
Title: 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Publisher: IEEE
Pages: 94-97
Number of pages: 4
ISBN (electronic): 9798350395723
ISBN (print): 979-8-3503-9573-0
Publication status: Published - 27 June 2024
Peer-review status: Yes

Conference

Title: 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Short title: DSN 2024
Event number: 54
Duration: 24 - 27 June 2024
Location: Pullman Hotel
City: Brisbane
Country: Australia

External IDs

ORCID /0000-0002-1427-9343/work/166764857

Keywords

  • Fault tolerance, Fault tolerant systems, Hardware, Heating systems, Memory management, Runtime, Software