Reliability-aware resource management in multi-/many-core systems: A perspective paper

Research output: Contribution to journal › Research article › Contributed › peer-review

Abstract

With the advancement of technology scaling, multi-/many-core platforms are receiving increasing attention in embedded systems because of ever-increasing requirements for performance and power efficiency. This feature-size scaling, along with architectural innovations, has dramatically increased manufacturing defect rates and physical fault rates. As a result, while providing high parallelism, such hardware platforms have also introduced increasing unreliability into the system. Such systems need to be carefully designed to ensure long-term and application-specific reliability, especially in mixed-criticality systems, where incorrect execution of applications may have catastrophic consequences. However, optimally allocating applications/tasks on multi-/many-core platforms is an increasingly complex problem. Therefore, reliability-aware resource management is crucial, and it must ensure application-specific Quality-of-Service (QoS) requirements while optimizing other system-level performance goals. This article presents a survey of recent works that focus on reliability-aware resource management in multi-/many-core systems. We first present an overview of reliability in electronic systems, the associated fault models, and the various system models used in related research. Then, we present recently published articles that primarily focus on aspects such as application-specific reliability optimization, mixed-criticality awareness, and hardware resource heterogeneity. To underscore the differences between the techniques, we classify them based on their design space exploration. Finally, we briefly discuss upcoming trends and open challenges in reliability-aware resource management for future research.

Details

Original language: English
Pages (from-to): 1-37
Number of pages: 37
Journal: Journal of Low Power Electronics and Applications
Volume: 11
Issue number: 7
Publication status: Published - Mar 2021
Peer-reviewed: Yes

Keywords


  • Mixed-criticality, Multi/many-core platforms, Reliability, Resource management