Fault Injectors for TensorFlow: Evaluation of the Impact of Random Hardware Faults on Deep CNNs

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › Peer-review

Contributors

  • Michael Beyer, TUD Dresden University of Technology (Author)
  • Andrey Morozov, University of Stuttgart (Author)
  • Emil Valiev, TUD Dresden University of Technology (Author)
  • Christoph Schorn, Robert Bosch GmbH (Author)
  • Lydia Gauerhof, Robert Bosch GmbH (Author)
  • Kai Ding, Robert Bosch GmbH (Author)
  • Klaus Janschek, Chair of Automation Engineering (Author)

Abstract

Today, Deep Learning (DL) enhances almost every industrial sector, including safety-critical areas. The next generation of safety standards will define appropriate verification techniques for DL-based applications and propose adequate fault tolerance mechanisms. DL-based applications, like any other software, are susceptible to common random hardware faults such as bit flips, which can occur in RAM and CPU registers. Such faults can lead to silent data corruption. Therefore, it is crucial to develop methods and tools that help to evaluate how DL components operate in the presence of such faults. In this paper, we introduce two new Fault Injection (FI) frameworks, InjectTF and InjectTF2, for TensorFlow 1 and TensorFlow 2, respectively. Both frameworks are available on GitHub and allow the configurable injection of random faults into Neural Networks (NNs). To demonstrate the feasibility of the frameworks, we also present the results of FI experiments conducted on four VGG-based Convolutional NNs using two image sets. The results demonstrate how random bit flips in the outputs of particular mathematical operations and layers of NNs affect the classification accuracy. These results help to identify the most critical operations and layers, compare the reliability characteristics of functionally similar NNs, and introduce selective fault tolerance mechanisms.
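The abstract does not describe the internal API of InjectTF or InjectTF2, so the following is only a minimal, framework-independent sketch of the fault model it names: a random bit flip in the output of an operation or layer. It assumes outputs are IEEE 754 float32 values and that a fault flips exactly one bit at a uniformly random position. The names `flip_bit`, `inject_random_bit_flips`, and `predicted_class` are hypothetical and are not part of either framework. It uses plain Python (`struct`) so that it runs without TensorFlow; inside a TensorFlow graph, a comparable reinterpretation of float bits as integers could be done with `tf.bitcast`.

```python
import random
import struct
from typing import List


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the IEEE 754 binary32 (float32) encoding of `value`.

    Bit 0 is the least significant mantissa bit, bits 23-30 hold the
    exponent, and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit  # the injected fault: a single bit flip
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped


def inject_random_bit_flips(
    outputs: List[float], probability: float, rng: random.Random
) -> List[float]:
    """Hypothetical injector: each output value independently receives one
    bit flip at a uniformly random position with the given probability."""
    faulty = []
    for x in outputs:
        if rng.random() < probability:
            faulty.append(flip_bit(x, rng.randrange(32)))
        else:
            faulty.append(x)
    return faulty


def predicted_class(logits: List[float]) -> int:
    """Index of the largest logit (top-1 prediction)."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print("fault-free prediction:", predicted_class(logits))  # class 0
    # Flip the most significant exponent bit of the class-1 logit:
    # 0.5 (0x3F000000) becomes 2**127 (0x7F000000).
    corrupted = list(logits)
    corrupted[1] = flip_bit(corrupted[1], 30)
    print("corrupted logit:", corrupted[1])
    print("faulty prediction:", predicted_class(corrupted))  # class 1
    # Flipping the least significant mantissa bit changes 1.0 only by 2**-23.
    print("LSB flip of 1.0:", flip_bit(1.0, 0))
```

The example illustrates why the position of a flip matters: a flip in a high exponent bit can turn a moderate value into an enormous one (or into infinity) and silently change the predicted class, whereas a flip in the lowest mantissa bit changes the value by a relative amount of at most 2**-23. Repeating `inject_random_bit_flips` over a labeled data set and comparing the resulting accuracy with the fault-free accuracy is one simple way to estimate how sensitive a given layer is.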

Details

Original language: English
Title of host publication: Proceedings of the 30th European Safety and Reliability Conference and the 15th Probabilistic Safety Assessment and Management Conference
Editors: Piero Baraldi, Francesco Di Maio, Enrico Zio
Publisher: Research Publishing, Singapore
Pages: 4673–4680
Number of pages: 8
ISBN (electronic): 9789811485930
Publication status: Published, 2020
Peer-reviewed: Yes

Conference

Title: 30th European Safety and Reliability Conference, ESREL 2020 and 15th Probabilistic Safety Assessment and Management Conference, PSAM15 2020
Duration: 1–5 November 2020
City: Venice
Country: Italy

External IDs

Scopus: 85110257156

Keywords

  • Deep Learning, Fault Injection, Fault Tolerance, Random Hardware Faults, Reliability