Performance models and energy-optimal scheduling of DNNs on many-core hardware with dynamic power management

Publication: Contribution in book/conference proceedings › Contribution in conference proceedings › Contributed › Peer-reviewed

Abstract

Processing of deep neural networks (DNNs) at the edge may be limited by the power or energy constraints of the embedded hardware system used. It is therefore desirable for the compiler to create efficient executables for given DNN models that meet these specific constraints. Here, we consider low-power many-core hardware with 152 processing elements (PEs), each containing an ARM M4F processor, 128 KB of SRAM, and a custom accelerator for DNN inference. Dynamic power management allows each core to switch between a high-speed and a low-power mode within tens of nanoseconds. For an energy-optimal parallelization of DNNs on the hardware, we first develop analytical performance models to predict the time and energy required to execute a DNN layer with the custom accelerator. The models are fitted and validated using measurements on a prototype chip. In a second step, we develop concepts for the energy-optimal parallelization of DNNs under latency constraints and evaluate them by deploying the performance models: by dynamically switching between the operating modes, more than 10% of the energy can be saved compared to running in high-speed mode only. The presented methodology and concepts are easily transferable to other many-core edge processors.
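The abstract's core idea, choosing per-layer operating modes to minimize energy while meeting a latency deadline, can be sketched as a simple greedy heuristic. This is a hypothetical illustration, not the paper's actual algorithm: the function name, the layer dictionaries, and all numbers are invented for the example.

```python
# Hypothetical sketch of energy-optimal mode selection under a latency
# constraint. Each layer has a (time, energy) pair for high-speed mode
# and for low-power mode; values are illustrative, not measurements.

def select_modes(layers, deadline):
    """Pick 'high' or 'low' per layer to minimize total energy while
    keeping total latency within the deadline (greedy heuristic)."""
    # Start with every layer in high-speed mode (lowest latency).
    modes = ["high" for _ in layers]
    total_t = sum(l["t_high"] for l in layers)
    total_e = sum(l["e_high"] for l in layers)

    # Consider switching layers to low-power mode, best energy saving
    # per unit of added latency first.
    candidates = sorted(
        range(len(layers)),
        key=lambda i: (layers[i]["e_high"] - layers[i]["e_low"])
        / (layers[i]["t_low"] - layers[i]["t_high"]),
        reverse=True,
    )
    for i in candidates:
        dt = layers[i]["t_low"] - layers[i]["t_high"]  # added latency
        de = layers[i]["e_high"] - layers[i]["e_low"]  # energy saved
        if de > 0 and total_t + dt <= deadline:
            modes[i] = "low"
            total_t += dt
            total_e -= de
    return modes, total_t, total_e


# Example with two layers and a 4.5-unit deadline: only the first layer
# fits into low-power mode before the deadline would be exceeded.
layers = [
    {"t_high": 1.0, "t_low": 2.0, "e_high": 10.0, "e_low": 6.0},
    {"t_high": 2.0, "t_low": 3.5, "e_high": 18.0, "e_low": 12.0},
]
print(select_modes(layers, 4.5))  # (['low', 'high'], 4.0, 24.0)
```

A real implementation would plug in the fitted analytical performance models to obtain the per-layer time and energy values; the greedy choice here is only a stand-in for the paper's optimization concepts.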

Details

Original language: English
Title: Proceedings - 2023 IEEE/ACM International Workshop on Compilers, Deployment, and Tooling for Edge AI, CODAI 2023
Pages: 27-31
Number of pages: 5
ISBN (electronic): 9798400703379
Publication status: Published - 21 Sept 2023
Peer-review status: Yes

External IDs

ORCID /0000-0002-6286-5064/work/166324419
Scopus 85196429407

Keywords

  • deep neural networks, edge computing, many-core hardware, parallelization, performance model, power management