Prototyping of Low-Cost Configurable Sparse Neural Processing Unit with Buffer and Mixed-Precision Reshapeable MAC Array

Binyi Wu; Wolfgang Furtner; Bernd Waschneck; Christian G. Mayr

doi:10.1109/ICPADS56603.2022.00098

Research output: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Binyi Wu, Chair of Highly-Parallel VLSI Systems and Neuro-Microelectronics (Author)
Wolfgang Furtner, Infineon Technologies AG (Author)
Bernd Waschneck, Infineon Technologies Dresden GmbH & Co. KG (Author)
Christian G. Mayr, Cluster of Excellence CeTI: Centre for Tactile Internet (Author)

Abstract

Continuous improvements in neural network optimization techniques such as quantization and neural architecture search have recently made it possible to run deep learning algorithms on edge devices such as microcontrollers. Nonetheless, most of the embedded hardware available today still falls short of the requirements of running deep neural networks. As a result, specialized processors have emerged to improve the inference efficiency of deep learning algorithms. However, most are not designed for edge applications, which require efficient and low-cost hardware. Therefore, we design and prototype a low-cost configurable sparse Neural Processing Unit (NPU). The NPU has a built-in buffer and a reshapeable mixed-precision multiply-accumulator (MAC) array. The computing and memory resources of the NPU are parameterized, so different NPUs can be derived. In addition, users can configure the NPU at runtime to fully utilize its resources. In our experiments, the 200 MHz NPU with only 32 MACs is more than 32 times faster than the 400 MHz STM32H7 when inferring MobileNet-V1. Moreover, the derived NPUs can achieve roofline or even beyond-roofline performance. The buffer and reshapeable MAC array push the NPU's attainable performance to the roofline, while support for sparsity allows the NPU to obtain performance beyond the roofline.
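The roofline argument in the abstract can be illustrated with a minimal Python sketch. Only the 32 MACs and the 200 MHz clock come from the abstract; the memory bandwidth, the operational intensities, the sparsity ratio, the assumption of one MAC operation per unit per cycle, and all function names are hypothetical values chosen for illustration. The sparsity model (skipped zero operations still count as dense-equivalent work) is one common way to read "beyond roofline" and is not confirmed as the authors' definition.

```python
def peak_macs_per_second(num_macs: int, clock_hz: float) -> float:
    """Peak throughput, assuming each MAC unit completes one MAC per cycle."""
    return num_macs * clock_hz


def roofline(peak: float, bandwidth_bytes_per_s: float,
             intensity_macs_per_byte: float) -> float:
    """Attainable throughput: the lower of the compute and memory bounds."""
    return min(peak, bandwidth_bytes_per_s * intensity_macs_per_byte)


def dense_equivalent_with_sparsity(attainable: float, sparsity: float) -> float:
    """Idealized dense-equivalent throughput when zero operands are skipped.

    Assumes every skipped operation costs no cycles, which is an upper bound.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return attainable / (1.0 - sparsity)


# 32 MACs at 200 MHz are taken from the abstract.
peak = peak_macs_per_second(32, 200e6)

# Hypothetical memory bandwidth of 1 GB/s.
memory_bound = roofline(peak, 1e9, 2.0)     # low intensity: memory-bound
compute_bound = roofline(peak, 1e9, 100.0)  # high intensity: compute-bound

# Hypothetical 50 % sparsity lifts dense-equivalent throughput above the peak.
beyond = dense_equivalent_with_sparsity(compute_bound, 0.5)

print(f"peak          = {peak:.3e} MAC/s")
print(f"memory-bound  = {memory_bound:.3e} MAC/s")
print(f"compute-bound = {compute_bound:.3e} MAC/s")
print(f"with sparsity = {beyond:.3e} dense-equivalent MAC/s")
```

In this reading, buffering raises the operational intensity (moving a workload to the right along the roofline), while sparsity support raises the dense-equivalent throughput above the compute ceiling.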

Details

Original language	English
Title	Proceedings - 2022 IEEE 28th International Conference on Parallel and Distributed Systems, ICPADS 2022
Place of publication	Nanjing
Pages	712-719
Number of pages	8
ISBN (electronic)	978-1-6654-7315-6
Publication status	Published - 2023
Peer review status	Yes

Publication series

Series	International Conference on Parallel and Distributed Systems (ICPADS)
ISSN	1521-9097

External IDs

Scopus	85152928335

Keywords

ASJC Scopus subject areas

Hardware and Architecture

Keywords

configurable, low-cost, mixed-precision, neural processing unit, sparsity

Library keywords

004 Computer science

TU Dresden Research Portal