NetPU-M: A Generic Reconfigurable Neural Network Accelerator Architecture for MLPs

Yuhao Liu; Shubham Rai; Salim Ullah; Akash Kumar

doi:10.1109/IPDPSW59300.2023.00026

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Yuhao Liu - Center for Advancing Electronics Dresden (cfaed), Chair of Processor Design (cfaed), Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Shubham Rai - Center for Advancing Electronics Dresden (cfaed), Chair of Processor Design (cfaed), Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Salim Ullah - Chair of Processor Design (cfaed), Center for Advancing Electronics Dresden (cfaed), Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Akash Kumar - Chair of Processor Design (cfaed), Center for Advancing Electronics Dresden (cfaed), Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)

Abstract

Recent research has widely deployed Neural Networks (NNs) in various scenarios, such as IoT systems, wearable devices, and smart sensors. However, complex application scenarios drive rapid growth in network model size and a need for higher-performance hardware platforms. Related works apply Heterogeneous Streaming Dataflow (HSD) and Processing Element Matrix (PEM) architectures as the most popular schemes for FPGA-based implementation of NN accelerators: 1) the HSD architecture implements a complete network for a given trained model on the FPGA, with simplified control but higher hardware consumption; 2) the PEM architecture implements reusable neuron structures controlled by runtime environments/drivers, providing generic acceleration support for different network models. Our work explores a new hybrid architecture based on HSD and PEM that implements a reusable partial network structure on the FPGA and achieves generic acceleration support for different network models with simplified runtime control. This architecture supports scalable, mixable, quantized precision and selectable activation functions, including ReLU, Sigmoid, Tanh, Sign, and Multi-Thresholds. Data stream transmission can reset the accelerator configuration at runtime, without hardware implementation changes, for different networks. Our design fully supports generic inference acceleration for different Multi-Layer Perceptron (MLP) models.
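
The idea of one reusable layer structure that is reconfigured per layer at runtime can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the authors' hardware design: the names `run_layer`, `run_mlp`, and the configuration keys are hypothetical, arithmetic is done on plain Python integers rather than at a specific quantized bit width, and the Multi-Thresholds activation is assumed to count how many thresholds the accumulated value reaches.

```python
import math
from typing import Dict, List, Sequence

# Assumed software stand-ins for the activation functions named in the abstract.
ACTIVATIONS = {
    "relu": lambda x: max(0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "sign": lambda x: 1 if x >= 0 else -1,
}


def multi_threshold(x, thresholds: Sequence) -> int:
    # Assumption: the output is the number of thresholds that x meets or exceeds.
    return sum(1 for t in thresholds if x >= t)


def run_layer(inputs, weights, biases, activation: str, thresholds=()) -> List:
    """Compute one fully connected layer with a selectable activation."""
    outputs = []
    for row, bias in zip(weights, biases):
        # Multiply-accumulate for one neuron.
        acc = sum(w * x for w, x in zip(row, inputs)) + bias
        if activation == "multi_threshold":
            outputs.append(multi_threshold(acc, thresholds))
        else:
            outputs.append(ACTIVATIONS[activation](acc))
    return outputs


def run_mlp(inputs, layer_configs: Sequence[Dict]) -> List:
    """Reuse the same layer routine for every layer.

    Each entry of layer_configs plays the role of a configuration delivered
    at runtime (hypothetical format), so a different network needs only a
    different list of configurations, not a different routine.
    """
    data = list(inputs)
    for cfg in layer_configs:
        data = run_layer(
            data,
            cfg["weights"],
            cfg["biases"],
            cfg["activation"],
            cfg.get("thresholds", ()),
        )
    return data


if __name__ == "__main__":
    configs = [
        {"weights": [[1, -1], [2, 1]], "biases": [0, -1], "activation": "relu"},
        {
            "weights": [[1, 1]],
            "biases": [0],
            "activation": "multi_threshold",
            "thresholds": [1, 3, 5],
        },
    ]
    print(run_mlp([2, 1], configs))
```

In this model, switching to another MLP means passing a different `layer_configs` list while `run_layer` stays unchanged, which loosely mirrors the abstract's point that the accelerator is reconfigured by streamed data rather than by a new hardware implementation.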

Details

Original language	English
Title of host publication	2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	85-92
Number of pages	8
ISBN (electronic)	979-8-3503-1199-0
Publication status	Published - 2023
Peer-reviewed	Yes

Conference

Title	37th IEEE International Parallel and Distributed Processing Symposium
Abbreviated title	IPDPS 2023
Conference number	37
Duration	15 - 19 May 2023
Website	https://www.ipdps.org/ipdps2023/2023-.html
Location	Hilton St. Petersburg Bayfront
City	St. Petersburg
Country	United States of America

Keywords

FPGA, Generic Hardware Accelerator, Neural Network, Quantization

Research Portal of the TU Dresden
