Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-Precision Quantized Multiplication on Hardware Accelerators

Publication: Contribution to book/conference proceedings/anthology/report > Contribution to conference proceedings > Contributed > Peer-reviewed

Abstract

Neural network accelerators are widely deployed on edge devices for complex tasks such as object tracking and image recognition. Previous works have explored quantization techniques in lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Mixed-precision quantization is therefore an alternative: it applies different precisions to different layers to trade off resource consumption against accuracy. Because regular hardware multiplier designs cannot support precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model at runtime, we propose a runtime-reconfigurable, multi-precision, multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our work achieves a 1.3185× to 3.5671× speedup in inference on mixed-precision models and has a shorter critical path delay, supporting a higher clock frequency (250 MHz).
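The abstract does not spell out the arithmetic, so the following is only a minimal software sketch of the general idea that bitwise multiplier designs build on: a multi-bit product can be written as a sum of 1-bit AND partial products, each shifted by the combined bit position, which means the operand widths can be chosen per call rather than fixed in the datapath. The function names `bitwise_multiply` and `dot_product`, the restriction to unsigned operands, and the loop structure are assumptions made for illustration; they are not taken from the paper and do not model its systolic array, channels, or timing.

```python
def bitwise_multiply(a: int, w: int, a_bits: int, w_bits: int) -> int:
    """Multiply unsigned a (a_bits wide) by unsigned w (w_bits wide)
    using only 1-bit AND, shift, and add.

    Hypothetical illustration: the bit widths are plain arguments, so the
    same routine serves any precision pair without modification.
    """
    if not 0 <= a < (1 << a_bits):
        raise ValueError("a does not fit in a_bits")
    if not 0 <= w < (1 << w_bits):
        raise ValueError("w does not fit in w_bits")

    acc = 0
    for i in range(a_bits):
        a_i = (a >> i) & 1              # i-th bit of the activation
        for j in range(w_bits):
            w_j = (w >> j) & 1          # j-th bit of the weight
            # 1-bit partial product, weighted by 2**(i + j)
            acc += (a_i & w_j) << (i + j)
    return acc


def dot_product(acts, weights, a_bits: int, w_bits: int) -> int:
    """Accumulate bitwise products, as a multiply-accumulate chain would."""
    return sum(
        bitwise_multiply(a, w, a_bits, w_bits)
        for a, w in zip(acts, weights)
    )


if __name__ == "__main__":
    # Same routine, two different precision settings.
    print(bitwise_multiply(5, 3, a_bits=4, w_bits=2))   # 4-bit x 2-bit
    print(dot_product([1, 2, 3], [3, 2, 1], a_bits=2, w_bits=2))
```

In this sketch, changing precision between layers only changes the loop bounds, which loosely mirrors why a bit-level decomposition lends itself to runtime reconfiguration; how the actual design maps these partial products onto hardware is described in the paper itself.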

Details

Original language: English
Title: Proceedings of the 26th International Symposium on Quality Electronic Design, ISQED 2025
Publisher: IEEE Computer Society
Number of pages: 9
ISBN (electronic): 979-8-3315-0942-2
ISBN (print): 979-8-3315-0943-9
Publication status: Published - 2025
Peer-review status: Yes

Conference

Title: 26th International Symposium on Quality Electronic Design
Short title: ISQED 2025
Event number: 26
Duration: 23 - 25 April 2025
Location: Seven Hills Conference Center & Online
City: San Francisco
Country: USA/United States