High-Flexibility Designs of Quantized Runtime Reconfigurable Multi-Precision Multipliers

Publication: Contribution to journal, research article, contributed, peer-reviewed

Abstract

Recent research has widely explored quantization schemes in hardware. However, accelerators that support only 8-bit quantization, such as the Google TPU, require lower-precision inputs (for example, the 1- or 2-bit quantized neural network models in FINN) to be extended to the full data width to meet the hardware interface requirements. This conversion reduces communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, this letter explores two novel runtime reconfigurable multi-precision multipliers, based on the multiplier-tree and bit-serial multiplier architectures. Both designs can repartition the number of input channels at runtime according to the input precision and can be reconfigured between signed and unsigned multiplication modes. We evaluated our designs by implementing a systolic array and a single-layer neural network accelerator on the Ultra96 FPGA platform. The results show the flexibility of our implementation and a high speedup for low-precision quantized multiplication with a fixed hardware interface data width.
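As a rough illustration of repartitioning a fixed-width interface by input precision, the following Python sketch models the idea behaviorally. It assumes an 8-bit interface word, power-of-two precisions (1, 2, 4 or 8 bits), lanes packed starting from the least significant bit, two's-complement operands in signed mode, and a plain shift-and-add bit-serial product in each lane. The names `pack`, `unpack`, `bit_serial_mul` and `multiply_words` are hypothetical; this is not the authors' multiplier-tree or bit-serial hardware, only a sketch of the arithmetic such a unit has to reproduce.

```python
from typing import List

WORD_BITS = 8  # assumed fixed interface width, e.g. an 8-bit-only accelerator port


def pack(values: List[int], precision: int) -> int:
    """Pack WORD_BITS // precision operands into one word, lane 0 in the LSBs."""
    mask = (1 << precision) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * precision)  # keep only the low `precision` bits
    return word


def unpack(word: int, precision: int, signed: bool) -> List[int]:
    """Split one word into WORD_BITS // precision lanes (the runtime channel count)."""
    if precision not in (1, 2, 4, 8):
        raise ValueError("precision must be 1, 2, 4 or 8 bits")
    mask = (1 << precision) - 1
    lanes = []
    for i in range(WORD_BITS // precision):
        v = (word >> (i * precision)) & mask
        if signed and v >> (precision - 1):  # sign bit set: two's-complement value
            v -= 1 << precision
        lanes.append(v)
    return lanes


def bit_serial_mul(a: int, b: int, precision: int, signed: bool) -> int:
    """Shift-and-add over the bits of b, one multiplier bit per modeled cycle."""
    acc = 0
    for i in range(precision):
        if (b >> i) & 1:
            partial = a << i
            if signed and i == precision - 1:
                acc -= partial  # two's-complement MSB carries weight -2**(precision-1)
            else:
                acc += partial
    return acc


def multiply_words(word_a: int, word_b: int, precision: int, signed: bool) -> List[int]:
    """Lane-wise products of two packed words; each product needs 2 * precision bits."""
    a_lanes = unpack(word_a, precision, signed)
    b_lanes = unpack(word_b, precision, signed)
    return [bit_serial_mul(a, b, precision, signed) for a, b in zip(a_lanes, b_lanes)]


if __name__ == "__main__":
    # 2-bit signed mode: one 8-bit word carries four operands per input
    print(multiply_words(pack([1, -2, 1, 0], 2), pack([-1, -2, 1, 1], 2), 2, True))
    # 4-bit unsigned mode: two operands per word
    print(multiply_words(pack([15, 3], 4), pack([2, 5], 4), 4, False))
```

Running the script prints `[-1, 4, 1, 0]` and `[30, 15]`. At 2-bit precision a single 8-bit transfer feeds four multiplications instead of one, which is the kind of gain over width extension that the abstract describes; a hardware datapath would additionally need room for the 2p-bit lane products.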

Details

Original language: English
Pages (from-to): 194-197
Number of pages: 4
Journal: IEEE Embedded Systems Letters
Volume: 15
Issue number: 4
Publication status: Published - 1 Dec 2023
Peer-reviewed: Yes

Keywords

  • multi-precision, multiplier, quantization, runtime reconfiguration