High-Flexibility Designs of Quantized Runtime Reconfigurable Multi-Precision Multipliers

Research output: Contribution to journal › Research article › peer-review

Abstract

Recent research has widely explored quantization schemes in hardware. However, on accelerators that support only 8-bit quantization, such as the Google TPU, lower-precision inputs, such as the 1- and 2-bit quantized neural network models in FINN, must be extended to the full data width to match the hardware interface. This conversion degrades communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, this letter explores two novel runtime reconfigurable multi-precision multiplier designs, based on multiplier-tree and bit-serial architectures, that can repartition the number of input channels at runtime according to the input precision and switch between signed and unsigned multiplication modes. We evaluated our designs by implementing a systolic array and a single-layer neural network accelerator on the Ultra96 FPGA platform. The results demonstrate the flexibility of our implementation and a high speedup for low-precision quantized multiplication under a fixed hardware interface data width.
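The core idea, repartitioning a fixed-width input word into a variable number of lower-precision channels, can be illustrated with a minimal software model. The sketch below is a hypothetical illustration of the concept only (the function names and the choice of an 8-bit word are assumptions), not the authors' hardware design: one 8-bit word is interpreted as one 8-bit, two 4-bit, or four 2-bit channels depending on a runtime precision setting, with an optional signed mode.

```python
# Hypothetical software model of runtime channel repartitioning:
# a fixed 8-bit input word carries 8 // precision independent channels.
# This is an illustrative sketch, not the letter's RTL implementation.

def split_channels(word, precision, signed=False):
    """Split an 8-bit word into 8 // precision channels of `precision` bits (LSB first)."""
    assert precision in (2, 4, 8)
    mask = (1 << precision) - 1
    channels = []
    for i in range(8 // precision):
        v = (word >> (i * precision)) & mask
        if signed and v >= (1 << (precision - 1)):
            v -= 1 << precision  # two's-complement sign extension
        channels.append(v)
    return channels

def multi_precision_multiply(a_word, b_word, precision, signed=False):
    """Multiply corresponding channels of two packed 8-bit words."""
    a = split_channels(a_word, precision, signed)
    b = split_channels(b_word, precision, signed)
    return [x * y for x, y in zip(a, b)]

# The same 8-bit interface yields one 8-bit product or four 2-bit products:
print(multi_precision_multiply(0x06, 0x07, precision=8))          # [42]
print(multi_precision_multiply(0b01100111, 0b01011001, precision=2))  # [3, 2, 2, 1]
```

In hardware, the repartitioning is done by gating carry propagation between sub-multipliers rather than by software masking, but the input/output packing shown here matches the fixed-interface scenario the abstract describes.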

Details

Original language: English
Pages (from-to): 194-197
Number of pages: 4
Journal: IEEE Embedded Systems Letters
Volume: 15
Issue number: 4
Publication status: Published - 1 Dec 2023
Peer-reviewed: Yes

Keywords

  • multi-precision, multiplier, quantization, runtime reconfiguration