Multi-Precision Deep Neural Network Acceleration on FPGAs

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Abstract

Quantization is a promising approach to reducing the computational load of neural networks. The minimum bit-width that preserves the original accuracy varies significantly across neural networks and even across the layers of a single network. Most existing designs over-provision neural network accelerators, giving them enough bit-width to preserve the required accuracy across a wide range of networks. In this paper, we present mpDNN, a multi-precision multiplier with dynamically adjustable bit-width for deep neural network acceleration. The design supports splitting an arithmetic operator at run time into multiple independent operators of smaller bit-width, which increases throughput when lower precision suffices. The architecture targets FPGAs: both the multipliers and the bit-width adjustment mechanism are optimized for the LUT-based structure of FPGAs. Experimental results show that, by enabling run-time precision adjustment, mpDNN can improve throughput by 3-15x.
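The central idea of the abstract, one arithmetic operator that can be split at run time into several narrower, independent operators, can be illustrated with a small bit-level model. This is a sketch under stated assumptions, not the mpDNN design: it assumes unsigned operands and uses a hypothetical 4x4-bit multiplier tile (`tile_mul4`) and a hypothetical `split_multiply` helper that either combines four tiles into one 8x8-bit multiplier or uses them as four independent 4x4-bit multipliers.

```python
from typing import List


def tile_mul4(a: int, b: int) -> int:
    """Hypothetical 4x4-bit unsigned multiplier tile (stand-in for a small LUT-based block)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("tile operands must be 4-bit unsigned")
    return a * b


def split_multiply(a_ops: List[int], b_ops: List[int], mode: int) -> List[int]:
    """Use the same four tiles as one 8x8 multiplier (mode=8) or four 4x4 ones (mode=4).

    Hypothetical helper for illustration; operands are assumed unsigned.
    """
    if mode == 8:
        (a,), (b,) = a_ops, b_ops  # exactly one operand pair in 8-bit mode
        if not (0 <= a < 256 and 0 <= b < 256):
            raise ValueError("8-bit mode expects one 8-bit unsigned operand pair")
        a_hi, a_lo = a >> 4, a & 0xF
        b_hi, b_lo = b >> 4, b & 0xF
        # Four partial products from the four tiles ...
        p_ll = tile_mul4(a_lo, b_lo)
        p_lh = tile_mul4(a_lo, b_hi)
        p_hl = tile_mul4(a_hi, b_lo)
        p_hh = tile_mul4(a_hi, b_hi)
        # ... shifted and summed into a single 16-bit product.
        return [p_ll + ((p_lh + p_hl) << 4) + (p_hh << 8)]
    if mode == 4:
        if len(a_ops) != 4 or len(b_ops) != 4:
            raise ValueError("4-bit mode expects four operand pairs")
        # Same four tiles, but each product is routed out independently.
        return [tile_mul4(a, b) for a, b in zip(a_ops, b_ops)]
    raise ValueError("mode must be 8 or 4")


if __name__ == "__main__":
    print(split_multiply([200], [123], mode=8))                    # [24600]
    print(split_multiply([3, 15, 7, 0], [5, 15, 9, 12], mode=4))   # [15, 225, 63, 0]
```

In 4-bit mode the same four tiles produce four products instead of one, which is the kind of throughput gain run-time splitting aims for when lower precision is acceptable. How mpDNN maps this onto FPGA LUTs, and how it handles signed values, is not modeled here.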

Details

Original language: English
Title of host publication: 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)
Publisher: IEEE, New York [et al.]
Pages: 454-459
Number of pages: 6
ISBN (electronic): 9781665421355
Publication status: Published - 2022
Peer-reviewed: Yes

Publication series

Series: Asia and South Pacific Design Automation Conference (ASP-DAC)
Volume: 2022-January

Conference

Title: 27th Asia and South Pacific Design Automation Conference, ASP-DAC 2022
Duration: 17-20 January 2022
City: Virtual, Online
Country: Taiwan, Province of China