Convolutional Neural Networks Quantization with Double-Stage Squeeze-and-Threshold

Research output: Contribution to journal › Research article › Contributed › Peer-reviewed

Abstract

It has been proven that, compared to using 32-bit floating-point numbers in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate with low precision during inference, thereby reducing memory footprint and power consumption. However, neural network quantization is always accompanied by accuracy degradation. Here, we propose a quantization method called double-stage Squeeze-and-Threshold (double-stage ST) to close the accuracy gap with full-precision models. While accurate colors in pictures can be pleasing to the viewer, they are not necessary for distinguishing objects; the era of black-and-white television illustrates this idea. As long as the limited set of colors is assigned reasonably to different objects, the objects can still be identified and distinguished. Our method uses the attention mechanism to adjust the activations and learn the thresholds that distinguish objects (features). We then divide the numerically rich activations into intervals (a limited set of numerical values) using the learned thresholds. The proposed method supports both binarization and multi-bit quantization and achieves state-of-the-art results. In binarization, ReActNet [Z. Liu, Z. Shen, S. Li, K. Helwegen, D. Huang and K. Cheng, arXiv:abs/2106.11309] trained with our method outperforms the previous state-of-the-art result by 0.2 percentage points. In multi-bit quantization, the top-1 accuracy of the 3-bit ResNet-18 [K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2016, 27-30 June 2016, Las Vegas, NV, USA (IEEE Computer Society, 2016), pp. 770-778] model exceeds that of its full-precision baseline by 0.4 percentage points. The double-stage ST activation quantization is easy to apply: it is simply inserted before the convolution. Moreover, the double-stage ST is detachable after training and introduces no computational cost during inference.
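
To make the mechanism concrete, below is a minimal PyTorch sketch of the threshold-learning idea described in the abstract: a squeeze (global pooling) step feeds a small attention network that predicts per-channel thresholds, and the activations are then binarized against those thresholds. The class name SqueezeAndThreshold, the layer sizes, the sigmoid-scaled threshold, and the straight-through gradient estimator are illustrative assumptions only; this is a single-stage sketch, not the paper's actual double-stage implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeAndThreshold(nn.Module):
    """Illustrative single-stage sketch (hypothetical): learn a per-channel
    threshold from a squeezed (globally pooled) view of the activation and
    binarize the activation against it."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-Excitation-style bottleneck producing one value per
        # channel; the sigmoid output range [0, 1] is an assumption.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        # Squeeze: global average pooling summarizes each channel.
        squeezed = F.adaptive_avg_pool2d(x, 1).view(n, c)
        # Threshold: one learned value per channel, broadcast over H x W.
        thresholds = self.fc(squeezed).view(n, c, 1, 1)
        # Binarize with a straight-through estimator so gradients can flow
        # through the comparison during training.
        hard = (x > thresholds).float()
        soft = torch.sigmoid(x - thresholds)
        return soft + (hard - soft).detach()


# Usage: insert the module before a convolution, as the abstract suggests.
quantize = SqueezeAndThreshold(channels=64)
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
y = conv(quantize(torch.randn(2, 64, 32, 32)))
```

A module of this kind acts purely on the activations ahead of the convolution, which is consistent with the abstract's note that the method is applied by inserting it before the convolution and can be detached after training.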

Details

Original language: English
Article number: 2250051
Journal: International Journal of Neural Systems
Volume: 32
Issue number: 12
Publication status: Published - 1 Dec 2022
Peer-reviewed: Yes

External IDs

PubMed: 36164719

Keywords

  • attention, convolutional neural networks, quantization