Convolutional Neural Networks Quantization with Double-Stage Squeeze-and-Threshold

Research output: Contribution to journal › Research article › Contributed › Peer-reviewed

Abstract

It has been proven that, compared to using 32-bit floating-point numbers in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate with low precision during inference, thereby reducing memory footprint and power consumption. However, neural network quantization is always accompanied by accuracy degradation. Here, we propose a quantization method called double-stage Squeeze-and-Threshold (double-stage ST) to close the accuracy gap with full-precision models. While accurate colors in pictures can be pleasing to the viewer, they are not necessary for distinguishing objects; the era of black-and-white television illustrates this idea. As long as the limited set of colors is assigned reasonably to different objects, the objects can still be identified and distinguished. Our method uses the attention mechanism to adjust the activations and learn the thresholds that distinguish objects (features). We then divide the numerically rich activations into intervals (a limited set of numerical values) using the learned thresholds. The proposed method supports both binarization and multi-bit quantization and achieves state-of-the-art results. In binarization, ReActNet [Z. Liu, Z. Shen, S. Li, K. Helwegen, D. Huang and K. Cheng, arXiv:abs/2106.11309] trained with our method outperforms the previous state-of-the-art result by 0.2 percentage points. In multi-bit quantization, the top-1 accuracy of the 3-bit ResNet-18 [K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2016, 27-30 June 2016, Las Vegas, NV, USA (IEEE Computer Society, 2016), pp. 770-778] model exceeds that of its full-precision baseline by 0.4 percentage points. The double-stage ST activation quantization is easy to apply: it is simply inserted before the convolution. Moreover, the double-stage ST is detachable after training and introduces no computational cost during inference.
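
To make the mechanism concrete, below is a minimal PyTorch sketch of the threshold-learning idea described in the abstract: a squeeze (global pooling) step feeds a small attention network that predicts per-channel thresholds, and the activations are then binarized against those thresholds. The class name SqueezeAndThreshold, the layer sizes, the sigmoid-scaled threshold, and the straight-through gradient estimator are illustrative assumptions only; this is a single-stage sketch, not the paper's actual double-stage implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeAndThreshold(nn.Module):
    """Illustrative single-stage sketch (hypothetical): learn a per-channel
    threshold from a squeezed (globally pooled) view of the activation and
    binarize the activation against it."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-Excitation-style bottleneck producing one value per
        # channel; the sigmoid output range [0, 1] is an assumption.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        # Squeeze: global average pooling summarizes each channel.
        squeezed = F.adaptive_avg_pool2d(x, 1).view(n, c)
        # Threshold: one learned value per channel, broadcast over H x W.
        thresholds = self.fc(squeezed).view(n, c, 1, 1)
        # Binarize with a straight-through estimator so gradients can flow
        # through the comparison during training.
        hard = (x > thresholds).float()
        soft = torch.sigmoid(x - thresholds)
        return soft + (hard - soft).detach()


# Usage: insert the module before a convolution, as the abstract suggests.
quantize = SqueezeAndThreshold(channels=64)
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
y = conv(quantize(torch.randn(2, 64, 32, 32)))
```

A module of this kind acts purely on the activations ahead of the convolution, which is consistent with the abstract's note that the method is applied by inserting it before the convolution and can be detached after training.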

Details

Original language: English
Article number: 2250051
Journal: International Journal of Neural Systems
Volume: 32
Issue number: 12
Publication status: Published - 1 Dec 2022
Peer-reviewed: Yes

External IDs

PubMed: 36164719

Keywords

  • attention, convolutional neural networks, quantization