Convolutional Neural Networks Quantization with Double-Stage Squeeze-and-Threshold
Publication: Contribution to journal › Research article › Contributed › Peer-reviewed
Abstract
It has been proven that, compared to the 32-bit floating-point numbers used during training, Deep Convolutional Neural Networks (DCNNs) can operate at low precision during inference, saving memory footprint and power consumption. However, neural network quantization is always accompanied by accuracy degradation. Here, we propose a quantization method called double-stage Squeeze-and-Threshold (double-stage ST) to close the accuracy gap with full-precision models. While accurate colors in pictures can be pleasing to the viewer, they are not necessary for distinguishing objects: the era of black-and-white television proves this idea. As long as the limited colors are assigned reasonably to different objects, the objects can still be identified and distinguished. Our method uses the attention mechanism to adjust the activations and to learn the thresholds that distinguish objects (features). The numerically rich activations are then divided into intervals (a limited variety of numerical values) by the learned thresholds. The proposed method supports both binarization and multi-bit quantization and achieves state-of-the-art results. In binarization, ReActNet [Z. Liu, Z. Shen, S. Li, K. Helwegen, D. Huang and K. Cheng, arXiv:abs/2106.11309] trained with our method outperforms the previous state-of-the-art result by 0.2 percentage points. In multi-bit quantization, the top-1 accuracy of the 3-bit ResNet-18 [K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2016, 27-30 June 2016, Las Vegas, NV, USA (IEEE Computer Society, 2016), pp. 770-778] model exceeds that of its full-precision baseline by 0.4 percentage points. The double-stage ST activation quantization method is easy to apply: it is simply inserted before the convolution. Moreover, the double-stage ST is detachable after training and introduces no computational cost during inference.
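The abstract describes the mechanism only at a high level. The following is a minimal, hypothetical PyTorch sketch of what an attention-driven threshold quantizer along these lines might look like; it is not the authors' implementation. The module name `SqueezeAndThreshold`, the SE-style `reduction` factor, the `Softplus` activation, and the straight-through rounding step are assumptions, chosen only to illustrate the described idea of a squeeze stage that pools activations, a threshold-learning stage that rescales them, and a binning step that maps them to a limited set of levels.

```python
import torch
import torch.nn as nn


class SqueezeAndThreshold(nn.Module):
    """Hypothetical sketch of an ST-style activation quantizer.

    A Squeeze-and-Excitation-like branch (global average pooling plus a
    small MLP) predicts a positive per-channel threshold that rescales
    the activations, which are then rounded into 2**bits discrete
    levels. A straight-through estimator passes gradients through the
    rounding step so the thresholds remain learnable.
    """

    def __init__(self, channels: int, bits: int = 3, reduction: int = 4):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # "squeeze" stage
        self.excite = nn.Sequential(                     # threshold-learning stage
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Softplus(),                               # keep thresholds positive
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        t = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        x = torch.clamp(x / t, 0.0, 1.0)                 # rescale by learned threshold
        q = torch.round(x * self.levels) / self.levels   # divide into discrete intervals
        return x + (q - x).detach()                      # straight-through estimator


# Illustrative usage: quantize the input of a convolution to 3 bits.
quant = SqueezeAndThreshold(channels=64, bits=3)
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
y = conv(quant(torch.randn(8, 64, 56, 56)))
```

In this sketch the quantizer sits directly before the convolution, matching the abstract's statement that the method is applied by inserting it before the convolution; once training is finished, the learned thresholds could in principle be frozen so that the attention branch is no longer evaluated at inference time.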
Details
Original language | English |
---|---|
Article number | 2250051 |
Journal | International Journal of Neural Systems |
Volume | 32 |
Issue number | 12 |
Publication status | Published - 1 Dec. 2022 |
Peer-review status | Yes |
External IDs

PubMed | 36164719 |
Keywords
- attention, convolutional neural networks, quantization