Convolutional Neural Networks Quantization with Double-Stage Squeeze-and-Threshold
Publication: Contribution to journal › Research article › Contributed › Peer-reviewed
Abstract
It has been proven that, compared to the 32-bit floating-point numbers used during training, Deep Convolutional Neural Networks (DCNNs) can operate at low precision during inference, saving memory footprint and power consumption. However, neural network quantization is always accompanied by accuracy degradation. Here, we propose a quantization method called double-stage Squeeze-and-Threshold (double-stage ST) to close the accuracy gap with full-precision models. While accurate colors in pictures can be pleasing to the viewer, they are not necessary for distinguishing objects: the era of black-and-white television proves this idea. As long as the limited colors are assigned reasonably to different objects, the objects can still be identified and distinguished. Our method uses the attention mechanism to adjust the activations and to learn the thresholds that distinguish objects (features). The numerically rich activations are then divided into intervals (a limited variety of numerical values) by the learned thresholds. The proposed method supports both binarization and multi-bit quantization and achieves state-of-the-art results. In binarization, ReActNet [Z. Liu, Z. Shen, S. Li, K. Helwegen, D. Huang and K. Cheng, arXiv:abs/2106.11309] trained with our method outperforms the previous state-of-the-art result by 0.2 percentage points. In multi-bit quantization, the top-1 accuracy of the 3-bit ResNet-18 [K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2016, 27-30 June 2016, Las Vegas, NV, USA (IEEE Computer Society, 2016), pp. 770-778] model exceeds that of its full-precision baseline by 0.4 percentage points. The double-stage ST activation quantization method is easy to apply: it is simply inserted before the convolution. Moreover, the double-stage ST is detachable after training and introduces no computational cost during inference.
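The abstract describes the mechanism only at a high level. The following is a minimal, hypothetical PyTorch sketch of what an attention-driven threshold quantizer along these lines might look like; it is not the authors' implementation. The module name `SqueezeAndThreshold`, the SE-style `reduction` factor, the `Softplus` activation, and the straight-through rounding step are assumptions, chosen only to illustrate the described idea of a squeeze stage that pools activations, a threshold-learning stage that rescales them, and a binning step that maps them to a limited set of levels.

```python
import torch
import torch.nn as nn


class SqueezeAndThreshold(nn.Module):
    """Hypothetical sketch of an ST-style activation quantizer.

    A Squeeze-and-Excitation-like branch (global average pooling plus a
    small MLP) predicts a positive per-channel threshold that rescales
    the activations, which are then rounded into 2**bits discrete
    levels. A straight-through estimator passes gradients through the
    rounding step so the thresholds remain learnable.
    """

    def __init__(self, channels: int, bits: int = 3, reduction: int = 4):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # "squeeze" stage
        self.excite = nn.Sequential(                     # threshold-learning stage
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Softplus(),                               # keep thresholds positive
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        t = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        x = torch.clamp(x / t, 0.0, 1.0)                 # rescale by learned threshold
        q = torch.round(x * self.levels) / self.levels   # divide into discrete intervals
        return x + (q - x).detach()                      # straight-through estimator


# Illustrative usage: quantize the input of a convolution to 3 bits.
quant = SqueezeAndThreshold(channels=64, bits=3)
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
y = conv(quant(torch.randn(8, 64, 56, 56)))
```

In this sketch the quantizer sits directly before the convolution, matching the abstract's statement that the method is applied by inserting it before the convolution; once training is finished, the learned thresholds could in principle be frozen so that the attention branch is no longer evaluated at inference time.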
Details
Original language | English |
---|---|
Article number | 2250051 |
Journal | International Journal of Neural Systems |
Volume | 32 |
Issue number | 12 |
Publication status | Published - 1 Dec. 2022 |
Peer-review status | Yes |
External IDs

PubMed | 36164719 |
Keywords
- attention, convolutional neural networks, quantization