Lightweight Instruction Set for Flexible Dilated Convolutions and Mixed-Precision Operands

Simon Friedrich; Shambhavi Balamuthu Sampath; Robert Wittig; Manoj Rohit Vemparala; Nael Fasfous; Emil Matúš; Walter Stechele; Gerhard Fettweis

doi:10.1109/ISQED57927.2023.10129341

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Simon Friedrich, Vodafone Chair of Mobile Communications Systems (Author)
Shambhavi Balamuthu Sampath, BMW Group (Author)
Robert Wittig, Vodafone Chair of Mobile Communications Systems (Author)
Manoj Rohit Vemparala, BMW Group (Author)
Nael Fasfous, BMW Group (Author)
Emil Matúš, Vodafone Chair of Mobile Communications Systems (Author)
Walter Stechele, Technical University of Munich (Author)
Gerhard Fettweis, Vodafone Chair of Mobile Communications Systems (Author)

Abstract

Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers called transposed and dilated convolutions are employed, which add a large number of zeros between the elements of the input features or weights. Standard neural network hardware accelerators usually process these convolutions in a straightforward manner, without paying attention to the added zeros, which increases computation time. To cope with this problem, recent works propose either to skip the redundant elements with additional hardware or to solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that does not introduce any hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a speedup of 5 times on DeepLabV3+, outperforming recently proposed design methods. The support of precision-scalable execution of all workloads further increases the speedup in computation time shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, our accelerator reduces the instruction footprint of ResNet-50 by 60 percent.
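
The redundancy described in the abstract can be illustrated with a small one-dimensional sketch. This is not the instruction set or memory scheme from the paper; it is a hypothetical pure-Python example, with illustrative function names, that assumes a "valid" correlation and compares a dense pass over a zero-inserted kernel with a pass that addresses the input with a stride equal to the dilation rate.

```python
def dilate_kernel(weights, rate):
    """Insert (rate - 1) zeros between neighbouring weights."""
    dilated = []
    for i, w in enumerate(weights):
        if i > 0:
            dilated.extend([0] * (rate - 1))
        dilated.append(w)
    return dilated


def conv1d_dense(x, kernel):
    """'Valid' correlation that multiplies every kernel entry, zeros included."""
    k = len(kernel)
    out, macs = [], 0
    for start in range(len(x) - k + 1):
        acc = 0
        for j in range(k):
            acc += x[start + j] * kernel[j]
            macs += 1  # counted even when kernel[j] is an inserted zero
        out.append(acc)
    return out, macs


def conv1d_dilated_strided(x, weights, rate):
    """Same result, but the input is read with a stride of `rate`,
    so positions that would meet an inserted zero are never touched."""
    span = (len(weights) - 1) * rate + 1  # receptive field of the dilated kernel
    out, macs = [], 0
    for start in range(len(x) - span + 1):
        acc = 0
        for j, w in enumerate(weights):
            acc += x[start + j * rate] * w
            macs += 1
        out.append(acc)
    return out, macs


# Illustrative values only (not taken from the paper).
x = list(range(1, 17))
weights = [1, -2, 3]
rate = 4

dense_out, dense_macs = conv1d_dense(x, dilate_kernel(weights, rate))
fast_out, fast_macs = conv1d_dilated_strided(x, weights, rate)
print(dense_out == fast_out, dense_macs, fast_macs)
```

In this toy case both variants produce identical outputs, while the strided variant performs one third of the multiply-accumulate operations; the ratio grows with the dilation rate, which is the overhead the abstract refers to.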

Details

Original language	English
Title of host publication	2023 24th International Symposium on Quality Electronic Design (ISQED)
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Number of pages	8
ISBN (electronic)	979-8-3503-3475-3, 979-8-3503-3474-6
ISBN (print)	979-8-3503-3476-0
Publication status	Published - 7 Apr 2023
Peer-reviewed	Yes

Conference

Title	24th International Symposium on Quality Electronic Design
Abbreviated title	ISQED 2023
Conference number	24
Duration	5 - 7 April 2023
Website	https://www.isqed.org/English/Archives/2023/index.html
Location	Seven Hills Conference Center
City	San Francisco
Country	United States of America

External IDs

Scopus	85161616984

Keywords

Convolution, Deep learning, Design methodology, Instruction sets, Neural network hardware, Object detection, Semantic segmentation, accelerator, DNN, memory alignment, stride, address generation, transposed convolution, mixed-precision, instruction set, dilated convolution
