P-CORE: Exploring RISC-V Packed-SIMD Extension for CNNs
Publication: Contribution to journal › Research article › Contributed › Peer-reviewed
Contributors
Abstract
Embedded and IoT devices face escalating demands for performance and power efficiency in inference tasks that employ Convolutional Neural Networks (CNNs). Various methods have emerged to improve performance, including quantization and the use of fixed-point instead of floating-point formats. The RISC-V Instruction Set Architecture (ISA), being open-source and modular in its specification, is a suitable platform for targeting such embedded applications. The RISC-V standard P-extension represents a significant advancement in integer-based computation, offering new ways to achieve higher speedups through packed-SIMD integer instructions. This work presents the P-CORE, the first RISC-V processor based on the RV32IM base ISA with optional support for the P-extension, Version 0.9.11-draft-20211209. The P-CORE implements the P-extension ISA for the RV32 specification. The processor is evaluated on a Xilinx Zynq UltraScale+ FPGA using four test-case functions: matrix multiplication, convolution, max-pooling, and fully connected layers. LeNet-5, a CNN for recognizing handwritten digits, is also used as a test case. The evaluation further presents a Design Space Exploration (DSE) of the P-extension that varies the degree of data-level parallelism: SIMD8 (8-bit, 4-way) and SIMD16 (16-bit, 2-way). Our investigation demonstrates that with the P-extension activated, CNN computations can achieve speedups of up to 17x for max-pooling, 7x for matrix multiplication, and 4.8x for fully connected networks, accompanied by similar enhancements in power efficiency. The processor achieved a maximum performance of 335.1 MIOPS and an efficiency of 490.6 MIOPS/W for matrix multiplication using -O3 compiler optimization.
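The SIMD8 and SIMD16 modes named in the abstract can be illustrated with a minimal sketch in portable C. It emulates, in software, how one 32-bit register can hold four 8-bit lanes or two 16-bit lanes, and how a lane-wise maximum (as used in max-pooling) and a multiply-accumulate (as used in matrix multiplication and fully connected layers) operate on them. All helper names (`pack8`, `lane8`, `simd8_smax`, `simd8_dot`, `pack16`, `lane16`, `simd16_dot`) are hypothetical; they are not P-extension mnemonics, compiler intrinsics, or code from the P-CORE, and the lane order (lane 0 in the low bits) is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical emulation helpers; NOT P-extension intrinsics. */

/* Pack four signed 8-bit values into one 32-bit word (lane 0 = low byte). */
static uint32_t pack8(int8_t a0, int8_t a1, int8_t a2, int8_t a3) {
    return (uint32_t)(uint8_t)a0 | ((uint32_t)(uint8_t)a1 << 8) |
           ((uint32_t)(uint8_t)a2 << 16) | ((uint32_t)(uint8_t)a3 << 24);
}

/* Extract lane i (0..3) as a signed 8-bit value. */
static int8_t lane8(uint32_t w, int i) {
    int v = (int)((w >> (8 * i)) & 0xFFu);
    return (int8_t)(v >= 128 ? v - 256 : v);
}

/* SIMD8 (4-way): lane-wise signed maximum, the core step of max-pooling. */
static uint32_t simd8_smax(uint32_t a, uint32_t b) {
    int8_t r[4];
    for (int i = 0; i < 4; i++) {
        int8_t x = lane8(a, i), y = lane8(b, i);
        r[i] = x > y ? x : y;
    }
    return pack8(r[0], r[1], r[2], r[3]);
}

/* SIMD8 (4-way): four 8-bit products added to a 32-bit accumulator. */
static int32_t simd8_dot(uint32_t a, uint32_t b, int32_t acc) {
    for (int i = 0; i < 4; i++)
        acc += (int32_t)lane8(a, i) * (int32_t)lane8(b, i);
    return acc;
}

/* Pack two signed 16-bit values into one 32-bit word (lane 0 = low half). */
static uint32_t pack16(int16_t a0, int16_t a1) {
    return (uint32_t)(uint16_t)a0 | ((uint32_t)(uint16_t)a1 << 16);
}

/* Extract lane i (0..1) as a signed 16-bit value. */
static int16_t lane16(uint32_t w, int i) {
    int32_t v = (int32_t)((w >> (16 * i)) & 0xFFFFu);
    return (int16_t)(v >= 32768 ? v - 65536 : v);
}

/* SIMD16 (2-way): two 16-bit products added to a 32-bit accumulator. */
static int32_t simd16_dot(uint32_t a, uint32_t b, int32_t acc) {
    for (int i = 0; i < 2; i++)
        acc += (int32_t)lane16(a, i) * (int32_t)lane16(b, i);
    return acc;
}
```

For example, `simd8_dot(pack8(1, -2, 3, -4), pack8(5, 6, -7, 8), 0)` evaluates 1·5 + (-2)·6 + 3·(-7) + (-4)·8 = -60 in a single call, whereas `simd16_dot` covers only two elements per word in exchange for a wider value range. On hardware with packed-SIMD support, each such call would ideally map to a small number of instructions instead of a loop; the loops here exist only to make the lane semantics explicit.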
Details
| Original language | English |
|---|---|
| Pages (from - to) | 146603-146616 |
| Number of pages | 14 |
| Journal | IEEE Access |
| Volume | 13 |
| Publication status | Published - Aug. 2025 |
| Peer-review status | Yes |
External IDs
| ORCID | /0000-0003-2571-8441/work/214453708 |
|---|---|
| ORCID | /0000-0002-8019-7936/work/214453882 |
Subject headings
ASJC Scopus subject areas
Keywords
- CNN, FPGA, machine learning, packed-SIMD, RISC-V, vector processing