A Configurable RISC-V Co-Processor with Instruction-Controlled Stream-Based Accelerators

Rohan Krishna Vijayaraghavan; Ahmed Kamaleldin; Matthias Nickel; Lester Kalms; Diana Göhringer

doi:10.1145/3816253

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Rohan Krishna Vijayaraghavan, Chair of Adaptive Dynamic Systems (First author)
Ahmed Kamaleldin, Chair of Adaptive Dynamic Systems (Author)
Matthias Nickel, Chair of Adaptive Dynamic Systems (Author)
Lester Kalms, Chair of Adaptive Dynamic Systems, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Diana Göhringer, Chair of Adaptive Dynamic Systems, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)

Abstract

The increasing use of Deep Neural Network (DNN)-based applications on edge devices has imposed new computing challenges, as such workloads demand high computational capabilities while ensuring low power consumption. To address these challenges, enhancing the General-Purpose (GP) computing units of edge devices by integrating hardware accelerators with application-specific functions provides the flexibility to support diverse workloads, thereby improving computing efficiency. This paper proposes a modular RISC-V coprocessor capable of hosting multiple stream-based accelerator IPs in a scalable platform. A low-latency interconnect interfaces the IPs with a local multi-bank scratchpad memory and supports direct IP-to-IP communication. A custom instruction set complemented by software macros is used by a RISC-V core for configuration, execution, and memory management in the coprocessor through the open-source eXtension Interface. With the coprocessor platform decoupled from the core clock domain by asynchronous FIFOs, the coprocessor supports integration of up to 8 custom IPs and 16 memory banks at a frequency of 294-333 MHz when prototyped on an AMD/Xilinx RFSoC 4x2 evaluation board, while consuming 3% or less of the LUTs available on the board. A ShuffleNet-V2 case study shows that the coprocessor achieves comparable latency and up to 51% lower resource utilization compared to standalone HLS IPs.
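
As a rough illustration of how software macros might wrap a custom instruction set, the Python sketch below packs fields into a standard RISC-V R-type word under the custom-0 major opcode. The operation names and funct3 values (COPRO_CONFIG, COPRO_START, COPRO_LOAD) and the helper copro_start are hypothetical and are not taken from the paper; only the R-type field layout and the custom-0 opcode value follow the RISC-V specification.

```python
# Sketch only: operation names and funct3 values are hypothetical,
# not the instruction set defined in the paper.

CUSTOM0_OPCODE = 0b0001011  # RISC-V "custom-0" major opcode (0x0B)

# Hypothetical funct3 selectors for three classes of coprocessor operation.
COPRO_CONFIG = 0b000  # configure an accelerator IP
COPRO_START = 0b001   # start execution on an accelerator IP
COPRO_LOAD = 0b010    # move data into the scratchpad memory


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM0_OPCODE):
    """Pack fields into a 32-bit RISC-V R-type instruction word."""
    fields = (
        ("funct7", funct7, 7),
        ("rs2", rs2, 5),
        ("rs1", rs1, 5),
        ("funct3", funct3, 3),
        ("rd", rd, 5),
        ("opcode", opcode, 7),
    )
    # Reject any field value that does not fit in its bit width.
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


def copro_start(ip_reg, arg_reg, rd=0):
    """Macro-like helper: a hypothetical 'start' instruction.

    ip_reg and arg_reg are register numbers (0-31) whose contents
    would select the IP and carry an argument.
    """
    return encode_r_type(0, arg_reg, ip_reg, COPRO_START, rd)


if __name__ == "__main__":
    word = copro_start(ip_reg=10, arg_reg=11)
    print(f"0x{word:08x}")
```

In a C toolchain the same idea would typically be expressed as preprocessor macros that emit the encoded word through inline assembly, so that application code calls a named operation rather than hand-assembling instruction bits.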

Details

Original language	English
Journal	ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Publication status	E-pub ahead of print - 27 May 2026
Peer-reviewed	Yes

External IDs

ORCID	/0000-0003-2571-8441/work/217234650
ORCID	/0009-0007-8887-1730/work/217237304

Research Portal of the TU Dresden
