A Configurable RISC-V Co-Processor with Instruction-Controlled Stream-Based Accelerators

Research output: Contribution to journal › Research article › Contributed › peer-review

Abstract

The increasing use of Deep Neural Network (DNN)-based applications on edge devices has imposed new computing challenges, as such workloads demand high computational capabilities while ensuring low power consumption. To address these challenges, enhancing the General-Purpose (GP) computing units of edge devices by integrating hardware accelerators with application-specific functions provides the flexibility to support diverse workloads, thereby improving computing efficiency. This paper proposes a modular RISC-V coprocessor capable of hosting multiple stream-based accelerator IPs in a scalable platform. A low-latency interconnect interfaces the IPs with a local multi-bank scratchpad memory and supports direct IP-to-IP communication. A custom instruction set complemented by software macros is used by a RISC-V core for configuration, execution, and memory management in the coprocessor through the open-source eXtension Interface. With the coprocessor platform decoupled from the core clock domain by asynchronous FIFOs, the coprocessor supports integration of up to 8 custom IPs and 16 memory banks at a frequency of 294-333 MHz when prototyped on an AMD/Xilinx RFSoC 4x2 evaluation board, while consuming 3% or less of the LUTs available on the board. A ShuffleNet-V2 case study shows that the coprocessor achieves comparable latency and up to 51% lower resource utilization compared to standalone HLS IPs.

Details

Original language: English
Journal: ACM Transactions on Reconfigurable Technology and Systems : TRETS
Publication status: E-pub ahead of print - 27 May 2026
Peer-reviewed: Yes

External IDs

ORCID /0000-0003-2571-8441/work/217234650
ORCID /0009-0007-8887-1730/work/217237304

Keywords

  • RISC-V, Hardware Accelerators, High-Level Synthesis, FPGA