A Configurable RISC-V Co-Processor with Instruction-Controlled Stream-Based Accelerators
Research output: Contribution to journal › Research article › Contributed › peer-review
Abstract
The increasing use of Deep Neural Network (DNN)-based applications on edge devices has imposed new computing challenges, as such workloads demand high computational capabilities while ensuring low power consumption. To address these challenges, enhancing the General-Purpose (GP) computing units of edge devices by integrating hardware accelerators with application-specific functions provides the flexibility to support diverse workloads, thereby improving computing efficiency. This paper proposes a modular RISC-V coprocessor capable of hosting multiple stream-based accelerator IPs in a scalable platform. A low-latency interconnect interfaces the IPs with a local multi-bank scratchpad memory and supports direct IP-to-IP communication. A custom instruction set complemented by software macros is used by a RISC-V core for configuration, execution, and memory management in the coprocessor through the open-source eXtension Interface. With the coprocessor platform decoupled from the core clock domain by asynchronous FIFOs, a prototype on an AMD/Xilinx RFSoC 4x2 evaluation board supports integration of up to 8 custom IPs and 16 memory banks at a frequency of 294-333 MHz while consuming 3% or less of the LUTs available on the board. A ShuffleNet-V2 case study shows that the coprocessor achieves comparable latency and up to 51% lower resource utilization compared to standalone HLS IPs.
Details
| Original language | English |
|---|---|
| Journal | ACM Transactions on Reconfigurable Technology and Systems (TRETS) |
| Publication status | E-pub ahead of print - 27 May 2026 |
| Peer-reviewed | Yes |
External IDs
| ORCID | /0000-0003-2571-8441/work/217234650 |
|---|---|
| ORCID | /0009-0007-8887-1730/work/217237304 |
Keywords
- RISC-V, Hardware Accelerators, High-Level Synthesis, FPGA