RAPID: Approximate Pipelined Soft Multipliers and Dividers for High Throughput and Energy Efficiency

Research output: Contribution to journal › Research article › Contributed › peer-review

Abstract

The rapid updates in error-resilient applications, along with their quest for high throughput, have motivated the design of fast approximate functional units for Field-Programmable Gate Arrays (FPGAs). Studies have proposed various imprecise functional techniques, but these suffer from three shortcomings. First, most existing inexact multipliers and dividers are specialized for Application-Specific Integrated Circuit (ASIC) platforms; because the underlying building blocks of FPGAs and ASICs differ architecturally, ASIC-customized designs have not yielded comparable improvements when directly synthesized and ported to FPGAs. Second, state-of-the-art (SoA) approximate units are mostly substituted into a single kernel of a multi-kernel application, and the end-to-end assessment covers the Quality of Results (QoR) but not the overall performance gain. Finally, existing imprecise components are not designed to support pipelining, which could boost the operating frequency and throughput of, e.g., applications that include division. In this paper, we propose the first pipelined approximate multiplier and divider architectures customized for FPGAs. The proposed units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell’s approximate algorithms. Our novel error-refinement scheme not only has negligible overhead over the baseline Mitchell’s approach, but also boosts its accuracy to 99.4% for multiplication and division of arbitrary size. Experimental results obtained with Xilinx Vivado demonstrate the efficiency of the proposed pipelined and non-pipelined multipliers and dividers over their accurate counterparts. In particular, the 4-stage pipelined architecture of the 32-bit multiplier (divider) achieves 3.3× (5.1×) higher throughput, 2.3× (6.8×) higher throughput/Watt, and 52% (31%) savings in LUTs over the 4-stage pipelined, accurate IP counterpart.
Moreover, end-to-end evaluations of the non-pipelined units, deployed in three multi-kernel applications in the domains of bio-signal processing, image processing, and moving object tracking for Unmanned Air Vehicles (UAVs), indicate up to 35%, 33%, and 45% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over accurate kernels, with negligible loss in QoR. To springboard future research in the reconfigurable and approximate computing communities, our implementations will be available and open-sourced at https://cfaed.tu-dresden.de/pd-downloads.
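
For readers unfamiliar with the Mitchell approximation that the abstract builds on, the following is a minimal software sketch of the classic logarithm-based scheme, in which log2(2^k · (1 + x)) is approximated by k + x. It is not the RAPID architecture: it models neither the error-refinement scheme, nor the pipelining, nor the mapping to 6-LUTs and carry chains. The function names and the `frac_bits` parameter are assumptions made for illustration only.

```python
def mitchell_log2(n: int, frac_bits: int = 32) -> int:
    """Fixed-point approximation of log2(n) for n > 0: integer part k, fraction x."""
    k = n.bit_length() - 1            # position of the leading one
    mantissa = n - (1 << k)           # n = 2^k + mantissa, so x = mantissa / 2^k
    if k >= frac_bits:                # align x to frac_bits fractional bits
        frac = mantissa >> (k - frac_bits)
    else:
        frac = mantissa << (frac_bits - k)
    return (k << frac_bits) | frac


def mitchell_antilog2(l: int, frac_bits: int = 32) -> int:
    """Approximate 2^l for a non-negative fixed-point l, as 2^k * (1 + x)."""
    k = l >> frac_bits
    frac = l & ((1 << frac_bits) - 1)
    value = (1 << frac_bits) | frac   # fixed-point 1.x
    if k >= frac_bits:
        return value << (k - frac_bits)
    return value >> (frac_bits - k)


def mitchell_mul(a: int, b: int, frac_bits: int = 32) -> int:
    """Approximate a * b for non-negative integers by adding approximate logs."""
    if a == 0 or b == 0:
        return 0
    total = mitchell_log2(a, frac_bits) + mitchell_log2(b, frac_bits)
    return mitchell_antilog2(total, frac_bits)


def mitchell_div(a: int, b: int, frac_bits: int = 32) -> int:
    """Approximate a // b for non-negative integers by subtracting approximate logs."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    if a == 0:
        return 0
    diff = mitchell_log2(a, frac_bits) - mitchell_log2(b, frac_bits)
    if diff < 0:                      # a < b, so the integer quotient is 0
        return 0
    return mitchell_antilog2(diff, frac_bits)


if __name__ == "__main__":
    print(mitchell_mul(5, 6), 5 * 6)      # approximate vs. exact product
    print(mitchell_div(30, 5), 30 // 5)   # approximate vs. exact quotient
```

In this sketch, operands that are powers of two are handled exactly, while the unrefined product is never larger than the exact one and falls short by at most about 11.1% (for example, 3 × 3 yields 8). That gap is the kind of error an error-refinement scheme such as the one described in the abstract aims to reduce.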

Details

Original language: English
Article number: 3
Pages (from-to): 712-725
Number of pages: 14
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 42
Issue number: 3
Publication status: Published - 1 Mar 2023
Peer-reviewed: Yes

External IDs

Scopus: 85133620975
Mendeley: d8b0aada-e300-3ff4-b109-b86847e9f395

Keywords

  • Approximate Computing, Approximation algorithms, Bio-signal Processing, Compressors, Computer architecture, Divider, Energy-Efficiency, Field-Programmable Gate Arrays (FPGAs), High-Throughput, Mitchell’s Algorithm, Multiplier, Pipeline, Pipeline processing, Table lookup, Throughput, Unmanned Aerial Vehicles (UAVs)