Reduced-Precision Acceleration of Radio-Astronomical Imaging on Reconfigurable Hardware

Stefano Corda; Bram Veenboer; Ahsan Javed Awan; John W. Romein; Roel Jordans; Akash Kumar; Albert Jan Boonstra; Henk Corporaal

doi:10.1109/ACCESS.2022.3150861

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Stefano Corda, Eindhoven University of Technology, Technische Universität Dresden (Author)
Bram Veenboer, Netherlands Institute for Radio Astronomy (Author)
Ahsan Javed Awan, Ericsson AB (Author)
John W. Romein, Eindhoven University of Technology (Author)
Roel Jordans, Eindhoven University of Technology (Author)
Akash Kumar, Chair of Processor Design (cfaed) (Author)
Albert Jan Boonstra, Netherlands Institute for Radio Astronomy (Author)
Henk Corporaal, Eindhoven University of Technology (Author)

Abstract

Radio telescopes produce large volumes of data that must be processed to obtain high-resolution sky images. This is a complex task that requires computing systems offering both high performance and high energy efficiency. Hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) can provide both and are thus an appealing option for this application. Most HPC (High-Performance Computing) systems operate in double precision (64-bit) or single precision (32-bit), and radio-astronomical imaging is no exception. Reduced-precision computing uses smaller data types (e.g., 16-bit) to improve energy efficiency and throughput in noise-tolerant applications. We demonstrate that reduced precision can also be used to produce high-quality sky images. To this end, we analyze the gridding component (Image-Domain Gridding) of the widely used WSClean imaging application. Gridding is typically one of the most time-consuming steps in the imaging process and is therefore an excellent candidate for acceleration. We identify the minimum number of exponent and mantissa bits required for a custom floating-point data type. We then propose the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis. Our reduced-precision implementation improves throughput and energy efficiency by 1.84× and 2.03×, respectively, compared to the single-precision floating-point baseline on the same FPGA. Our solution is also 2.12× faster and 3.46× more energy-efficient than an Intel i9 9900k CPU (Central Processing Unit) and keeps up in throughput with an AMD RX 550 GPU.
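
To make the idea of a custom floating-point data type concrete, the following minimal Python sketch rounds a value to a format with a chosen number of exponent and mantissa bits. It is an illustration only, not the authors' implementation: the function name `quantize` is hypothetical, and the behavior at the edges (round-to-nearest-even, IEEE-like bias, no subnormals, saturation on overflow, flush-to-zero on underflow) is assumed, since the abstract does not state the bit widths chosen or the rounding mode used.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a custom floating-point format.

    Assumptions (not taken from the paper): IEEE-like exponent bias,
    round-to-nearest-even, no subnormals, overflow saturates to the
    largest finite value, underflow flushes to zero.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x

    bias = (1 << (exp_bits - 1)) - 1
    e_max = bias        # largest unbiased exponent of a finite value
    e_min = 1 - bias    # smallest unbiased exponent of a normal value

    # frexp gives x = m * 2**e with 0.5 <= |m| < 1; rescale to 1 <= |m| < 2.
    m, e = math.frexp(x)
    m *= 2.0
    e -= 1

    # Keep man_bits fractional bits; Python's round() is half-to-even.
    scale = 1 << man_bits
    m = round(m * scale) / scale
    if abs(m) >= 2.0:   # rounding carried into the next binade
        m /= 2.0
        e += 1

    if e > e_max:       # overflow: saturate
        largest = (2.0 - 1.0 / scale) * 2.0 ** e_max
        return math.copysign(largest, x)
    if e < e_min:       # underflow: flush to zero
        return math.copysign(0.0, x)
    return math.ldexp(m, e)


if __name__ == "__main__":
    value = 3.14159
    for man_bits in (23, 10, 7, 4):
        q = quantize(value, exp_bits=5, man_bits=man_bits)
        rel_err = abs(q - value) / abs(value)
        print(f"mantissa={man_bits:2d}  value={q!r}  rel_err={rel_err:.2e}")
```

Sweeping `exp_bits` and `man_bits` over the intermediate values of a computation and comparing the result against a single-precision reference is one way to explore how few bits a noise-tolerant workload can tolerate; the paper's own analysis may proceed differently.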

Details

Original language	English
Pages (from - to)	22819-22843
Number of pages	25
Journal	IEEE Access
Volume	10
Issue number	10
Publication status	Published - 2022
Peer-review status	Yes

External IDs

dblp	journals/access/CordaVARJKBC22
Mendeley	bec3465a-dbeb-3080-a41d-3d5c50878a5c

Keywords

Research priority areas of TU Dresden

Information Technology and Microelectronics

Sustainable Development Goals

SDG 7 – Affordable and Clean Energy

ASJC Scopus subject areas

Keywords

Accelerator architectures, approximation methods, astronomy, central processing unit, field programmable gate arrays, graphics processing units, high level synthesis, high performance computing, reconfigurable architectures, scientific computing

Library keywords

005 Computer programming, programs & data