Dual Vector Load for Improved Pipelining in Vector Processors

Publication: Contribution to book/conference proceedings/anthology/expert report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Abstract

Vector processors execute instructions that manipulate vectors of data items using time-division multiplexing (TDM). Chaining, the pipelined execution of vector instructions, ensures high performance and utilization. When two vectors are loaded sequentially as the inputs of a follow-up compute instruction, which is often the case in vector applications, chaining cannot take effect for the duration of the entire first vector load. To close this gap, we propose dual load: a parallel or interleaved load of the two input vectors. We study this feature analytically and derive necessary conditions for performance improvements. Our investigation finds that compute-bound and some memory-bound applications benefit from this feature when the memory and compute bandwidths are sufficiently high. A speedup of up to 33 % is possible in the ideal case. Our practical implementation shows improvements of up to 21 % with a hardware overhead of less than 2 %.
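To illustrate the access pattern the abstract refers to, the sketch below shows a plain C element-wise addition. The loop is not taken from the paper; the comments use generic, illustrative vector mnemonics (not actual RISC-V V instructions) to indicate how a vector unit would typically execute such a kernel: two sequential vector loads followed by a compute instruction that can chain off the second load, which is exactly the gap a dual load would close by fetching both input vectors in parallel or interleaved.

```c
#include <stddef.h>

/* Element-wise addition: each iteration consumes one element of a and b.
 * A vectorizing compiler maps this loop to the pattern discussed above
 * (mnemonics are illustrative only):
 *   vload  v1, a[i .. i+VL-1]   ; first input vector
 *   vload  v2, b[i .. i+VL-1]   ; second input vector
 *   vadd   v3, v1, v2           ; compute, can chain off the second load
 *   vstore c[i .. i+VL-1], v3
 * With sequential loads, chaining only takes effect once the first load
 * has completed; a dual load fetches a and b concurrently instead. */
void vec_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```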

Details

Original language: English
Title: 2023 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 6
ISBN (electronic): 979-8-3503-3201-8
ISBN (print): 979-8-3503-3202-5
Publication status: Published - 21 Apr 2023
Peer review status: Yes

Conference

Title: 26th IEEE Symposium in Low-Power and High-Speed Chips
Short title: COOL CHIPS 26
Event number: 26
Duration: 19 - 21 April 2023
Location: The University of Tokyo
City: Tokyo
Country: Japan

External IDs

Scopus 85160755632

Keywords

  • Bandwidth, Hardware, Pipeline processing, Time division multiplexing, Vector processors, DSP, dual load, RISC-V, vector processor