Introducing the Arm-Membench Throughput Benchmark
Publication: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › Peer-reviewed
Abstract
Application performance of modern-day processors is often limited by the memory subsystem rather than actual compute capabilities. Therefore, data throughput specifications play a key role in modeling application performance and determining possible bottlenecks. However, while peak instruction throughputs and bandwidths for local caches are often documented, the achievable data and instruction throughput can also depend on the relation between memory access and compute instructions. In this paper, we present an Arm version of the established x86-membench throughput benchmark, which we adapted to support all current SIMD extensions of the Armv8 instruction set architecture. We describe aspects of the Armv8 ISA that need to be considered in the portable design of this benchmark. We use the benchmark to analyze the memory subsystem at a fine spatial granularity and to unveil microarchitectural details of three processors: Fujitsu A64FX, Ampere Altra and Cavium ThunderX2. Based on the resulting performance information, we show that instruction fetch and decoder widths become a potential bottleneck for cache-bandwidth-sensitive workloads due to the load-store concept of the Arm ISA.
Details
| Original language | English |
|---|---|
| Title of host publication | Parallel Processing and Applied Mathematics |
| Editors | Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski |
| Publisher | Springer, Cham |
| Pages | 99–112 |
| Number of pages | 14 |
| ISBN (electronic) | 978-3-031-85697-6 |
| ISBN (print) | 978-3-031-85696-9 |
| Publication status | Published - 2025 |
| Peer-review status | Yes |
Publication series
| Series | Lecture Notes in Computer Science |
|---|---|
| Volume | 15579 |
| ISSN | 0302-9743 |
External IDs
| ORCID | /0009-0001-6030-3201/work/181861120 |
|---|---|
| Scopus | 105002708176 |
Keywords
- Microarchitecture, ThunderX2, Ampere Altra, Bandwidth, Computer architecture, Benchmark, Throughput, Performance analysis, A64FX, Arm, Cache