Fine-grained synchronizations and dataflow programming on GPUs

Ang Li; Gert Jan Van Den Braak; Henk Corporaal; Akash Kumar

doi:10.1145/2751205.2751232

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Ang Li, Eindhoven University of Technology (Author)
Gert Jan Van Den Braak, Eindhoven University of Technology (Author)
Henk Corporaal, Eindhoven University of Technology (Author)
Akash Kumar, National University of Singapore (Author)

Abstract

The last decade has witnessed the emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth in the number of GPU cores, utilizing them efficiently becomes a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore, little support is offered for enforcing thread ordering and fine-grained synchronization. This becomes an obstacle when migrating algorithms that exploit fine-grained parallelism, such as dataflow algorithms, to GPUs. In this paper, we propose a novel approach for fine-grained inter-thread synchronization on the shared memory of modern GPUs. We demonstrate its performance and compare it with other fine-grained and medium-grained synchronization approaches. Our method achieves, on average, a 1.5x speedup over the warp-barrier based approach and a 4.0x speedup over the atomic spin-lock based approach. To further explore the possibility of realizing fine-grained dataflow algorithms on GPUs, we apply the proposed synchronization scheme to Needleman-Wunsch, a 2D wavefront application involving massive cross-loop data dependencies. Our implementation achieves a 3.56x speedup over the atomic spin-lock implementation and a 1.15x speedup over the conventional data-parallel implementation for a basic sub-grid, which implies that the fine-grained, lock-based programming pattern could be an alternative choice for designing general-purpose GPU (GPGPU) applications.
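
The wavefront dependency structure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GPU implementation: it runs sequentially on the CPU, and the function name nw_wavefront and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration. The sketch fills the Needleman-Wunsch score matrix one anti-diagonal at a time. Each cell depends only on its upper, left, and upper-left neighbours, which all lie on the two preceding anti-diagonals, so the cells of one anti-diagonal are mutually independent and are the ones a parallel implementation could compute concurrently once their neighbours are ready.

```python
def nw_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the Needleman-Wunsch score matrix in anti-diagonal (wavefront) order.

    Scoring values are illustrative assumptions, not taken from the paper.
    Returns the full (len(a)+1) x (len(b)+1) score matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Boundary cells: aligning a prefix against the empty string costs only gaps.
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap

    # Interior cells (i >= 1, j >= 1) lie on anti-diagonals d = i + j.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))   # keeps j = d - i within the matrix
        i_hi = min(rows - 1, d - 1)     # keeps j >= 1
        # All cells in this inner loop are independent of each other.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)
    return score


if __name__ == "__main__":
    # Identical sequences align with four matches under the assumed scoring.
    print(nw_wavefront("ACGT", "ACGT")[-1][-1])  # 4
```

The result is the same matrix that a row-by-row fill would produce; only the traversal order differs. In the sketch the ordering between anti-diagonals comes for free from the sequential outer loop, whereas on a GPU that ordering has to be enforced by some form of synchronization between threads, which is the problem the abstract addresses.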

Details

Original language	English
Title	ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Publisher	Association for Computing Machinery (ACM), New York
Pages	109-118
Number of pages	10
ISBN (electronic)	978-1-4503-3559-1
Publication status	Published - 8 June 2015
Peer-reviewed	Yes
Externally published	Yes

Publication series

Series	ICS: International Conference on Supercomputing

Conference

Title	29th ACM International Conference on Supercomputing, ICS 2015
Duration	8 - 11 June 2015
City	Newport Beach
Country	United States

Keywords

Research priority areas of TU Dresden

Information Technology and Microelectronics

ASJC Scopus subject areas

General Computer Science

Keywords

Dataflow, Fine-grained synchronization, GPU, Spin-lock

TU Dresden Research Portal