Fine-grained synchronizations and dataflow programming on GPUs

Ang Li; Gert Jan Van Den Braak; Henk Corporaal; Akash Kumar

doi:10.1145/2751205.2751232

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Contributors

Ang Li, Eindhoven University of Technology (Author)
Gert Jan Van Den Braak, Eindhoven University of Technology (Author)
Henk Corporaal, Eindhoven University of Technology (Author)
Akash Kumar, National University of Singapore (Author)

Abstract

The last decade has witnessed the emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth in the number of GPU cores, utilizing them efficiently has become a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); it therefore offers little support for enforcing thread ordering and fine-grained synchronization. This becomes an obstacle when migrating algorithms that exploit fine-grained parallelism, such as dataflow algorithms, to GPUs. In this paper, we propose a novel approach for fine-grained inter-thread synchronization on the shared memory of modern GPUs. We demonstrate its performance and compare it with other fine-grained and medium-grained synchronization approaches. On average, our method achieves a 1.5x speedup over the warp-barrier based approach and a 4.0x speedup over the atomic spin-lock based approach. To further explore the possibility of realizing fine-grained dataflow algorithms on GPUs, we apply the proposed synchronization scheme to Needleman-Wunsch, a 2D wavefront application involving massive cross-loop data dependencies. Our implementation achieves a 3.56x speedup over the atomic spin-lock implementation and a 1.15x speedup over the conventional data-parallel implementation for a basic sub-grid, which implies that the fine-grained, lock-based programming pattern could be an alternative choice for designing general-purpose GPU (GPGPU) applications.
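
The cross-loop data dependencies of Needleman-Wunsch mentioned in the abstract can be illustrated with a small sequential sketch. This is not the paper's GPU implementation or its synchronization scheme; it only shows why the problem forms a 2D wavefront: each cell depends on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal are mutually independent. The function names and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration.

```python
def _init_table(a, b, gap):
    """Build the score table with the gap-penalty borders filled in."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    return score


def _fill_cell(score, a, b, i, j, match, mismatch, gap):
    """Cell (i, j) depends only on its upper-left, upper and left neighbours."""
    diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
    up = score[i - 1][j] + gap
    left = score[i][j - 1] + gap
    score[i][j] = max(diag, up, left)


def nw_score_rowmajor(a, b, match=1, mismatch=-1, gap=-1):
    """Reference traversal: row by row, left to right."""
    score = _init_table(a, b, gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            _fill_cell(score, a, b, i, j, match, mismatch, gap)
    return score[len(a)][len(b)]


def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Wavefront traversal: one anti-diagonal (i + j == d) at a time.

    Cells on the same anti-diagonal never read each other, so the inner
    loop is the part that could, in principle, run in parallel.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = _init_table(a, b, gap)
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):  # independent cells
            _fill_cell(score, a, b, i, d - i, match, mismatch, gap)
    return score[rows - 1][cols - 1]
```

Both traversals fill the same table and return the same alignment score; only the visiting order differs. In a data-parallel setting the wavefront order is typically separated by barriers between anti-diagonals, whereas a fine-grained scheme would let each cell start as soon as its three inputs are ready.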

Details

Original language	English
Title of host publication	ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Publisher	Association for Computing Machinery (ACM), New York
Pages	109-118
Number of pages	10
ISBN (electronic)	978-1-4503-3559-1
Publication status	Published - 8 Jun 2015
Peer-reviewed	Yes
Externally published	Yes

Publication series

Series	ICS: International Conference on Supercomputing

Conference

Title	29th ACM International Conference on Supercomputing, ICS 2015
Duration	8 - 11 June 2015
City	Newport Beach
Country	United States of America

Keywords

Research priority areas of TU Dresden

Information Technology and Microelectronics

ASJC Scopus subject areas

Computer Science (all)

Keywords

Dataflow, Fine-grained synchronization, GPU, Spin-lock
