Holistic Debugging of MPI Derived Datatypes

Publication: Contribution to book/conference proceedings/anthology/report, Contribution to conference proceedings, Contributed, Peer-reviewed

Contributors

Abstract

The Message Passing Interface (MPI) specifies an API that allows programmers to create efficient and scalable parallel applications. The standard defines multiple constraints for each function parameter. For performance reasons, no MPI implementation checks all of these constraints at runtime. Derived datatypes are an important concept of MPI and allow users to describe an application's data structures for efficient and convenient communication. Using existing infrastructure, we present scalable algorithms to detect usage errors of basic and derived MPI datatypes. The errors we detect include violated constraints on the construction and usage of derived datatypes, mismatched type signatures in communication, and erroneous overlaps of communication buffers.
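
To make the error classes named in the abstract concrete, the following is a minimal sketch in Python, not the implementation from the paper. It models a datatype as a typemap (a list of basic types with byte displacements) and checks two of the rules: the sent type signature must be a prefix of the receive signature, and a datatype used for receiving must not contain overlapping entries. All names here (`SIZES`, `contiguous`, `vector`, `signatures_match`, `has_self_overlap`) are hypothetical, the basic type sizes are assumed, and the extent calculation ignores alignment padding.

```python
from typing import List, Tuple

# Hypothetical model: a typemap is a list of (basic type name, byte displacement).
TypeMap = List[Tuple[str, int]]

# Assumed sizes in bytes; real sizes depend on the platform.
SIZES = {"MPI_CHAR": 1, "MPI_INT": 4, "MPI_DOUBLE": 8}


def extent_of(tm: TypeMap) -> int:
    """Span from the lowest to the highest byte touched (no alignment padding)."""
    lo = min(d for _, d in tm)
    hi = max(d + SIZES[t] for t, d in tm)
    return hi - lo


def contiguous(count: int, old: TypeMap) -> TypeMap:
    """Model of a contiguous type: `count` copies of `old`, back to back."""
    ext = extent_of(old)
    return [(t, d + i * ext) for i in range(count) for (t, d) in old]


def vector(count: int, blocklength: int, stride: int, old: TypeMap) -> TypeMap:
    """Model of a vector type: `count` blocks of `blocklength` elements,
    block starts `stride` elements apart."""
    ext = extent_of(old)
    return [(t, d + (i * stride + j) * ext)
            for i in range(count)
            for j in range(blocklength)
            for (t, d) in old]


def signature(tm: TypeMap, count: int = 1) -> List[str]:
    """Type signature: the sequence of basic types, displacements dropped."""
    return [t for t, _ in tm] * count


def signatures_match(send_sig: List[str], recv_sig: List[str]) -> bool:
    """The receive side may describe more data, but the sent signature
    must be a prefix of the receive signature."""
    return recv_sig[:len(send_sig)] == send_sig


def has_self_overlap(tm: TypeMap) -> bool:
    """True if any two entries of the typemap touch the same byte."""
    intervals = sorted((d, d + SIZES[t]) for t, d in tm)
    return any(prev_end > start
               for (_, prev_end), (start, _) in zip(intervals, intervals[1:]))
```

For example, sending a strided column `vector(3, 1, 4, [("MPI_INT", 0)])` and receiving it into `contiguous(3, [("MPI_INT", 0)])` passes the signature check even though the memory layouts differ, while `vector(2, 3, 2, [("MPI_INT", 0)])` overlaps itself because the block length exceeds the stride.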

We implement these checks in the MUST runtime error detection framework. We provide a novel representation of error locations to highlight usage errors. Further, approaches to buffer overlap checking can cause unacceptable overheads for non-contiguous datatypes. We present an algorithm that uses patterns in derived MPI datatypes to avoid these overheads without losing precision. Application results for the benchmark suites SPEC MPI2007 and NAS Parallel Benchmarks for up to 2048 cores show that our approach applies to a broad range of applications and that our extended overlap check improves performance by two orders of magnitude. Finally, we augment our runtime error detection component with a debugger extension to support in-depth analysis of the errors that we find as well as semantic errors. This extension to gdb provides information about MPI datatype handles and enables gdb, and other debuggers based on gdb, to display the content of a buffer as used in MPI communications.
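
The idea of exploiting patterns in a datatype to avoid expanding it can be illustrated with a small sketch. This is not the algorithm from the paper, only one simple case under stated assumptions: two buffers that are each a regular run of equally sized blocks with the same positive stride and at least one block. The class `Strided` and both functions are hypothetical names. The baseline enumerates every pair of blocks, while the pattern-based version decides the same question in constant time from the start offsets, block sizes, stride, and counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strided:
    """Hypothetical description of a strided buffer (all values in bytes)."""
    start: int   # address of the first block
    block: int   # size of each block, > 0
    stride: int  # distance between block starts, > 0
    count: int   # number of blocks, >= 1


def overlap_expanded(a: Strided, b: Strided) -> bool:
    """Baseline: compare every block of `a` with every block of `b`."""
    return any(
        a.start + i * a.stride < b.start + j * b.stride + b.block
        and b.start + j * b.stride < a.start + i * a.stride + a.block
        for i in range(a.count)
        for j in range(b.count)
    )


def overlap_pattern(a: Strided, b: Strided) -> bool:
    """Same answer as the baseline, without enumeration when strides are equal."""
    if a.stride != b.stride:
        # This sketch only handles equal strides; fall back to the baseline.
        return overlap_expanded(a, b)
    s = a.stride
    d = b.start - a.start
    # Block i of `a` and block j of `b` overlap exactly when, for k = i - j,
    #   d - a.block < k * s < d + b.block.
    k_lo = (d - a.block) // s + 1      # smallest integer k with k*s > d - a.block
    k_hi = (d + b.block - 1) // s      # largest integer k with k*s < d + b.block
    # Differences k that actual block indices can produce.
    k_lo = max(k_lo, -(b.count - 1))
    k_hi = min(k_hi, a.count - 1)
    return k_lo <= k_hi
```

With two interleaved columns such as `Strided(0, 4, 16, 10**6)` and `Strided(4, 4, 16, 10**6)`, the pattern-based check answers immediately, whereas the baseline would have to compare on the order of 10^12 block pairs.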

Details

Original language: English
Title: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS '12)
Publisher: IEEE Xplore
Pages: 354-365
Number of pages: 12
ISBN (Print): 978-1-4673-0975-2
Publication status: Published - 2012
Peer-review status: Yes

Publication series

Series: International Symposium on Parallel and Distributed Processing (IPDPS)
ISSN: 1530-2075

Conference

Title: 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS) / Workshop on High Performance Data Intensive Computing
Duration: 21 - 25 May 2012
City: Shanghai

External IDs

researchoutputwizard legacy.publication#52379
WOS 000309131900032
Scopus 84866870762

Keywords

  • MPI, datatypes, runtime error detection, debugging