Scalability and performance of I/O-intensive parallel applications are major concerns in modern High Performance Computing (HPC) environments. Almost all applications use POSIX I/O, either explicitly or implicitly through third-party libraries such as MPI-IO, to perform I/O operations on the file system. POSIX I/O is known to be one of the leading causes of poor I/O performance because of its restrictive access semantics and consistency requirements. Some file systems therefore relax specific POSIX semantics to alleviate I/O performance penalties. To make the most effective use of the features these file systems offer, one needs to know which POSIX semantics an application actually requires. Existing tools can analyze parallel I/O performance and report the type and duration of executed I/O operations. There are even tools that analyze the consistency requirements of data operations, but none that also consider performance-critical patterns of metadata operations. In this paper, we present a novel, systematic approach that groups parallel I/O operations and analyzes their I/O semantics with respect to POSIX I/O. We provide the tool rabbitxx, which identifies concurrent overlapping accesses to the same file as well as metadata accesses such as concurrent create operations in the same directory. Our work indicates that POSIX-defined I/O access semantics, in their current form, are often not necessary for parallel applications.
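The idea of "concurrent overlapping accesses to the same file" can be illustrated with a minimal sketch. This is not how rabbitxx is implemented; the `Access` record, its fields, and the `conflicting_pairs` helper are hypothetical names chosen for illustration, and the sketch assumes that the operations in `group` have already been identified as potentially concurrent (for example, because no synchronization separates them).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Access:
    """One data access taken from an I/O trace (hypothetical record layout)."""
    rank: int       # process that issued the operation
    path: str       # file the operation targets
    offset: int     # first byte touched
    length: int     # number of bytes touched
    is_write: bool  # True for writes, False for reads


def overlaps(a: Access, b: Access) -> bool:
    """Return True if both accesses touch at least one common byte of the same file."""
    if a.path != b.path or a.length <= 0 or b.length <= 0:
        return False
    # Half-open byte ranges [offset, offset + length) intersect.
    return a.offset < b.offset + b.length and b.offset < a.offset + a.length


def conflicting_pairs(group: List[Access]) -> List[Tuple[Access, Access]]:
    """Pairs issued by different ranks that overlap and include at least one write.

    Assumes every access in `group` may run concurrently with the others.
    """
    return [
        (a, b)
        for a, b in combinations(group, 2)
        if a.rank != b.rank and (a.is_write or b.is_write) and overlaps(a, b)
    ]


group = [
    Access(rank=0, path="out.dat", offset=0, length=100, is_write=True),
    Access(rank=1, path="out.dat", offset=50, length=100, is_write=True),
    Access(rank=2, path="out.dat", offset=150, length=50, is_write=False),
    Access(rank=3, path="other.dat", offset=0, length=100, is_write=True),
]
pairs = conflicting_pairs(group)
```

In this example only ranks 0 and 1 conflict: their byte ranges intersect and both write. An application whose groups never produce such pairs would, under these assumptions, not depend on strict POSIX consistency for its data accesses.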
|High Performance Computing - ISC High Performance 2023 International Workshops, Revised Selected Papers
|Amanda Bienz, Michèle Weiland, Marc Baboulin, Carola Kruse
|Lecture Notes in Computer Science
|Published - 2023