BWoS: Formally Verified Block-based Work Stealing for Parallel Processing

Publikation: Beitrag in Buch/Konferenzbericht/Sammelband/GutachtenBeitrag in KonferenzbandBeigetragenBegutachtung

Beitragende

  • Jiawei Wang - , Seniorprofessor für Betriebssysteme, Professur für Rechnerarchitektur, Dresden Research Lab Huawei Technologies (Autor:in)
  • Bohdan Trach - , Professur für Systems Engineering (SE), Dresden Research Lab Huawei Technologies (Autor:in)
  • Ming Fu - , Dresden Research Lab Huawei Technologies (Autor:in)
  • Diogo Behrens - , Dresden Research Lab Huawei Technologies (Autor:in)
  • Jonathan Schwender - , Dresden Research Lab Huawei Technologies (Autor:in)
  • Yutao Liu - , Dresden Research Lab Huawei Technologies (Autor:in)
  • Jitang Lei - , Dresden Research Lab Huawei Technologies (Autor:in)
  • Viktor Vafeiadis - , Max Planck Institute for Software Systems (Autor:in)
  • Hermann Härtig - , Seniorprofessor für Betriebssysteme (Autor:in)
  • Haibo Chen - , Dresden Research Lab Huawei Technologies, Shanghai Jiao Tong University (Autor:in)

Abstract

Work stealing is a widely-used scheduling technique for parallel processing on multicore. Each core owns a queue of tasks and avoids idling by stealing tasks from other queues. Prior work mostly focuses on balancing workload among cores, disregarding whether stealing may adversely impact the owner’s performance or hinder synchronization optimizations. Real-world industrial runtimes for parallel processing heavily rely on work-stealing queues for scalability, and such queues can become bottlenecks to their performance. We present Block-based Work Stealing (BWoS), a novel and pragmatic design that splits per-core queues into multiple blocks. Thieves and owners rarely operate on the same blocks, greatly removing interferences and enabling aggressive optimizations on the owner’s synchronization with thieves. Furthermore, BWoS enables a novel probabilistic stealing policy that guarantees thieves steal from longer queues with higher probability. In our evaluation, using BWoS improves performance by up to 1.25x in the Renaissance macrobenchmark when applied to Java G1GC, provides an average 1.26x speedup in JSON processing when applied to Go runtime, and improves maximum throughput of Hyper HTTP server by 1.12x when applied to Rust Tokio runtime. In microbenchmarks, it provides 8-11x better performance than state-of-the-art designs. We have formally verified and optimized BWoS on weak memory models with a model-checking-based framework.

Details

OriginalspracheEnglisch
TitelProceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023
Herausgeber (Verlag)USENIX Association
Seiten833-850
Seitenumfang18
ISBN (elektronisch)9781939133342
PublikationsstatusVeröffentlicht - 2023
Peer-Review-StatusJa

Publikationsreihe

ReiheUSENIX Annual Technical Conference (ATC)

Konferenz

Titel17th USENIX Symposium on Operating Systems Design and Implementation
KurztitelOSDI 2023
Veranstaltungsnummer17
Dauer10 - 12 Juli 2023
Webseite
OrtSheraton Boston Hotel
StadtBoston
LandUSA/Vereinigte Staaten