BWoS: Formally Verified Block-based Work Stealing for Parallel Processing

Research output: Contribution to book/Conference proceedings/Anthology/Report > Conference contribution > Contributed > peer-review

Contributors

  • Jiawei Wang - Professor (rtd.) of Operating Systems, Chair of Computer Architecture, Dresden Research Lab Huawei Technologies (Author)
  • Bohdan Trach - Chair of Systems Engineering, Dresden Research Lab Huawei Technologies (Author)
  • Ming Fu - Dresden Research Lab Huawei Technologies (Author)
  • Diogo Behrens - Dresden Research Lab Huawei Technologies (Author)
  • Jonathan Schwender - Dresden Research Lab Huawei Technologies (Author)
  • Yutao Liu - Dresden Research Lab Huawei Technologies (Author)
  • Jitang Lei - Dresden Research Lab Huawei Technologies (Author)
  • Viktor Vafeiadis - Max Planck Institute for Software Systems (Author)
  • Hermann Härtig - Professor (rtd.) of Operating Systems (Author)
  • Haibo Chen - Dresden Research Lab Huawei Technologies, Shanghai Jiao Tong University (Author)

Abstract

Work stealing is a widely used scheduling technique for parallel processing on multicore systems. Each core owns a queue of tasks and avoids idling by stealing tasks from other queues. Prior work mostly focuses on balancing workload among cores, disregarding whether stealing may adversely impact the owner’s performance or hinder synchronization optimizations. Real-world industrial runtimes for parallel processing rely heavily on work-stealing queues for scalability, and such queues can become performance bottlenecks. We present Block-based Work Stealing (BWoS), a novel and pragmatic design that splits per-core queues into multiple blocks. Thieves and owners rarely operate on the same blocks, which greatly reduces interference and enables aggressive optimizations of the owner’s synchronization with thieves. Furthermore, BWoS enables a novel probabilistic stealing policy that guarantees thieves steal from longer queues with higher probability. In our evaluation, using BWoS improves performance by up to 1.25x in the Renaissance macrobenchmark when applied to Java G1GC, provides an average 1.26x speedup in JSON processing when applied to the Go runtime, and improves the maximum throughput of the Hyper HTTP server by 1.12x when applied to the Rust Tokio runtime. In microbenchmarks, it provides 8-11x better performance than state-of-the-art designs. We have formally verified and optimized BWoS on weak memory models with a model-checking-based framework.
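
The two ideas in the abstract, per-core queues split into blocks and a stealing policy biased toward longer queues, can be illustrated with a toy model. The sketch below is single-threaded and has no synchronization, so it is not the BWoS algorithm and says nothing about its verified implementation. The names `BlockQueue`, `pick_victim`, and `BLOCK_SIZE` are hypothetical, and choosing a victim with probability proportional to queue length is an assumption made for illustration, not necessarily the policy used in the paper.

```python
import random
from collections import deque

BLOCK_SIZE = 4  # hypothetical block capacity, chosen only for illustration


class BlockQueue:
    """Toy, single-threaded model of a per-core queue split into blocks.

    No synchronization is modeled; this only illustrates the owner and
    thieves working on different blocks most of the time.
    """

    def __init__(self):
        # Oldest block on the left, newest block on the right.
        self.blocks = deque([[]])

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def push(self, task):
        # Owner appends to the newest block, opening a new one when it is full.
        if len(self.blocks[-1]) >= BLOCK_SIZE:
            self.blocks.append([])
        self.blocks[-1].append(task)

    def pop(self):
        # Owner takes the most recent task from the newest non-empty block.
        while len(self.blocks) > 1 and not self.blocks[-1]:
            self.blocks.pop()
        return self.blocks[-1].pop() if self.blocks[-1] else None

    def steal(self):
        # Thief takes the oldest task from the oldest non-empty block.
        while len(self.blocks) > 1 and not self.blocks[0]:
            self.blocks.popleft()
        return self.blocks[0].pop(0) if self.blocks[0] else None


def pick_victim(queues, rng=random):
    """Return a victim index with probability proportional to queue length.

    Assumption for illustration: length-weighted random choice. Returns
    None when every queue is empty.
    """
    weights = [len(q) for q in queues]
    if sum(weights) == 0:
        return None
    return rng.choices(range(len(queues)), weights=weights, k=1)[0]
```

In this model, pushing ten tasks with a block size of four creates three blocks; the owner then pops from the newest block while a thief steals from the oldest, so the two only meet when a single block remains. A real design must additionally handle concurrent access to that shared block, which this sketch deliberately leaves out.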

Details

Original language: English
Title of host publication: Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023
Publisher: USENIX Association
Pages: 833-850
Number of pages: 18
ISBN (electronic): 9781939133342
Publication status: Published - 2023
Peer-reviewed: Yes

Publication series

Series: USENIX Annual Technical Conference (ATC)

Conference

Title: 17th USENIX Symposium on Operating Systems Design and Implementation
Abbreviated title: OSDI 2023
Conference number: 17
Duration: 10 - 12 July 2023
Location: Sheraton Boston Hotel
City: Boston
Country: United States of America