FastLane: Improving Performance of Software Transactional Memory for Low Thread Counts

Jons-Tobias Wamhoff; Christof Fetzer; Pascal Felber; Etienne Rivière; Gilles Muller

doi:10.1145/2442516.2442528

Publication: Contribution to conferences › Paper › Contributed › Peer-reviewed

Contributors

Jons-Tobias Wamhoff - , Chair of Systems Engineering (SE) (Author)
Christof Fetzer - , Chair of Systems Engineering (SE) (Author)
Pascal Felber - , University of Neuchâtel (Author)
Etienne Rivière - , University of Neuchâtel (Author)
Gilles Muller - , INRIA - Institut national de recherche en informatique et en automatique (Author)

Abstract

Software transactional memory (STM) can lead to scalable implementations of concurrent programs, as the relative performance of an application increases with the number of threads that support it. However, the absolute performance is typically impaired by the overheads of transaction management and instrumented accesses to shared memory. This often leads STM-based programs with low thread counts to perform worse than a sequential, non-instrumented version of the same application.

In this paper, we propose FastLane, a new STM algorithm that bridges the performance gap between sequential execution and classical STM algorithms when running on few cores. FastLane seeks to reduce instrumentation costs and thus performance degradation in its target operation range. We introduce a novel algorithm that differentiates between two types of threads: one thread (the master) executes transactions pessimistically without ever aborting, thus with minimal instrumentation and management costs, while other threads (the helpers) can commit speculative transactions only when they do not conflict with the master. Helpers thus contribute to the application's progress without impairing the performance of the master.
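The master/helper asymmetry described above can be illustrated with a toy model. The sketch below is not the FastLane implementation from the paper: the names `Memory`, `master_run`, `HelperTx`, and `Conflict`, the per-word version stamps, and the single commit lock are all assumptions chosen to keep the example small. It only shows the general idea that the master writes in place and never aborts, while a helper buffers its writes and commits only if nothing it read was overwritten in the meantime.

```python
import threading


class Conflict(Exception):
    """Raised when a helper reads a word written after its snapshot."""


class Memory:
    """Hypothetical shared memory with one version stamp per word."""

    def __init__(self, size):
        self.values = [0] * size
        self.versions = [0] * size
        self.clock = 0                 # incremented by every committing writer
        self.lock = threading.Lock()   # simplification: one global commit lock


def master_run(mem, body):
    """Master: runs pessimistically, writes in place, never aborts."""
    with mem.lock:
        mem.clock += 1
        stamp = mem.clock

        def write(i, v):
            # Stamp first, so a helper that sees the new value also sees
            # the new version and can detect the conflict.
            mem.versions[i] = stamp
            mem.values[i] = v

        return body(lambda i: mem.values[i], write)


class HelperTx:
    """Helper: speculative transaction with buffered writes."""

    def __init__(self, mem):
        self.mem = mem
        with mem.lock:                 # snapshot while no writer is active
            self.start = mem.clock
        self.reads = set()
        self.writes = {}

    def read(self, i):
        if i in self.writes:
            return self.writes[i]
        v = self.mem.values[i]
        if self.mem.versions[i] > self.start:
            raise Conflict(i)
        self.reads.add(i)
        return v

    def write(self, i, v):
        self.writes[i] = v             # buffered until commit

    def commit(self):
        """Return True on success, False if a read was invalidated."""
        with self.mem.lock:
            if any(self.mem.versions[i] > self.start for i in self.reads):
                return False
            self.mem.clock += 1
            for i, v in self.writes.items():
                self.mem.versions[i] = self.mem.clock
                self.mem.values[i] = v
            return True
```

In this toy model a helper that read word 0 before the master updated it fails to commit and would have to retry, whereas the master's transaction always completes. A real design would aim to keep the master's path far cheaper than a global lock acquisition; the lock here only keeps the sketch easy to reason about.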

We implement FastLane as an extension of a state-of-the-art STM runtime system and compiler. Multiple code paths are produced for execution on a single, few, and many cores. The runtime system selects the code path providing the best throughput, depending on the number of cores available on the target machine. Evaluation results indicate that our approach provides promising performance at low thread counts: FastLane almost systematically wins over a classical STM in the 1-6 threads range, and often performs better than sequential execution of the non-instrumented version of the same application starting with 2 threads.
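The selection among code paths can be pictured as a simple dispatch on the core count. The following is a minimal sketch under stated assumptions: the function name `select_code_path`, the path labels, the mapping of one core to the non-instrumented path, and the threshold `few_max=6` are all hypothetical. The threshold is only loosely motivated by the 1-6 thread range reported above; the abstract does not state how the actual runtime decides.

```python
import os


def select_code_path(cores, few_max=6):
    """Pick a code path label from the number of available cores.

    Assumed mapping (not taken from the paper):
      1 core            -> "sequential" (non-instrumented)
      2..few_max cores  -> "fastlane"
      more cores        -> "stm" (classical STM)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if cores == 1:
        return "sequential"
    if cores <= few_max:
        return "fastlane"
    return "stm"


# Usage: os.cpu_count() may return None, so fall back to a single core.
chosen_path = select_code_path(os.cpu_count() or 1)
```

A fixed threshold is the simplest possible policy; a runtime could equally well measure throughput per path and switch dynamically, which this sketch does not attempt.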

Details

Original language	English
Pages	113-122
Number of pages	10
Publication status	Published - 2013
Peer-review status	Yes

Conference

Title	18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13), ACM, 2013
Short title	PPoPP '13
Event number	18
Duration	23 - 27 February 2013
Degree of recognition	International event
City	Shenzhen
Country	China

External IDs

Scopus	84875150584

Subject terms

Research priority areas of TU Dresden

Information Technology and Microelectronics

DFG subject classification by review board

Security and Dependability

Keywords

concurrency, transactional memory, Algorithms, Performance
