Constrained traffic signal control under competing public transport priority requests via safe reinforcement learning

Runhao Zhou; Tobias Nousch; Lei Wei; Meng Wang

doi:10.1016/j.eswa.2025.127676

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Runhao Zhou - , Chair of Traffic Process Automation (First author)
Tobias Nousch - , Chair of Traffic Process Automation (Second author)
Lei Wei - , Chair of Traffic Process Automation (Author)
Meng Wang - , Chair of Traffic Process Automation (Last author)

Abstract

Agile signal switching under frequent arrivals of transit vehicles, combined with the need to respect multiple operational constraints, presents significant challenges for effective and safe signal control, as well as for the real-world implementation of reinforcement learning-based control algorithms. We introduce a safe reinforcement learning-based, fully adaptive multimodal traffic signal controller for a connected vehicle environment that incorporates a cost estimator during the learning process to account for multiple operational constraints. It uses a Duelling Double Deep Q-network and a multicriteria reward to minimise passenger delay and maximise throughput under lower and upper bounds on green time and a maximum phase skip constraint. Unsafe situations caused by inappropriate and frequent phase switches are specified as a safety constraint that constrains the learning process. Following the concept of safe reinforcement learning, the Lagrangian method is used to transform the constrained learning problem into an unconstrained one, and the associated Lagrange multiplier is updated via a gradient-based mechanism. The performance of the proposed algorithm is evaluated for an isolated intersection using simulations in SUMO under different traffic demands, fixed public transport schedules and random passenger occupancy levels. The results demonstrate that the proposed algorithm reduces queue length and public transport passenger delays compared to state-of-the-art model-based and model-free signal controllers. The integration of a cost estimator effectively handles both hard and soft constraints during learning. The proposed algorithm resolves public transport priority request conflicts, balances the trade-off between public transport and individual traffic, and ensures traffic safety.
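
The Lagrangian step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cost limit, and the step size are hypothetical, and the update shown is the textbook projected gradient ascent on the multiplier. The idea is that the agent learns from a penalised reward r - λc, while λ grows whenever the average safety cost exceeds its allowed limit and shrinks (never below zero) otherwise.

```python
def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Penalised reward r - lambda * c used for the unconstrained learning step."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float, cost_limit: float, lr: float) -> float:
    """One projected gradient-ascent step on the Lagrange multiplier.

    The gradient of the Lagrangian with respect to lambda is
    (avg_cost - cost_limit); the max() keeps lambda non-negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    lam = 0.5
    cost_limit = 0.1   # assumed tolerated average safety cost per step
    lr = 0.1           # assumed multiplier step size

    # Average cost above the limit: the penalty weight increases.
    lam = update_multiplier(lam, avg_cost=0.3, cost_limit=cost_limit, lr=lr)
    print("updated multiplier:", lam)

    # An unsafe phase switch (cost 1.0) is now penalised more heavily.
    print("penalised reward:", lagrangian_reward(reward=1.0, cost=1.0, lam=lam))
```

In a full training loop, the penalised reward would replace the raw reward in the Q-learning target, and the average cost would come from a learned cost estimator or from recent rollouts; both choices are assumptions here.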

Details

Original language	English
Article number	127676
Number of pages	19
Journal	Expert systems with applications : an international journal
Volume	284
Publication status	Published - 23 Apr 2025
Peer-reviewed	Yes

External IDs

ORCID	/0000-0001-6555-5558/work/182334861
ORCID	/0000-0002-1623-8051/work/182336236
Scopus	105004263453

Keywords

Constrained Markov decision process, Dilemma zone, Safe reinforcement learning, Traffic signal control, Transit signal priority

Research portal of TU Dresden