Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Christel Baier; Clemens Dubslaff; Patrick Wienhöft; Stefan Kiebel

doi:10.1007/978-3-031-33170-1_6

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Christel Baier, Chair of Algebraic and Logical Foundations of Computer Science, Cluster of Excellence CeTI: Centre for Tactile Internet (Author)
Clemens Dubslaff, Cluster of Excellence CeTI: Centre for Tactile Internet, Eindhoven University of Technology (Author)
Patrick Wienhöft, Chair of Algebraic and Logical Foundations of Computer Science, Cluster of Excellence CeTI: Centre for Tactile Internet (Author)
Stefan Kiebel, Chair of Cognitive Computational Neuroscience, Cluster of Excellence CeTI: Centre for Tactile Internet (Author)

Abstract

A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents’ actions is known in terms of successor states but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as an internal model. To cope with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies, and action scoping reduces the learning action space to promising actions. We illustrate the benefits of our algorithms by means of a prototypical implementation applied to examples from the AI and formal methods communities.
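
As a rough illustration of the interval-MDP idea mentioned in the abstract, the sketch below turns sampled successor states of a single state-action pair into probability intervals. It is not taken from the paper: the function name `interval_estimates`, the parameter `delta`, and the use of Hoeffding's inequality for the interval width are assumptions made here for illustration only. The gray-box setting is reflected in the `successors` argument, which lists the known successor states whose probabilities are unknown.

```python
import math
from collections import Counter


def interval_estimates(samples, successors, delta=0.05):
    """Hoeffding-style probability intervals for one state-action pair.

    samples:    observed successor states for this state-action pair
    successors: the known successor set (gray-box assumption)
    delta:      allowed error probability per transition (hypothetical parameter)
    """
    n = len(samples)
    if n == 0:
        # No data yet: every known successor gets the trivial interval.
        return {s: (0.0, 1.0) for s in successors}

    counts = Counter(samples)
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    intervals = {}
    for s in successors:
        p_hat = counts[s] / n
        # Clamp to [0, 1] so the bounds remain valid probabilities.
        intervals[s] = (max(0.0, p_hat - eps), min(1.0, p_hat + eps))
    return intervals


if __name__ == "__main__":
    observed = ["s1"] * 70 + ["s2"] * 30
    for succ, (lo, hi) in interval_estimates(observed, ["s1", "s2"]).items():
        print(f"{succ}: [{lo:.3f}, {hi:.3f}]")
```

With few samples the intervals stay wide, which is one way to see why limited sampling access makes learning hard: the internal model remains highly uncertain until enough transitions have been observed.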

Details

Original language	English
Title	NASA Formal Methods
Editors	Kristin Yvonne Rozier, Swarat Chaudhuri
Publisher	Springer, Cham
Pages	86-103
Number of pages	18
ISBN (electronic)	978-3-031-33170-1
ISBN (print)	978-3-031-33169-5
Publication status	Published - 3 June 2023
Peer-review status	Yes

Publication series

Series	Lecture Notes in Computer Science
Volume	13903
ISSN	0302-9743

Conference

Title	NASA Formal Methods Symposium 2023
Short title	NFM 2023
Event number	2023
Duration	16 - 18 May 2023
Website	https://conf.researchr.org/home/nfm-2023#About
Degree of recognition	International event
Location	University of Clear Lake
City	Houston
Country	USA/United States

External IDs

dblp	conf/nfm/BaierDWK23
Scopus	85163947741
ORCID	/0000-0002-5321-9343/work/142236785
ORCID	/0000-0001-8047-4094/work/143075253

TU Dresden Research Portal

Linked content

Limited sampling RL