Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › Peer-reviewed

Abstract

A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states, but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that uses interval MDPs as its internal model. To cope with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies, and action scoping reduces the learning action space to promising actions. We illustrate the benefits of our algorithm by means of a prototypical implementation applied to examples from the AI and formal methods communities.
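The two concepts named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`lcb_action_selection`, `scope_actions`), the Hoeffding-style confidence radius, and the bandit-like setting are illustrative assumptions; the paper operates on interval MDPs rather than on per-action reward averages.

```python
import math

def lcb_action_selection(counts, rewards, delta=0.05):
    """Pick the action with the highest lower confidence bound (LCB) on
    its mean observed reward. Choosing pessimistically reinforces actions
    that have already proven themselves in sampling (illustrative sketch,
    not the paper's interval-MDP formulation)."""
    best_action, best_lcb = None, -math.inf
    for action, n in counts.items():
        if n == 0:
            return action  # unsampled actions are tried first
        mean = rewards[action] / n
        width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding radius
        if mean - width > best_lcb:
            best_action, best_lcb = action, mean - width
    return best_action

def scope_actions(counts, rewards, delta=0.05):
    """Action scoping (sketch): drop actions whose upper confidence bound
    falls below the best lower confidence bound, so further learning
    concentrates on promising actions only."""
    bounds = {}
    for action, n in counts.items():
        if n == 0:
            bounds[action] = (-math.inf, math.inf)  # unknown: keep in scope
            continue
        mean = rewards[action] / n
        width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        bounds[action] = (mean - width, mean + width)
    best_lower = max(lo for lo, _ in bounds.values())
    return {a for a, (_, hi) in bounds.items() if hi >= best_lower}

# After 100 samples each, action 'a' (mean reward 0.9) dominates 'b'
# (mean reward 0.1), so 'b' is scoped out and 'a' is selected.
counts = {"a": 100, "b": 100}
rewards = {"a": 90.0, "b": 10.0}
print(lcb_action_selection(counts, rewards))  # 'a'
print(scope_actions(counts, rewards))         # {'a'}
```

The pessimistic selection and the confidence-interval pruning together mirror the abstract's emphasis on rapid, practically successful learning over optimality guarantees.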

Details

Original language: English
Title of host publication: NASA Formal Methods - 15th International Symposium, NFM 2023, Proceedings
Editors: Kristin Yvonne Rozier, Swarat Chaudhuri
Pages: 86-103
Number of pages: 18
Volume: 13903
Publication status: Published - 3 Jun 2023
Peer-reviewed: Yes

Conference

Title: NASA Formal Methods Symposium 2023
Abbreviated title: NFM 2023
Conference number: 2023
Duration: 16 - 18 May 2023
Website:
Degree of recognition: International event
Location: University of Clear Lake
City: Houston
Country: United States of America

External IDs

dblp: conf/nfm/BaierDWK23
Scopus: 85163947741
ORCID: /0000-0002-5321-9343/work/142236785
ORCID: /0000-0001-8047-4094/work/143075253

Keywords