Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › Peer-reviewed

Abstract

A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states, but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that uses interval MDPs as its internal model. To cope with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies, and action scoping reduces the learning action space to promising actions. We illustrate the benefits of our algorithm by means of a prototypical implementation applied to examples from the AI and formal methods communities.
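The two concepts named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`lcb_action_selection`, `scope_actions`), the Hoeffding-style confidence radius, and the bandit-like setting are illustrative assumptions; the paper operates on interval MDPs rather than on per-action reward averages.

```python
import math

def lcb_action_selection(counts, rewards, delta=0.05):
    """Pick the action with the highest lower confidence bound (LCB) on
    its mean observed reward. Choosing pessimistically reinforces actions
    that have already proven themselves in sampling (illustrative sketch,
    not the paper's interval-MDP formulation)."""
    best_action, best_lcb = None, -math.inf
    for action, n in counts.items():
        if n == 0:
            return action  # unsampled actions are tried first
        mean = rewards[action] / n
        width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding radius
        if mean - width > best_lcb:
            best_action, best_lcb = action, mean - width
    return best_action

def scope_actions(counts, rewards, delta=0.05):
    """Action scoping (sketch): drop actions whose upper confidence bound
    falls below the best lower confidence bound, so further learning
    concentrates on promising actions only."""
    bounds = {}
    for action, n in counts.items():
        if n == 0:
            bounds[action] = (-math.inf, math.inf)  # unknown: keep in scope
            continue
        mean = rewards[action] / n
        width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        bounds[action] = (mean - width, mean + width)
    best_lower = max(lo for lo, _ in bounds.values())
    return {a for a, (_, hi) in bounds.items() if hi >= best_lower}

# After 100 samples each, action 'a' (mean reward 0.9) dominates 'b'
# (mean reward 0.1), so 'b' is scoped out and 'a' is selected.
counts = {"a": 100, "b": 100}
rewards = {"a": 90.0, "b": 10.0}
print(lcb_action_selection(counts, rewards))  # 'a'
print(scope_actions(counts, rewards))         # {'a'}
```

The pessimistic selection and the confidence-interval pruning together mirror the abstract's emphasis on rapid, practically successful learning over optimality guarantees.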

Details

Original language: English
Title of host publication: NASA Formal Methods - 15th International Symposium, NFM 2023, Proceedings
Editors: Kristin Yvonne Rozier, Swarat Chaudhuri
Pages: 86-103
Number of pages: 18
Volume: 13903
Publication status: Published - 3 Jun 2023
Peer-reviewed: Yes

Conference

Title: NASA Formal Methods Symposium 2023
Abbreviated title: NFM 2023
Conference number: 2023
Duration: 16 - 18 May 2023
Website:
Degree of recognition: International event
Location: University of Clear Lake
City: Houston
Country: United States of America

External IDs

dblp: conf/nfm/BaierDWK23
Scopus: 85163947741
ORCID: /0000-0002-5321-9343/work/142236785
ORCID: /0000-0001-8047-4094/work/143075253

Keywords