Objective Mismatch in Model-based Reinforcement Learning

Nathan Lambert; Brandon Amos; Omry Yadan; Roberto Calandra

Research output: Contribution to journal › Conference article › Contributed › peer-review

Contributors

Nathan Lambert, University of California at Berkeley (Author)
Brandon Amos, Meta (Author)
Omry Yadan, Meta (Author)
Roberto Calandra, Chair of Machine Learning for Robotics (CeTi), Meta (Author)

Abstract

Model-based reinforcement learning (MBRL) is a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework, which we call objective mismatch. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also be optimized. In the context of MBRL, we characterize the objective mismatch between training the forward dynamics model w.r.t. the likelihood of the one-step-ahead prediction and the overall goal of improving performance on a downstream control task. For example, this issue can emerge with the realization that dynamics models effective for a specific task do not necessarily need to be globally accurate and, vice versa, globally accurate models might not be sufficiently accurate locally to obtain good control performance on a specific task. In our experiments, we study this objective mismatch issue and demonstrate that the likelihood of one-step-ahead predictions is not always correlated with control performance. This observation highlights a critical limitation in the MBRL framework which will require further research to be fully understood and addressed. We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training. Building on it, we conclude with a discussion about other potential directions of research for addressing this issue.
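
As an illustration of the re-weighting idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that squared one-step prediction error stands in for the negative log-likelihood (which holds up to constants for a Gaussian model with fixed variance), and that each training sample is weighted by its closeness to a set of task-relevant reference states. The function names, the exponential weighting, and the `bandwidth` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def one_step_loss(pred_next, true_next, weights=None):
    """Mean squared one-step prediction error, optionally re-weighted per sample.

    Assumption: squared error is used as a proxy for negative log-likelihood.
    """
    per_sample = np.sum((pred_next - true_next) ** 2, axis=1)
    if weights is None:
        return float(np.mean(per_sample))
    weights = np.asarray(weights, dtype=float)
    # Normalised weighted average of the per-sample errors.
    return float(np.sum(weights * per_sample) / np.sum(weights))


def relevance_weights(states, reference_states, bandwidth=1.0):
    """Hypothetical weighting: states near a task-relevant reference count more."""
    # Pairwise distances, shape (num_states, num_reference_states).
    diffs = states[:, None, :] - reference_states[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return np.exp(-nearest / bandwidth)


# Two samples: one on the reference trajectory, one far away from it.
states = np.array([[0.0, 0.0], [10.0, 10.0]])
reference = np.array([[0.0, 0.0]])

true_next = np.array([[0.0, 0.0], [10.0, 10.0]])
# The model is wrong only where the task actually operates.
pred_next = np.array([[1.0, 0.0], [10.0, 10.0]])

uniform = one_step_loss(pred_next, true_next)
weighted = one_step_loss(
    pred_next, true_next, relevance_weights(states, reference)
)
```

In this toy setup the uniform loss averages away the error on the task-relevant sample, while the re-weighted loss is dominated by it. This is one way to picture why a model that looks accurate on aggregate can still be inadequate for control.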

Details

Original language	English
Pages (from-to)	761-770
Number of pages	10
Journal	Proceedings of Machine Learning Research
Volume	120
Publication status	Published - 2020
Peer-reviewed	Yes

Conference

Title	2nd Annual Conference on Learning for Dynamics and Control, L4DC 2020
Duration	10 - 11 June 2020
City	Berkeley
Country	United States of America

External IDs

ORCID	/0000-0001-9430-8433/work/146646292

Research Portal of the TU Dresden
