Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Exploiting Structure in Undiscounted Reinforcement Learning in Markov Decision Processes

Jun 17, 2024, 1:30 PM
30m
A002 (ENSEEIHT)

Speaker

Dr Ronald Ortner (Montanuniversität Leoben)

Description

This talk considers reinforcement learning in Markov decision processes
(MDPs) under the undiscounted reward criterion. In this setting the
so-called regret is a natural performance measure that compares the
accumulated reward of the learner to that of an optimal policy. Usually
the regret depends on the size (number of states and actions) of the
underlying MDP as well as its transition structure. We will examine
structures of the underlying MDP that permit improved bounds on
the regret.
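
For orientation, one common way to formalize the regret in this setting is sketched below. The notation is an assumption and is not fixed by the abstract: T is the number of steps, r_t is the reward the learner collects at step t, and \rho^* is the optimal average reward of the MDP (assumed to be communicating, so that \rho^* does not depend on the initial state).

```latex
% Regret after T steps (notation assumed for illustration, not taken from the abstract):
% \rho^* = optimal average reward, r_t = reward obtained by the learner at step t.
R_T \;=\; T\,\rho^* \;-\; \sum_{t=1}^{T} r_t

% Example of a bound depending on size and transition structure:
% for the UCRL2 algorithm, with S states, A actions and diameter D,
% the regret is known to satisfy, with high probability,
R_T \;=\; \tilde{O}\!\left( D\, S\, \sqrt{A\, T} \right)
```

Here S and A capture the size of the MDP, while the diameter D is one example of a parameter describing its transition structure. Whether the talk uses these particular quantities is not stated in the abstract.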

Primary author

Dr Ronald Ortner (Montanuniversität Leoben)
