Description
Organizer and chair: Odalric-Ambrym Maillard
Dr Ronald Ortner (Montanuniversität Leoben), 6/17/24, 1:30 PM
This talk considers reinforcement learning in Markov decision processes (MDPs) under the undiscounted reward criterion. In this setting the so-called regret is a natural performance measure that compares the accumulated reward of the learner to that of an optimal policy. Usually the regret depends on the size (number of states and actions) of the underlying MDP as well as its transition...
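For reference, the undiscounted regret mentioned above is typically defined as in the standard average-reward setting (e.g. Jaksch et al., 2010); the notation below is a common convention, not taken from the talk itself:

```latex
% Regret of a learner after T steps, relative to the optimal
% average reward \rho^* of the MDP; r_t is the reward at step t.
\Delta(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t
```

Here $\rho^{*}$ denotes the optimal average reward, so $\Delta(T)$ measures how much reward the learner forgoes relative to always playing an optimal policy.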
Dr Mohammad Sadegh Talebi (University of Copenhagen), 6/17/24, 2:00 PM
We study reinforcement learning for decision processes with Markovian dynamics but non-Markovian rewards, in which high-level knowledge in the form of a finite-state automaton is available to the learner. Such an automaton, often called Reward Machine (RM) (Toro Icarte et al., 2018), generates rewards based on its internal state as well as events that are detected at various states in the...
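A Reward Machine in the sense of Toro Icarte et al. (2018) is a finite-state automaton whose transitions are triggered by high-level events and emit rewards. The following is a minimal sketch under assumed toy semantics (the class name, interface, and event labels are illustrative, not the speakers' code):

```python
class RewardMachine:
    """A finite-state automaton that emits a reward on each observed event."""

    def __init__(self, transitions, initial_state):
        # transitions: dict mapping (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance the machine on an observed event; return the emitted reward."""
        # Events with no listed transition leave the state unchanged and pay 0.
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        return reward


# Toy example: reward 1 only when event "a" is eventually followed by "b".
rm = RewardMachine(
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u0", 1.0),
    },
    initial_state="u0",
)
rewards = [rm.step(e) for e in ["b", "a", "b", "a", "a", "b"]]
```

The point of the construction is that the reward depends on the machine's internal state, i.e. on the history of detected events, even though the environment dynamics themselves remain Markovian.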
Dr Anders Jonsson (Universitat Pompeu Fabra), 6/17/24, 2:30 PM
We study the problem of offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal...
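The defining property of an RDP is that the dependency on history can be folded into a finite automaton, so the pair (automaton state, current observation) is again a Markov state. A toy sketch of this idea, with an assumed parity automaton that is purely illustrative:

```python
def automaton_step(q, obs):
    """Toy automaton over observations {0, 1}: q tracks the parity of 1s seen."""
    return (q + obs) % 2


def product_state(history):
    """Fold a full observation history into a finite sufficient statistic."""
    q = 0
    for obs in history:
        q = automaton_step(q, obs)
    # The non-Markov history is summarized by (automaton state, last observation),
    # so standard RL methods can run on this product state space.
    last = history[-1] if history else None
    return (q, last)


s = product_state([1, 0, 1, 1])
```

In the offline setting of the talk, the added difficulty is that this automaton is unknown and must itself be learned from the batch of episodes.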