Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Session

Parallel session: Challenges and progress in statistical reinforcement learning

Jun 17, 2024, 1:30 PM
A002 (ENSEEIHT)

Description

Organizer and chair: Odalric-Ambrym Maillard

Presentation materials

There are no materials yet.

  1. Dr Ronald Ortner (Montanuniversität Leoben)
    6/17/24, 1:30 PM

    This talk considers reinforcement learning in Markov decision processes
    (MDPs) under the undiscounted reward criterion. In this setting the
    so-called regret is a natural performance measure that compares the
    accumulated reward of the learner to that of an optimal policy. Usually
    the regret depends on the size (number of states and actions) of the
    underlying MDP as well as its transition...

  2. Dr Mohammad Sadegh Talebi (University of Copenhagen)
    6/17/24, 2:00 PM

    We study reinforcement learning for decision processes with Markovian dynamics but non-Markovian rewards, in which high-level knowledge in the form of a finite-state automaton is available to the learner. Such an automaton, often called Reward Machine (RM) (Toro Icarte et al., 2018), generates rewards based on its internal state as well as events that are detected at various states in the...

  3. Dr Anders Jonsson (Universitat Pompeu Fabra)
    6/17/24, 2:30 PM

    We study the problem of offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of non-Markovian decision processes in which the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal...

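The regret mentioned in the first talk is commonly formalized as follows; this is the standard definition used in the average-reward RL literature, not a formula quoted from the talk itself:

```latex
% Regret after T steps in an undiscounted (average-reward) MDP:
% rho^* denotes the optimal gain (long-run average reward of an
% optimal policy) and r_t the reward collected by the learner at
% step t.
\[
  \mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t
\]
```

Regret bounds of the kind discussed in the talk typically scale with the number of states and actions of the underlying MDP, as the abstract indicates.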
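The Reward Machines of the second talk are finite-state automata whose transitions are driven by detected high-level events and whose output is the reward. A minimal illustrative sketch (the class name, the event labels, and the two-step "visit A then B" task are invented for this example, not taken from the talk):

```python
class RewardMachine:
    """A finite-state automaton that emits rewards based on its
    internal state and the high-level events it observes."""

    def __init__(self, initial_state, transitions, rewards):
        # transitions: maps (rm_state, event) -> next rm_state
        # rewards:     maps (rm_state, event) -> scalar reward
        self.state = initial_state
        self.transitions = transitions
        self.rewards = rewards

    def step(self, event):
        """Advance the machine on an observed event; return the reward."""
        reward = self.rewards.get((self.state, event), 0.0)
        self.state = self.transitions.get((self.state, event), self.state)
        return reward


# Example task: reward 1 only for visiting A and then B, in that order.
rm = RewardMachine(
    initial_state="u0",
    transitions={("u0", "A"): "u1", ("u1", "B"): "u2"},
    rewards={("u1", "B"): 1.0},
)
print(rm.step("B"))  # 0.0 -- B before A earns nothing
print(rm.step("A"))  # 0.0 -- moves the machine to state u1
print(rm.step("B"))  # 1.0 -- the ordered task is completed
```

Because the reward depends only on the RM state and the event, pairing the environment state with the RM state restores the Markov property, which is what lets RL algorithms exploit this structure.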