Reinforcement Learning for Stochastic Networks, Toulouse

Name: Reinforcement Learning for Stochastic Networks, Toulouse
Start: 2024-06-17T09:00:00+02:00
End: 2024-06-21T18:00:00+02:00
Location: ENSEEIHT

Jun 17 – 21, 2024

Europe/Paris timezone

Fleming-Viot particle systems to accelerate optimal policy learning in the presence of costly rare events

Jun 18, 2024, 2:30 PM

30m

A002 (ENSEEIHT)

Parallel session: Reinforcement learning for combinatorial problems

Speaker: Daniel Mastropietro (INP Toulouse, CNRS-IRIT)

In this talk we present Fleming-Viot particle systems as a way to discover more efficiently the rare events that affect the learning speed of optimal policies. The approach is used to learn the critic in Actor-Critic policy gradient methods, which learn the optimal parameters of parameterized policies; we call the resulting method FVAC. We have successfully applied FVAC in two different contexts, where it has shown an advantage over a benchmark Monte Carlo or TD Actor-Critic method: (i) network systems, where the objective is to learn an optimal acceptance policy for incoming jobs with large rejection costs; and (ii) a classical RL environment, where the objective is to find the shortest path to the exit of a labyrinth.
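As a rough illustration of the particle-system idea mentioned in the abstract, the sketch below simulates a generic Fleming-Viot system on a toy birth-death chain. It is not the FVAC method from the talk: the chain, the absorbing state 0, the function name fleming_viot_occupancy and all parameter values are assumptions chosen for illustration. Each particle moves independently, and whenever one is absorbed it is immediately restarted at the position of another particle chosen uniformly at random, which keeps every particle exploring the region where the rare high states are reached.

```python
import random


def fleming_viot_occupancy(n_particles=50, capacity=10, p_up=0.3,
                           n_steps=2000, seed=0):
    """Toy Fleming-Viot particle system on the states {0, ..., capacity}.

    Hypothetical setup (not taken from the talk): each particle performs a
    random walk that moves up with probability p_up and down otherwise,
    and state 0 is absorbing. Requires n_particles >= 2.
    Returns the empirical fraction of particle-time spent in each state.
    """
    rng = random.Random(seed)
    particles = [1] * n_particles      # all particles start just above absorption
    visits = [0] * (capacity + 1)

    for _ in range(n_steps):
        for i in range(n_particles):
            x = particles[i]
            if rng.random() < p_up:
                x = min(x + 1, capacity)   # an up-move at full capacity is blocked
            else:
                x -= 1
            if x == 0:
                # Absorbed: restart at the position of a different particle,
                # chosen uniformly at random among the other particles.
                j = rng.randrange(n_particles - 1)
                if j >= i:
                    j += 1
                x = particles[j]
            particles[i] = x
        for x in particles:
            visits[x] += 1

    total = n_particles * n_steps
    return [v / total for v in visits]


if __name__ == "__main__":
    occupancy = fleming_viot_occupancy()
    print("fraction of particle-time at full capacity:", occupancy[-1])
```

Because absorbed particles are restarted rather than discarded, the full population keeps contributing samples away from the absorbing state, so visits to the rarely reached top state are observed more often than with a single trajectory that repeatedly falls back to 0.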

Primary author: Daniel Mastropietro (INP Toulouse, CNRS-IRIT)

Co-authors: Matthieu Jonckheere (LAAS–CNRS), Urtzi Ayesta (CNRS)
