Reinforcement Learning for Stochastic Networks, Toulouse

Name: Reinforcement Learning for Stochastic Networks, Toulouse
Start: 2024-06-17T09:00:00+02:00
End: 2024-06-21T18:00:00+02:00
Location: ENSEEIHT

Europe/Paris timezone

Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Jun 19, 2024, 1:30 PM

30m

A002 (ENSEEIHT)

Parallel session: Online learning

Speaker: Kishan Panaganti (California Institute of Technology)

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust to parameter uncertainties arising from mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm, Robust $\phi$-regularized fitted Q-iteration (RPQ), for learning an $\epsilon$-optimal robust policy using only historical data collected by rolling out a behavior policy (subject to a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences that achieves robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework for learning an optimal robust policy using both historical data and online sampling. Within this framework, we propose a model-free algorithm, Hybrid robust Total-variation-regularized Q-iteration (HyTQ). Finally, we provide theoretical guarantees on the performance of the policies learned by our algorithms on systems with arbitrarily large state spaces using function approximation.
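The abstract does not spell out the regularized robust Bellman update, so the following is only a rough illustration of the general idea and is not the RPQ or HyTQ algorithm from the talk. It assumes a tabular setting with a known nominal model (the talk is model-free with function approximation), picks the KL divergence as one member of the $\phi$-divergence family, and assumes the inner problem has the common penalized form $\inf_P \{ E_P[V] + \lambda\, \mathrm{KL}(P \,\|\, P^o) \}$, whose closed form is $-\lambda \log E_{P^o}[\exp(-V/\lambda)]$. The names `kl_regularized_backup`, `robust_q_iteration`, `lam`, and the toy MDP are all hypothetical.

```python
import math

def kl_regularized_backup(p0, v, lam):
    """Closed form of inf_P { E_P[v] + lam * KL(P || p0) }.

    Equals -lam * log E_{p0}[exp(-v / lam)]; v is shifted by its
    minimum before exponentiating for numerical stability.
    """
    v_min = min(v)
    total = sum(p * math.exp(-(x - v_min) / lam) for p, x in zip(p0, v))
    return v_min - lam * math.log(total)

def robust_q_iteration(P0, R, gamma, lam, iters=500):
    """Tabular KL-regularized robust Q-iteration on a known nominal model.

    P0[s][a] is the nominal next-state distribution, R[s][a] the reward.
    This is an illustrative stand-in, not a fitted or model-free method.
    """
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        V = [max(row) for row in Q]  # greedy value under the current Q
        Q = [[R[s][a] + gamma * kl_regularized_backup(P0[s][a], V, lam)
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

# Hypothetical two-state, two-action toy problem.
P0 = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 0.5]]

Q_robust = robust_q_iteration(P0, R, gamma=0.9, lam=1.0)
# A very large lam makes deviating from the nominal model expensive,
# so the result approaches ordinary (non-robust) Q-iteration.
Q_near_nominal = robust_q_iteration(P0, R, gamma=0.9, lam=1e6)
```

In this sketch a smaller `lam` lets the adversarial model drift further from the nominal one, so `Q_robust` is never larger than `Q_near_nominal`; the regularization weight plays the role that an uncertainty-set radius plays in constrained robust MDPs.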

Primary author: Kishan Panaganti (California Institute of Technology)

Co-authors: Adam Wierman (California Institute of Technology), Eric Mazumdar (California Institute of Technology)
