Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Symphony of experts: orchestration with adversarial insights in reinforcement learning

Jun 17, 2024, 4:30 PM
30m
A001 (ENSEEIHT)

Speaker

Chiara Mignacco (Université Paris-Saclay)

Description

Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration is challenging. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; modeling this setting constitutes our first contribution. We then establish value-function regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and with high probability. A key feature of our approach is that its proofs are arguably more transparent than those of existing methods. Finally, we provide simulations for a stochastic matching toy model.
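The abstract does not spell out an algorithm. Purely as an illustration, the Python/NumPy sketch below shows one possible instance of orchestration in a small tabular MDP, assuming exact policy evaluation and exponential weights (Hedge) as the adversarial aggregation strategy. Here the advantage of expert k in state s is taken to be the sum over a of π_k(a|s) A^π(s,a). When the experts are the deterministic single-action policies, this update coincides, up to a rescaling of the step size, with the softmax natural policy gradient step analyzed by Agarwal et al. The function names `evaluate` and `orchestrate`, the step size `eta`, and the toy MDP are hypothetical and are not taken from the talk.

```python
import numpy as np


def evaluate(P, R, pi, gamma):
    """Exact policy evaluation in a tabular MDP.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A);
    pi: policy, shape (S, A). Returns (V, Q).
    """
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)           # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V                   # shape (S, A)
    return V, Q


def normalize(log_w):
    """Turn per-state log-weights into per-state probability vectors."""
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


def orchestrate(P, R, experts, gamma=0.9, eta=1.0, T=200):
    """Hypothetical orchestration loop: per-state exponential weights over experts.

    experts: shape (K, S, A); experts[k, s] is expert k's action distribution in s.
    """
    K, S, _ = experts.shape
    log_w = np.zeros((S, K))                        # uniform initial weights
    for _ in range(T):
        w = normalize(log_w)
        pi = np.einsum("sk,ksa->sa", w, experts)    # mixture of expert policies
        V, Q = evaluate(P, R, pi, gamma)
        # Advantage of expert k in state s: sum_a pi_k(a|s) Q(s,a) - V(s).
        adv = np.einsum("ksa,sa->sk", experts, Q) - V[:, None]
        log_w += eta * adv                          # exponential-weights step
    w = normalize(log_w)
    pi = np.einsum("sk,ksa->sa", w, experts)
    return w, pi, evaluate(P, R, pi, gamma)[0]


# Toy 2-state, 2-action MDP (illustrative only, not from the talk):
# action 0 = stay, action 1 = switch; reward 1 for staying in state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # stay
P[0, 1, 1] = P[1, 1, 0] = 1.0   # switch
R = np.array([[0.0, 0.0], [1.0, 0.0]])
experts = np.zeros((2, 2, 2))
experts[0, :, 0] = 1.0          # expert 0: always stay
experts[1, :, 1] = 1.0          # expert 1: always switch
w, pi, V = orchestrate(P, R, experts)
print(np.round(w, 3), np.round(V, 3))
```

In this toy example the weights should concentrate on the switching expert in state 0 and on the staying expert in state 1, so with γ = 0.9 the orchestrated values approach (9, 10), whereas always staying yields (0, 10) and always switching yields (0, 0).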

Primary authors

Chiara Mignacco (Université Paris-Saclay), Gilles Stoltz (Université Paris-Saclay), Matthieu Jonckheere (LAAS–CNRS)