Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

A Doubly Robust Approach to Sparse Reinforcement Learning

Jun 18, 2024, 2:30 PM
30m
A001 (ENSEEIHT)

Speaker

Assaf Zeevi (Columbia University)

Description

We propose a new regret-minimization algorithm for episodic sparse linear Markov decision processes (SMDPs), in which the state-transition distribution is a linear function of observed features.
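For readers unfamiliar with the setting, a standard way to formalize this assumption is the following, using common notation from the linear MDP literature (the talk's exact definitions may differ):

```latex
% Linear MDP assumption in common notation (not necessarily the talk's
% exact formulation): transitions and rewards at step h are linear in a
% known d-dimensional feature map \phi, and "sparse" means that only
% s_0 \ll d coordinates of the features are relevant.
P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle,
\qquad
r_h(s, a) = \langle \phi(s, a), \theta_h \rangle .
```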
The only previously known algorithm for SMDPs requires knowledge of the sparsity parameter and oracle access to a reference policy.
We overcome these limitations by combining the doubly robust method, which allows the algorithm to use the feature vectors of all actions, with a novel analysis technique that enables it to use data from all periods in all episodes. The resulting algorithm is shown to achieve the best possible regret, up to logarithmic factors.
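The doubly robust construction mentioned above can be sketched in a one-step (contextual bandit) simplification of the episodic problem. The sketch below is illustrative only, not the speaker's algorithm: it builds a doubly robust pseudo-reward for every action, so that a sparse (Lasso) regression can use the feature vectors of all actions rather than only the action that was played. All names and parameter values are assumptions.

```python
# Illustrative sketch only (not the talk's algorithm): doubly robust
# pseudo-rewards let an L1-penalized regression use the feature vectors
# of ALL actions, not just the one that was played.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, K, T = 50, 5, 400            # feature dimension, actions, rounds (toy values)
theta = np.zeros(d)
theta[:3] = rng.normal(size=3)  # sparse ground-truth parameter

X_hist, y_hist = [], []         # features of all actions + pseudo-rewards
theta_hat = np.zeros(d)
lasso = Lasso(alpha=0.1)

for t in range(T):
    feats = rng.normal(size=(K, d)) / np.sqrt(d)  # one feature vector per action
    probs = np.full(K, 1.0 / K)                   # exploration distribution
    a = rng.choice(K, p=probs)                    # action actually played
    r = feats[a] @ theta + 0.1 * rng.normal()     # observed noisy reward

    # Doubly robust pseudo-reward: model prediction for every action,
    # plus an importance-weighted correction on the played action. Each
    # coordinate is an unbiased estimate of that action's expected reward.
    pseudo = feats @ theta_hat
    pseudo[a] += (r - pseudo[a]) / probs[a]

    X_hist.append(feats)
    y_hist.append(pseudo)

    if (t + 1) % 50 == 0:                         # periodically refit the sparse model
        lasso.fit(np.vstack(X_hist), np.concatenate(y_hist))
        theta_hat = lasso.coef_

print("recovered support:", np.flatnonzero(np.abs(theta_hat) > 1e-3))
```

Because every round contributes K labeled feature vectors instead of one, the Lasso sees far more of the feature space, which is what makes it possible to exploit sparsity without knowing the sparsity parameter in advance.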

Primary author

Assaf Zeevi (Columbia University)
