Description
We consider the infinite-horizon, average-reward restless bandit problem. A central challenge for this problem is to find asymptotically optimal policies in a computationally efficient manner in the regime where the number of arms, N, grows large. Existing policies, including the renowned Whittle index policy, all rely on a uniform global attractor property (UGAP) assumption to achieve asymptotic optimality; UGAP is a complex and difficult-to-verify condition. In this talk, I will present new sampling-based policy designs for restless bandits. The first of our proposed policies is the first to achieve asymptotic optimality without relying on UGAP; our subsequent policies then eliminate the need for UGAP-type assumptions entirely. Our techniques offer new insights into guaranteeing convergence (avoiding undesirable attractors or cycles) in large stochastic systems.