Jun 17 – 21, 2024
Europe/Paris timezone

Learning LP-indices in Average-Reward Restless Multi-Armed Bandits

Jun 19, 2024, 1:30 PM




Dr Konstantin Avrachenkov (INRIA Sophia Antipolis)


Restless Multi-Armed Bandits (RMABs) are extensively used in scheduling,
resource allocation, marketing and clinical trials, just to name a few
application areas. RMABs are Markov Decision Processes with two actions
(active and passive modes) for each arm and with a constraint on the
number of active arms per time slot. Since in general RMABs are
PSPACE-complete, several heuristics, such as the Whittle index and the
LP index, have been proposed. In this talk, I present a reinforcement
learning scheme for LP indices with an almost sure convergence guarantee
in the tabular setting, together with an empirically efficient Deep
Q-learning variant.
Several examples, including scheduling in queueing systems, will be
presented. This is joint work with V.S. Borkar and P. Shah from IIT Bombay.
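
To make the RMAB setting above concrete, here is a minimal Python sketch
of a generic index policy: in each time slot, every arm gets an index
that depends on its current state, and the arms with the largest indices
are activated, up to the budget. This is an illustration only, not the
scheme presented in the talk. The function names, the two-state
transition and reward tables, and in particular the values in
index_table are hypothetical placeholders; they are not LP indices
computed from any model.

```python
import random


def select_active_arms(indices, budget):
    """Pick the `budget` arms with the largest index values.

    Ties are broken in favour of the lower arm number.
    """
    ranked = sorted(range(len(indices)), key=lambda arm: (-indices[arm], arm))
    return set(ranked[:budget])


def step(states, active, transitions, rewards, rng):
    """Advance every arm by one time slot.

    Passive arms also change state, which is what makes the bandit "restless".
    """
    total_reward = 0.0
    next_states = []
    for arm, state in enumerate(states):
        action = 1 if arm in active else 0  # 1 = active, 0 = passive
        total_reward += rewards[action][state]
        probs = transitions[action][state]
        next_states.append(rng.choices(range(len(probs)), weights=probs)[0])
    return next_states, total_reward


# Hypothetical toy model: identical two-state arms.
# transitions[action][state] is the next-state distribution.
transitions = {
    0: [[0.9, 0.1], [0.3, 0.7]],  # passive
    1: [[0.5, 0.5], [0.1, 0.9]],  # active
}
# rewards[action][state]: only an active arm in state 1 earns a reward.
rewards = {0: [0.0, 0.0], 1: [0.0, 1.0]}

# Placeholder per-state index values (assumed, not computed LP indices).
index_table = [0.2, 0.8]

rng = random.Random(0)
states = [0, 1, 1, 0, 1]  # current state of each of the 5 arms
budget = 2                # at most 2 arms may be active per time slot

active = select_active_arms([index_table[s] for s in states], budget)
next_states, total_reward = step(states, active, transitions, rewards, rng)
```

A learning scheme of the kind described in the abstract would replace
the fixed index_table with estimates that are updated from observed
transitions and rewards; the selection step would stay the same.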

Primary author

Dr Konstantin Avrachenkov (INRIA Sophia Antipolis)


Co-authors

Prof. Vivek Borkar (IITB)
Mr Pratik Shah (IITB)
