Description
Organizers and chairs: R. Srikant and Yashaswini Murthy
This talk presents our recent results on joint learning and scheduling in queueing systems.
Inventory management offers unique opportunities for reliably evaluating and applying deep reinforcement learning (DRL). We introduce Hindsight Differentiable Policy Optimization (HDPO), facilitating direct optimization of a policy's hindsight performance using stochastic gradient descent. HDPO leverages two key elements: (i) an ability to backtest any policy's performance on a sample of ...
We propose a new regret minimization algorithm for the episodic sparse linear Markov decision process (SMDP), in which the state-transition distribution is a linear function of observed features.
The only previously known algorithm for the SMDP requires knowledge of the sparsity parameter and oracle access to a reference policy.
We overcome these limitations by combining the doubly robust method...