Description
Organizers and chairs: R. Srikant and Yashaswini Murthy
- Lei Ying (University of Michigan, Ann Arbor), 6/18/24, 1:30 PM
This talk presents our recent results on joint learning and scheduling in queueing systems.
- Matias Alvo (Columbia Business School (Decision, Risk and Operations division)), 6/18/24, 2:00 PM
Inventory management offers unique opportunities for reliably evaluating and applying deep reinforcement learning (DRL). We introduce Hindsight Differentiable Policy Optimization (HDPO), facilitating direct optimization of a policy's hindsight performance using stochastic gradient descent. HDPO leverages two key elements: (i) an ability to backtest any policy's performance on a sample of ...
- Assaf Zeevi (Columbia University), 6/18/24, 2:30 PM
We propose a new regret minimization algorithm for the episodic sparse linear Markov decision process (SMDP), in which the state-transition distribution is a linear function of observed features.
The only previously known algorithm for SMDP requires knowledge of the sparsity parameter and oracle access to a reference policy.
We overcome these limitations by combining the doubly robust method...