Speaker
Description
Multistage stochastic optimization (MSO) is pivotal for sequential decision-making under uncertainty, with prominent approaches in stochastic optimal control and reinforcement learning. While methods like Stochastic Dual Dynamic Programming excel with moderate-dimensional states and large continuous control spaces, and reinforcement learning handles large state spaces with smaller control sets, a significant gap remains for dynamic operations research problems, which combine high-dimensional state spaces with discrete, combinatorially large control spaces.
Policies based on Combinatorial Optimization Augmented Machine Learning (COAML) have recently demonstrated success in tackling such complex problems, notably in dynamic vehicle routing. However, current state-of-the-art learning algorithms for these policies typically rely on imitating anticipative decisions. This often translates to supervised learning problems using Fenchel-Young losses, which can result in "voting" type policies that underperform on problems requiring strong temporal coordination between decisions.
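As a rough illustration of the single-stage setup described above (not the multistage extension the talk introduces), the sketch below imitates a fixed target decision with a perturbation-based Fenchel-Young loss: a toy combinatorial oracle is smoothed by Gaussian perturbations, and the loss gradient is the expected perturbed decision minus the target. The oracle, step size, and sample counts here are illustrative choices, not those of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def combinatorial_oracle(theta):
    # Toy combinatorial optimizer: one-hot argmax over scores.
    # In a COAML policy this would be a routing or assignment solver.
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_fy_grad(theta, y_target, eps=0.5, n_samples=100):
    # Monte-Carlo gradient of the perturbed Fenchel-Young loss:
    # grad = E[oracle(theta + eps * Z)] - y_target,  Z ~ N(0, I).
    samples = [
        combinatorial_oracle(theta + eps * rng.standard_normal(theta.shape))
        for _ in range(n_samples)
    ]
    return np.mean(samples, axis=0) - y_target

# Imitate an "anticipative" target decision by gradient descent on the loss.
theta = np.zeros(4)
y_star = np.array([0.0, 0.0, 1.0, 0.0])  # target decision to imitate
for _ in range(200):
    theta -= 0.1 * perturbed_fy_grad(theta, y_star)

# The learned scores now make the oracle reproduce the target decision.
assert combinatorial_oracle(theta).argmax() == 2
```

Because each stage's loss only matches decisions marginally, training many such losses independently yields the "voting"-type behavior mentioned above, which is precisely what a multistage coupling of these losses aims to fix.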
To address this limitation and foster better coordination, we introduce multistage extensions of Fenchel-Young losses. These novel loss functions are integrated into an empirical cost minimization algorithm. Preliminary numerical results on benchmark environments indicate that our approach significantly improves upon existing methods by enabling more coordinated decision-making in multistage stochastic settings.