Séminaire MAD-Stat

Separation of non-ergodic uniform convergence rates for regularized learning in games

by Julien Grand-Clément (HEC Paris)

Europe/Paris
Auditorium 3 (Toulouse School of Economics)
Description

Self-play via online learning is a leading paradigm for solving large-scale games and has enabled recent superhuman performance (e.g., Go, Poker). This work clarifies that different convergence notions in self-play (last iterate, best iterate, and a randomly sampled iterate) can behave fundamentally differently. For a broad class of learning dynamics, including Optimistic Multiplicative Weights Update (OMWU), we prove a separation: even in two-player zero-sum games, last-iterate convergence can be arbitrarily slow, random-iterate convergence can be slower than any polynomial, while best-iterate convergence is polynomial. This departs from much prior theory where these notions align, and we attribute the gap to OMWU’s insufficient “forgetfulness,” linking it to empirical behavior in practical game solving.
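To make the convergence notions above concrete, here is a minimal sketch of OMWU self-play on a toy two-player zero-sum game, tracking the duality gap of the last iterate and the best iterate. The game (matching pennies), step size, and function name are illustrative choices, not taken from the paper.

```python
import numpy as np

def omwu_zero_sum(A, x0, y0, eta=0.1, T=500):
    """Optimistic Multiplicative Weights Update (OMWU) self-play on the
    two-player zero-sum game min_x max_y x^T A y.
    Returns the duality gap of each iterate."""
    x, y = x0.copy(), y0.copy()
    gx_prev = A @ y       # previous loss vector for the x-player
    gy_prev = A.T @ x     # previous payoff vector for the y-player
    gaps = []
    for _ in range(T):
        gx = A @ y        # current gradients, computed before updating
        gy = A.T @ x
        # optimistic step: use the prediction 2*current - previous gradient
        x = x * np.exp(-eta * (2 * gx - gx_prev))
        x /= x.sum()
        y = y * np.exp(eta * (2 * gy - gy_prev))
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
        # duality gap: max_{y'} x^T A y' - min_{x'} x'^T A y  (0 at equilibrium)
        gaps.append((A.T @ x).max() - (A @ y).min())
    return np.array(gaps)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies
x0 = np.array([0.7, 0.3])                  # start away from equilibrium
y0 = np.array([0.4, 0.6])
gaps = omwu_zero_sum(A, x0, y0)
print(f"last-iterate gap: {gaps[-1]:.6f}")
print(f"best-iterate gap: {gaps.min():.6f}")
```

The best-iterate gap is by definition no larger than the last-iterate gap; the separation result in the abstract says that on some games this inequality can be dramatic, with the best iterate converging polynomially while the last (or a randomly sampled) iterate lags far behind.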

Paper 1 -- Paper 2