11e Journée Statistique et Informatique pour la Science des Données à Paris-Saclay

Name: 11e Journée Statistique et Informatique pour la Science des Données à Paris-Saclay
Start: 2026-04-03T09:30:00+02:00
End: 2026-04-03T17:30:00+02:00
Location: Le Bois-Marie

Time zone: Europe/Paris

Cécile Gourgues

Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning

April 3, 2026, 14:30

50 min

Centre de Conférences Marilyn et James Simons (Le Bois-Marie)

35, route de Chartres CS 40001 91893 Bures-sur-Yvette Cedex

Speaker: Hugo Cui (CNRS, Paris-Saclay)

We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count. Based on joint work with Yue M. Lu.

No documents.
