11e Journée Statistique et Informatique pour la Science des Données à Paris-Saclay

Centre de Conférences Marilyn et James Simons (Le Bois-Marie)

35, route de Chartres CS 40001 91893 Bures-sur-Yvette Cedex
Description

The aim of this workshop is to bring together mathematicians and computer scientists around talks on recent results from statistics, machine learning, and, more generally, data science research. Various topics will be presented, including machine learning, optimization, deep learning, optimal transport, fairness, and statistics. This workshop is particularly intended for doctoral and post-doctoral researchers.

 

Registration is free and open until March 27, 2026.

Invited speakers:
Richard Combes (CentraleSupélec)
Hugo Cui (CNRS, Paris-Saclay)
Ekhine Irurozki (Télécom Paris)
Anna Korba (ENSAE)
Paul Mangold (École polytechnique)
Tabea Rebafka (AgroParisTech)

Organizers: 
Avetik Karagulyan (CNRS) & Erwan Le Pennec (École polytechnique)

Contact: Cécile Gourgues
    • 09:30 - 10:00
      Welcome coffee 30m
    • 10:00 - 10:50
      Linear Bandits on Ellipsoids: Minimax Optimal Algorithms 50m

      We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions, and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$, followed by an explore-and-commit strategy informed by this estimate. The algorithm is highly computationally efficient, and a run requires only time $\mathcal{O}(dT + d^2 \log(T/d) + d^3)$ and memory $\mathcal{O}(d^2)$, in contrast with known optimistic algorithms, which are not implementable in polynomial time. We go beyond minimax optimality and show that our algorithm is locally asymptotically minimax optimal, a much stronger notion of optimality. We further provide numerical experiments to illustrate our theoretical findings.
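
      As a rough illustration of the explore-and-commit structure mentioned above, the Python sketch below runs a plain explore-then-commit baseline on the ellipsoidal action set {a : a^T A a <= 1}: coordinate-direction exploration, a least-squares estimate, then commitment to the action that is best for the estimate. The function name, the exploration design and the fixed exploration length are illustrative assumptions; this is not the minimax-optimal algorithm of the talk.

      import numpy as np

      def explore_then_commit(A, theta, T, n_explore, sigma=1.0, seed=0):
          """Toy explore-then-commit run on the ellipsoid {a : a^T A a <= 1};
          returns the cumulative pseudo-regret over T rounds."""
          rng = np.random.default_rng(seed)
          d = len(theta)

          # Exploration: play rescaled coordinate directions on the ellipsoid
          # and record the noisy rewards.
          X, y = [], []
          for t in range(n_explore):
              e = np.zeros(d)
              e[t % d] = 1.0
              a = e / np.sqrt(e @ A @ e)
              X.append(a)
              y.append(a @ theta + sigma * rng.standard_normal())
          X, y = np.array(X), np.array(y)
          theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares estimate

          # Commit: <a, theta_hat> is maximised over the ellipsoid at
          # A^{-1} theta_hat, rescaled to the boundary.
          a_commit = np.linalg.solve(A, theta_hat)
          a_commit /= np.sqrt(a_commit @ A @ a_commit)

          # Pseudo-regret against the best action for the true theta.
          a_opt = np.linalg.solve(A, theta)
          a_opt /= np.sqrt(a_opt @ A @ a_opt)
          explore_regret = (a_opt @ theta - X @ theta).sum()
          commit_regret = (T - n_explore) * (a_opt @ theta - a_commit @ theta)
          return explore_regret + commit_regret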

      Speaker: Richard Combes (CentraleSupélec)
    • 10:50 - 11:20
      Coffee break 30m
    • 11:20 - 12:10
      Statistical Analysis of Multiple Networks 50m

      This talk provides a brief introduction to statistical network analysis and random graph models. We then focus on the problem of estimating the graphon function, which characterizes nonparametric exchangeable random graph models. Our main emphasis is on the setting where multiple networks are observed, which introduces additional challenges compared to the classical single-network framework. To address this, we propose a new histogram-based estimator with low computational complexity. The key idea is to jointly align the nodes across all observed graphs, rather than processing each network independently as in most existing approaches. We establish consistency results for the proposed estimator and demonstrate through numerical experiments that it outperforms current methods in both estimation accuracy and computational efficiency. Finally, we show that, when used for data augmentation in graph neural network classification tasks, our approach leads to improved performance on various real-world datasets. This is joint work with Roland Sogan.
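
      As background for readers unfamiliar with graphon estimation, here is a minimal histogram (block-constant) estimator in Python, computed from several graphs assumed to be observed on the same, already-aligned node set; the joint node alignment that is the key ingredient of the talk is taken for granted here, and clustering the averaged adjacency matrix with k-means is just one convenient choice, not the proposed method.

      import numpy as np
      from sklearn.cluster import KMeans

      def histogram_graphon(adjacencies, n_blocks, seed=0):
          """Block-constant (histogram) graphon estimate from graphs sharing a
          labelled node set; returns the block-probability matrix and the
          node-to-block assignment."""
          A_bar = np.mean(adjacencies, axis=0)                 # average adjacency matrix
          labels = KMeans(n_clusters=n_blocks, n_init=10,
                          random_state=seed).fit_predict(A_bar)
          W_hat = np.zeros((n_blocks, n_blocks))
          for k in range(n_blocks):
              for l in range(n_blocks):
                  block = A_bar[np.ix_(labels == k, labels == l)]
                  if k == l:
                      n_k = block.shape[0]                     # exclude the zero diagonal
                      W_hat[k, l] = block.sum() / max(n_k * (n_k - 1), 1)
                  elif block.size:
                      W_hat[k, l] = block.mean()
          return W_hat, labels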

      Speaker: Tabea Rebafka (AgroParisTech)
    • 12:10 - 13:00
      A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives 50m

      Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel' gradient discrepancy (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point, several novel algorithms are proposed and studied, including a natural generalisation of Stein variational gradient descent, and applications to mean-field neural networks and predictively oriented posteriors are presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control. Joint work with Clémentine Chazal, Heishiro Kanagawa, Zheyang Shen and Chris J. Oates.
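
      Since the abstract notes that KGD coincides with the kernel Stein discrepancy in the standard Bayesian setting, the Python sketch below computes the familiar V-statistic estimate of the squared KSD with a Gaussian kernel from samples and a score function grad log p. It is meant only as a reference point for that familiar case; it is not the talk's KGD estimator, and the bandwidth choice is arbitrary.

      import numpy as np

      def ksd_rbf(samples, score, h=1.0):
          """V-statistic estimate of the squared kernel Stein discrepancy with a
          Gaussian kernel; score(x) must return grad log p(x)."""
          X = np.asarray(samples, dtype=float)
          n, d = X.shape
          S = np.array([score(x) for x in X])           # score at each sample
          diff = X[:, None, :] - X[None, :, :]          # pairwise differences x_i - x_j
          sq = (diff ** 2).sum(axis=-1)
          K = np.exp(-sq / (2 * h ** 2))                # kernel matrix
          grad_x_K = -diff / h ** 2 * K[..., None]      # d k / d x
          grad_y_K = -grad_x_K                          # d k / d y  (k depends on x - y)
          trace_term = (d / h ** 2 - sq / h ** 4) * K   # trace of d^2 k / dx dy
          u = (np.einsum('id,ij,jd->ij', S, K, S)       # s(x_i)^T s(x_j) k
               + np.einsum('id,ijd->ij', S, grad_y_K)   # s(x_i)^T grad_y k
               + np.einsum('jd,ijd->ij', S, grad_x_K)   # s(x_j)^T grad_x k
               + trace_term)
          return u.mean()

      # Example: samples from N(0, I) scored against a standard normal target.
      rng = np.random.default_rng(0)
      print(ksd_rbf(rng.standard_normal((200, 2)), score=lambda x: -x))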

      Speaker: Anna Korba (ENSAE)
    • 13:00 - 14:30
      Buffet lunch 1h 30m
    • 14:30 - 15:20
      Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning 50m

      We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count. Based on joint work with Yue M. Lu.
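
      To make the two-stage structure concrete, here is a schematic Python toy in which a first logistic regression is fit on part of the labelling budget and a second fit reweights its loss by the first model's uncertainty, in the spirit of pool-based active learning. The 50/50 budget split, the particular weighting rule and the function names are illustrative assumptions, not the procedure analysed in the talk.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def two_stage_erm(X, y, budget_frac=0.5, seed=0):
          """First-stage fit on part of the data, second-stage fit whose loss is
          reweighted using the first model's predictions."""
          rng = np.random.default_rng(seed)
          first = rng.random(len(y)) < budget_frac          # split the labelling budget
          clf1 = LogisticRegression().fit(X[first], y[first])

          # The second-stage loss depends on the first-stage predictions: points
          # near the decision boundary (probability close to 1/2) receive more
          # weight, mimicking uncertainty-based selection.
          p = clf1.predict_proba(X)[:, 1]
          weights = 1.0 - np.abs(2.0 * p - 1.0)
          clf2 = LogisticRegression().fit(X, y, sample_weight=weights)
          return clf1, clf2

      # Example on a two-component Gaussian mixture in 10 dimensions.
      rng = np.random.default_rng(1)
      y = rng.integers(0, 2, size=500)
      X = rng.standard_normal((500, 10)) + 1.5 * np.outer(2 * y - 1, np.ones(10) / np.sqrt(10))
      clf1, clf2 = two_stage_erm(X, y)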

      Speaker: Hugo Cui (CNRS, Paris-Saclay)
    • 15:20 - 16:10
      Convergence and Linear Speed-Up in Stochastic Federated Learning 50m

      In federated learning, multiple users collaboratively train a machine learning model without sharing local data. To reduce communication, users perform multiple local stochastic gradient steps that are then aggregated by a central server. However, due to data heterogeneity, local training introduces bias. In this talk, I will present a novel interpretation of the Federated Averaging algorithm, establishing its convergence to a stationary distribution. By analyzing this distribution, we show that the bias consists of two components: one due to heterogeneity and another due to gradient stochasticity. I will then extend this analysis to the Scaffold algorithm, demonstrating that it effectively mitigates heterogeneity bias but not stochasticity bias. Finally, we show that both algorithms achieve linear speed-up in the number of agents, a key property in federated stochastic optimization.
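
      As a minimal reference implementation of the algorithmic loop described above, the Python sketch below runs Federated Averaging on a toy problem: each client performs several noisy local gradient steps from the current server model, and the server averages the local iterates. The two quadratic clients are an illustrative assumption chosen so that the heterogeneity bias is visible; the sketch contains neither the stationary-distribution analysis nor the Scaffold correction discussed in the talk.

      import numpy as np

      def fedavg(client_grads, x0, n_rounds=200, local_steps=10, lr=0.1, noise=0.1, seed=0):
          """Each round, every client runs local_steps noisy gradient steps from
          the current server model; the server then averages the local iterates."""
          rng = np.random.default_rng(seed)
          x = np.array(x0, dtype=float)
          for _ in range(n_rounds):
              local_models = []
              for grad in client_grads:
                  xi = x.copy()
                  for _ in range(local_steps):
                      xi -= lr * (grad(xi) + noise * rng.standard_normal(xi.shape))
                  local_models.append(xi)
              x = np.mean(local_models, axis=0)   # server aggregation
          return x

      # Two heterogeneous quadratic clients, f1(x) = (x + 1)^2 and f2(x) = 0.25 (x - 3)^2,
      # with different curvatures: the global optimum is x = -0.2, but the FedAvg
      # iterate settles away from it because of the multiple local steps.
      clients = [lambda x: 2.0 * (x + 1.0), lambda x: 0.5 * (x - 3.0)]
      print(fedavg(clients, x0=np.zeros(1)))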

      Speaker: Paul Mangold (École polytechnique)
    • 16:10 - 16:40
      Coffee break 30m
    • 16:40 - 17:30
      Beyond Kemeny Medians: Consensus Ranking Distributions – Definition, Properties and Statistical Learning 50m

      Summarising a distribution over rankings by a single Kemeny median fails whenever the distribution is multimodal or heterogeneous. Drawing on the histogram analogy, we introduce Consensus Ranking Distributions (CRD): sparse mixtures of local Kemeny medians indexed by a partition of the space of rankings, interpolating between a single consensus ranking and the raw empirical distribution. We propose the COAST algorithm, a top-down decision tree that learns the partition from data using pairwise comparison splits, and establish a PAC-style generalisation bound. Experiments on synthetic mixtures and real preference data illustrate the method's ability to recover modes and produce interpretable summaries.
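
      For context, the Python sketch below computes a single brute-force Kemeny median, the summary that the talk argues is too coarse, for a small and deliberately bimodal sample of rankings; the rank-vector encoding and the exhaustive search over permutations are simplifications that only make sense for a handful of items, and none of this is the COAST algorithm.

      import numpy as np
      from itertools import permutations

      def kendall_tau(r1, r2):
          """Number of item pairs ranked in opposite order by the two rank vectors."""
          n = len(r1)
          return sum((r1[i] < r1[j]) != (r2[i] < r2[j])
                     for i in range(n) for j in range(i + 1, n))

      def kemeny_median(rankings):
          """Brute-force Kemeny median: the rank vector minimising the total
          Kendall tau distance to the sample (exponential cost, toy sizes only)."""
          n = len(rankings[0])
          candidates = list(permutations(range(n)))
          costs = [sum(kendall_tau(c, r) for r in rankings) for c in candidates]
          return candidates[int(np.argmin(costs))]

      # A bimodal sample: 6 voters rank the items 0 < 1 < 2 and 4 voters rank them
      # in reverse.  The single median collapses onto the majority mode and hides
      # the sizeable opposite mode, which is the failure CRD is designed to address.
      sample = [(0, 1, 2)] * 6 + [(2, 1, 0)] * 4
      print(kemeny_median(sample))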

      Speaker: Ekhine Irurozki (Télécom Paris)