Séminaire de Statistique et Optimisation

Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport

by Raphaël Barboni (ENS)

Salle K. Johnson (1R3, 1st floor)

Description

We study the convergence of gradient flow for the training of deep neural networks. While Residual Neural Networks (ResNets) are a popular example of very deep architectures, their training constitutes a challenging optimization problem, notably because of the non-convexity and non-coercivity of the objective. Yet, in applications, these tasks are successfully solved by simple optimization algorithms such as gradient descent. To better understand this phenomenon, we focus here on a "mean-field" model of an infinitely deep and arbitrarily wide ResNet. The model is parameterized by a probability measure over the product set of layers and parameters, with constant marginal on the set of layers, and we propose to train it by performing a (metric) gradient flow with respect to the conditional Optimal Transport distance: a restriction of the classical Wasserstein distance which enforces our marginal condition. Relying on the theory of gradient flows in metric spaces, we first show the well-posedness of the gradient flow equation and its consistency with the training of ResNets at finite width. Performing a local Polyak-Łojasiewicz analysis, we then show convergence of the gradient flow for well-chosen initializations: if the number of features is finite but sufficiently large and the risk is sufficiently small at initialization, the gradient flow converges towards a global minimizer.
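
To fix notation, here is a minimal sketch of the objects in play; the symbols below (feature map $F$, parameter space $\Theta$, layer variable $s$) are our own shorthand and may differ from the speaker's. The infinitely deep, arbitrarily wide ResNet propagates an input through the ODE

\[
\dot{x}(s) = \int_\Theta F\big(x(s), \theta\big)\, \mathrm{d}\mu_s(\theta), \qquad s \in [0,1],
\]

where the parameter $\mu \in \mathcal{P}([0,1] \times \Theta)$ has first marginal equal to the Lebesgue measure on the layer set $[0,1]$, and $(\mu_s)_{s \in [0,1]}$ denotes its disintegration with respect to the layer variable. Training is then a metric gradient flow of the risk with respect to the conditional Optimal Transport distance, which under these assumptions takes the form

\[
W_{\mathrm{cond}}(\mu, \nu)^2 = \int_0^1 W_2(\mu_s, \nu_s)^2\, \mathrm{d}s,
\]

defined only between measures sharing the same layer marginal: the restriction of the Wasserstein geometry that enforces the marginal condition mentioned in the abstract.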