Date: September 7, 2021
Place: Institut de Mathématiques de Toulouse (IMT), Amphithéâtre Schwartz
This conference will be part of a series of four similar sessions dedicated to interactions of AI with other branches of mathematics.
The goal of the session will be to outline some interactions between Partial Differential Equations and AI.
If health conditions allow, the conference will be held in person.
The speakers will be:
- Bruno Després (Sorbonne Université)
- Arnulf Jentzen (University of Münster)
- Pierre Marion (Sorbonne Université)
- Zhenjie Ren (Université Paris-Dauphine)
A tentative schedule of the day is:
- 9:30 am to 10:30 am: Arnulf Jentzen: Convergence analysis for the gradient descent optimization method in the training of artificial neural networks with ReLU activation for piecewise linear target functions
- 10:30 am to 11:00 am: Coffee break
- 11:00 am to 12:00 pm: Bruno Després: Machine learning and the numerical solution of simple PDEs
- 12:00 pm to 2:00 pm: Lunch break
- 2:00 pm to 3:00 pm: Pierre Marion: A primer on Neural Ordinary Differential Equations
- 3:00 pm to 3:30 pm: Coffee break
- 3:30 pm to 4:30 pm: Zhenjie Ren: Training Neural Networks and Mean-field Langevin dynamics
List of titles and abstracts.
Convergence analysis for the gradient descent optimization method in the training of artificial neural networks with ReLU activation for piecewise linear target functions (Arnulf Jentzen)
Abstract: Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains -- even in the simplest situation of the plain vanilla GD optimization method with random initializations -- an open problem to prove (or disprove) the conjecture that the true risk of the GD optimization method converges in the training of ANNs with ReLU activation to zero as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity. In this talk we prove this conjecture in the special situation where the probability distribution of the input data is absolutely continuous with respect to the continuous uniform distribution on a compact interval and where the target function under consideration is piecewise linear.
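As a minimal illustration of the setting studied in the talk (not of the proof), the sketch below trains a shallow ReLU network on a piecewise linear "hat" target with plain full-batch gradient descent, with inputs drawn uniformly on a compact interval. The width, learning rate, step count, and target function are all illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Piecewise linear "hat" target on [0, 1] (illustrative choice)
def target(x):
    return np.where(x < 0.5, 2.0 * x, 2.0 - 2.0 * x)

# Shallow ReLU network: f(x) = sum_k c_k * relu(w_k * x + b_k)
width = 32
w = rng.normal(size=width)
b = rng.normal(size=width)
c = rng.normal(size=width) / width

x = rng.uniform(0.0, 1.0, size=256)   # inputs uniform on a compact interval
y = target(x)

lr = 0.05
for step in range(2000):
    pre = np.outer(x, w) + b           # (n, width) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU
    pred = act @ c
    err = pred - y
    risk = np.mean(err ** 2)           # empirical risk (mean squared error)
    # Gradients of the empirical risk for plain full-batch GD
    mask = (pre > 0).astype(float)     # ReLU subgradient
    grad_c = 2.0 * act.T @ err / len(x)
    grad_w = 2.0 * ((err[:, None] * mask * c) * x[:, None]).mean(axis=0)
    grad_b = 2.0 * (err[:, None] * mask * c).mean(axis=0)
    c -= lr * grad_c
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final empirical risk: {risk:.4f}")
```

In practice such a run drives the empirical risk close to zero; the open question discussed in the talk is whether this can be proved for the true risk in the infinite-width, many-initializations, many-steps limit.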
Machine learning and the numerical solution of simple PDEs (Bruno Després)
Abstract: Machine learning methods (=ML=NN=AI) open up new practices for numerical computation and for the numerical solution of PDEs (simple ones, for now). For problems poorly handled by standard methods, this brings new perspectives and allows some progress in favorable cases. More precisely, we will review the links between NNs built on ReLU-type activation functions and their application to the construction of Finite Volume schemes for the advection of a triple point. We will also try to shed light on the delicate question of the stability of these methods.
A primer on Neural Ordinary Differential Equations (Pierre Marion)
Abstract: Deep learning has become a prominent method for many applications, for instance computer vision or natural language processing. Mathematical understanding of these methods remains incomplete. A recent approach has been to view a neural network as a numerical integration method for an ordinary differential equation, or sometimes a partial differential equation. We will give an overview of both theoretical results (e.g. error bounds) and practical applications (e.g. training by the adjoint method) of this point of view. We will also present new results on Recurrent Neural Networks, a common type of neural network for time series, proving error bounds by framing them as a linear method in an infinite-dimensional Hilbert space.
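The core correspondence mentioned in the abstract can be sketched in a few lines: explicit Euler integration of dh/dt = f(h, t) with step size dt performs updates h ← h + dt·f(h, t), which is exactly the form of a residual block. The vector field below (a single tanh layer with randomly drawn weights) and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
W = rng.normal(scale=0.5, size=(dim, dim))  # hypothetical layer weights
b = rng.normal(scale=0.1, size=dim)

def f(h, t):
    """A simple parametrized vector field, standing in for a network layer."""
    return np.tanh(W @ h + b)

def neural_ode_euler(h0, n_steps=100, t1=1.0):
    """Explicit Euler integration of dh/dt = f(h, t) from t=0 to t=t1.

    Each update h <- h + dt * f(h, k*dt) has the shape of a residual block,
    which is how ResNets are read as discretizations of an ODE.
    """
    h = h0.copy()
    dt = t1 / n_steps
    for k in range(n_steps):
        h = h + dt * f(h, k * dt)
    return h

h0 = rng.normal(size=dim)
out = neural_ode_euler(h0)
```

Refining the discretization (increasing n_steps) changes the output only at the order of the step size, which is the kind of error bound the ODE viewpoint makes precise.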
Training Neural Networks and Mean-field Langevin dynamics (Zhenjie Ren)
Abstract: Neural networks have become an extremely useful tool in various applications such as statistical learning and sampling. Their empirical success calls for a theoretical investigation based on mathematical models. Recently it has become popular to treat the training of neural networks as an optimization problem on the space of probability measures. In this talk we show that the optimizer of such a problem can be approximated using the so-called mean-field Langevin dynamics. This theory sheds light on the efficiency of the (stochastic) gradient descent algorithm for training neural networks. Based on the theory, we also propose a new algorithm for training generative adversarial networks (GANs), and test it by sampling from simple probability distributions.