Speaker
Description
The training of neural networks with first-order methods remains poorly understood in theory, despite compelling empirical evidence. Not only is it believed that neural networks converge towards global minimisers, but the implicit bias of optimisation algorithms also makes them converge towards specific minimisers with nice generalisation properties. This talk focuses on the early alignment phase that appears in the training dynamics of two-layer networks with small initialisations. During this early alignment phase, the many neurons align with a small number of key directions, hence inducing some sparsity in the number of effectively represented neurons. While this alignment phenomenon can cause convergence towards spurious local minima of the network parameters, such local minima can actually have good properties and yield much lower excess risks than any global minimiser of the training loss. In other words, this early alignment can lead to a simplicity bias that is helpful in minimising the test loss.
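The early alignment phenomenon described above can be observed numerically. The sketch below is a hypothetical toy setup (not taken from the talk): a two-layer ReLU network with small initialisation scale `eps` is trained by gradient descent on data labelled by a single "teacher" neuron, and we track how strongly the student neurons align with the teacher direction; all names (`teacher`, `weighted_alignment`, the chosen dimensions and learning rate) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed for illustration): data in R^2, labels produced by a
# single "teacher" ReLU neuron, so there is one key direction to discover.
n, d, m = 200, 2, 50              # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
teacher = np.array([1.0, 0.0])
y = np.maximum(X @ teacher, 0.0)

# Two-layer ReLU network f(x) = sum_j a_j relu(<w_j, x>), small initialisation.
eps = 1e-4                        # the small-init regime discussed in the talk
W = eps * rng.standard_normal((m, d))
a = eps * rng.standard_normal(m)

def weighted_alignment(W, a):
    """Cosine similarity of neuron directions with the teacher,
    weighted by each neuron's contribution |a_j| * ||w_j||."""
    norms = np.linalg.norm(W, axis=1)
    weights = norms * np.abs(a)
    cos = (W @ teacher) / (norms + 1e-12)
    return float(np.sum(weights * cos) / (np.sum(weights) + 1e-12))

align_init = weighted_alignment(W, a)

# Plain gradient descent on the mean squared error.
lr = 0.2
for _ in range(4000):
    pre = X @ W.T                 # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    r = act @ a - y               # residuals, shape (n,)
    gW = a[:, None] * (((pre > 0) * r[:, None]).T @ X) / n
    ga = act.T @ r / n
    W -= lr * gW
    a -= lr * ga

align_final = weighted_alignment(W, a)
print(f"weighted alignment with teacher: "
      f"init {align_init:+.2f} -> final {align_final:+.2f}")
```

At initialisation the neuron directions are essentially isotropic, so the weighted alignment is near zero; after training, the neurons carrying non-negligible weight have collapsed onto (here) a single key direction, illustrating the sparsity in represented directions that the early alignment phase produces.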