Description
We study the learning dynamics of wide two-layer neural networks trained by stochastic gradient descent (SGD), aiming to understand quantitatively how network width shapes both the typical training trajectory and the variability of the final predictor.
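As a concrete reference point, here is one standard way to formalize this setup; the symbols (f_N, σ_*, θ_i) and the 1/N mean-field prefactor are our illustrative choices, not fixed by the description above.

```latex
% Width-N two-layer network in the mean-field parametrization (notation ours).
% The 1/N prefactor is what makes the infinite-width limit an averaged,
% law-of-large-numbers object rather than a kernel-type limit.
\[
  f_N(x) \;=\; \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x;\theta_i),
  \qquad
  \theta_i = (a_i, w_i), \quad \sigma_*(x;\theta_i) = a_i\,\sigma(w_i^\top x).
\]
% One SGD step on a sample (x_t, y_t) with quadratic loss moves each neuron by
% (the 1/N from the chain rule is absorbed into the mean-field learning-rate
% scaling \eta = sN, so each neuron moves at rate O(1)):
\[
  \theta_i \;\leftarrow\; \theta_i \;-\; s\,\big(f_N(x_t) - y_t\big)\,
  \nabla_{\theta_i} \sigma_*(x_t;\theta_i).
\]
```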
We adopt an interacting-particle viewpoint in which the neurons of a width-N network evolve under SGD as a large system of coupled particles. As the width grows, this collective dynamics is well approximated by a deterministic mean-field limit, which provides an analytically tractable description of how the distribution of parameters, and hence the network's predictions, evolves during training.
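A minimal sketch of the limiting object, assuming quadratic loss and i.i.d. initialization (both assumptions ours); ρ_t denotes the limit of the empirical neuron distribution and Ψ its effective potential:

```latex
% Empirical distribution of the N neurons and its deterministic limit (notation ours):
\[
  \rho^N_t \;=\; \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i(t)}
  \;\xrightarrow[N\to\infty]{}\; \rho_t,
\]
% where \rho_t solves a McKean--Vlasov (Wasserstein gradient-flow) PDE driven
% by the population risk; for quadratic loss the drift potential is linear in
% the residual:
\[
  \partial_t \rho_t \;=\; \nabla_\theta \!\cdot\! \big( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \big),
  \qquad
  \Psi(\theta;\rho) \;=\; \mathbb{E}_{(x,y)} \Big[ \big( f_\rho(x) - y \big)\, \sigma_*(x;\theta) \Big],
\]
% with f_\rho(x) = \int \sigma_*(x;\theta')\,\rho(d\theta') the limiting predictor.
```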
We then quantify finite-width effects through two complementary results. First, we characterize the fluctuations around the mean-field limit: after the natural central-limit rescaling by √N, the deviations converge to a Gaussian limiting process, yielding an explicit description of the variability induced by training randomness. Second, we establish finite-width concentration inequalities, uniform over the training horizon, which control with high probability how close a width-N network remains to its mean-field proxy.
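Schematically, the two results take the following shapes; the norms, constants, and exponents below are illustrative placeholders, not the precise statements.

```latex
% (i) Fluctuations: the CLT-rescaled deviation of the empirical neuron
%     distribution converges to a Gaussian process (e.g. weakly, in a
%     suitable distribution space):
\[
  \eta^N_t \;:=\; \sqrt{N}\,\big( \rho^N_t - \rho_t \big)
  \;\xrightarrow[N\to\infty]{}\; \eta_t \quad \text{(Gaussian)}.
\]
% (ii) Concentration, uniform over a fixed training horizon [0,T]:
%      for constants C(T), c(T) > 0 and any \varepsilon > 0,
\[
  \mathbb{P}\Big( \sup_{0 \le t \le T} \big\| \rho^N_t - \rho_t \big\|
  \;>\; \tfrac{C(T)}{\sqrt{N}} + \varepsilon \Big)
  \;\le\; 2\, e^{-c(T)\, N \varepsilon^2}.
\]
```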