Séminaire de Probabilités

A high dimensional asymptotic view on learning from large data sets

by Xiaoyi Mai (IMT)

Amphi L. Schwartz

In modern machine learning, we often encounter data sets in which the number of samples and the number of features are comparably large. It has been argued that classical statistical learning theory is insufficient to explain the generalization performance observed on such data sets, owing to the curse of dimensionality and the overfitting of overparametrized learning models.
The approach of high dimensional asymptotic analysis is motivated by the need to model and understand modern machine learning. To apply this increasingly popular approach, we develop a flexible framework based on the leave-one-out perturbation, notably capable of handling the implicit optimization and iterative procedures involved in many learning algorithms. Our analyses offer new insight into fundamental questions such as the challenge of high dimensional learning, the role of the loss function, and the effectiveness of learning with fewer labels. Practical improvements are proposed based on the insights gained from these analyses.
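The talk's leave-one-out framework itself is not spelled out in this abstract; as a point of reference, the sketch below illustrates the classical leave-one-out idea in its simplest exact form, the ridge-regression shortcut identity (y_i - f_{-i}(x_i)) = (y_i - f(x_i)) / (1 - H_ii), which avoids refitting the model n times. This is standard textbook material, shown only to illustrate the flavor of leave-one-out reasoning, not the speaker's framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 20, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Ridge "hat" matrix: H = X (X^T X + lam I)^{-1} X^T, so yhat = H y.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
resid = y - H @ y  # in-sample residuals

# Exact leave-one-out residuals via the shortcut identity:
# y_i - f_{-i}(x_i) = (y_i - f(x_i)) / (1 - H_ii)
loo_fast = resid / (1 - np.diag(H))

# Brute-force check: actually refit with each sample held out.
loo_slow = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    w = np.linalg.solve(
        X[mask].T @ X[mask] + lam * np.eye(p), X[mask].T @ y[mask]
    )
    loo_slow[i] = y[i] - X[i] @ w

print(np.allclose(loo_fast, loo_slow))  # True
```

The identity follows from a one-sample (rank-one) perturbation of the normal equations via the Sherman-Morrison formula; high dimensional asymptotic analyses exploit the fact that such single-sample perturbations become asymptotically negligible, which is what makes leave-one-out arguments tractable in the proportional regime.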