Statistics and Optimization Seminar

A random matrix approach for high-dimensional semi-supervised learning by graph regularization

by Ms. Xiaoyi Mai (CRISTAL Lille)

Europe/Paris
Salle Johnson (1R3)

Description
In modern machine learning, we often encounter tasks involving numerous high-dimensional data vectors. There is a growing body of work on characterizing the exact performance of machine learning methods in an asymptotic regime where the dimension of the feature vectors and the number of data samples are comparably large. Such high-dimensional asymptotic analyses are of particular interest for understanding and improving learning algorithms confronted with the challenge of learning from high-dimensional data. Our study of semi-supervised learning by graph regularization starts with a unified analysis of classical Laplacian regularization methods, revealing a series of consequences that are consistent with their practical behavior on high-dimensional data. In particular, our analysis shows that semi-supervised Laplacian regularization, despite being a common semi-supervised learning approach, fails to learn effectively from both labelled and unlabelled data of high dimensionality. Motivated by our theoretical findings, we propose a new method of centered regularization, with theoretically established consistent semi-supervised learning performance in the same high-dimensional regime.