Alejandro Murua (Université de Montréal)
We develop a classification model for high-dimensional data that addresses two main problems in high dimensions: the curse of dimensionality and the empty space phenomenon. We overcome these obstacles by modeling the distribution of distances involving feature vectors rather than the distribution of the feature vectors themselves. The model is based on the sphere-hardening result, which states that, in high dimensions, data cluster in shells. Using asymptotics in the dimension parameter, we show that, under simple sampling conditions, the distances of data points to their means are distributed as a variant of generalized gamma variables. We propose using mixtures of these distributions for both supervised and unsupervised classification of high-dimensional data. The paradigm is extended to low-dimensional data by embedding the data into higher-dimensional spaces by means of the kernel trick. Part of this work (a) was done in collaboration with Bertrand Saulnier (Université de Montréal) and Nicolas Wicker (Université de Lille 1; Murua and Wicker, 2014), and (b) was inspired by a conversation with François Léonard (Hydro-Québec; Léonard and Gauvin, 2013).
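The sphere-hardening effect mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the model of the talk: it assumes standard Gaussian samples (an assumption made only for illustration), and the function names distances_to_mean and relative_spread are hypothetical. It measures how the spread of the distances to the sample mean, relative to their average, behaves as the dimension grows.

```python
import math
import random
import statistics


def distances_to_mean(n_points, dim, seed=0):
    """Draw n_points from a standard Gaussian in R^dim (illustrative assumption)
    and return the Euclidean distance of each point to the sample mean."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    # Coordinate-wise sample mean of the point cloud.
    mean = [sum(column) / n_points for column in zip(*points)]
    return [math.dist(point, mean) for point in points]


def relative_spread(n_points, dim, seed=0):
    """Standard deviation of the distances divided by their average.
    A small value means the points lie in a thin shell around the mean."""
    distances = distances_to_mean(n_points, dim, seed)
    return statistics.stdev(distances) / statistics.mean(distances)


if __name__ == "__main__":
    # Print the relative spread for increasing dimensions.
    for dim in (2, 20, 200, 2000):
        print(dim, round(relative_spread(200, dim), 3))
```

Under this Gaussian assumption the distances follow approximately a chi distribution, so their average grows like the square root of the dimension while their standard deviation stays roughly constant. The relative spread therefore shrinks as the dimension increases, which is the shell-like clustering that motivates modeling distances instead of feature vectors.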