9-10 June 2016
Ecole Centrale Lille
Europe/Paris timezone

High-dimensional data classification with mixtures of sphere-hardening distances

10 Jun 2016, 10:50
Grand Amphithéâtre (Ecole Centrale Lille)

Campus Lille 1 à Villeneuve d'Ascq


Alejandro Murua (Université de Montréal)


We develop a classification model for high-dimensional data that addresses two main problems in high dimensions: the curse of dimensionality and the empty space phenomenon. We overcome these obstacles by modeling the distribution of distances involving feature vectors instead of modeling the distribution of the feature vectors directly. The model is based on the sphere-hardening result, which states that, in high dimensions, data cluster in shells. Using asymptotics in the dimension parameter, we show that under simple sampling conditions the distances of data points to their means are distributed as a variant of generalized gamma variables. We propose using mixtures of these distributions for both supervised and unsupervised classification of high-dimensional data. The paradigm is extended to low-dimensional data by embedding the data into higher-dimensional spaces by means of the kernel trick. Part of this work (a) was done in collaboration with Bertrand Saulnier (Université de Montréal) and Nicolas Wicker (Université de Lille 1; Murua and Wicker, 2014), and (b) was inspired by a conversation with François Léonard (Hydro-Québec; Leonard and Gauvin, 2013).
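The sphere-hardening effect mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the model from the talk: it assumes isotropic standard Gaussian data (an assumption made only for this illustration), and the function name `distance_spread` and its parameters are hypothetical. It measures how the distances of points to their sample mean concentrate, relative to their average, as the dimension grows.

```python
import math
import random
import statistics


def distance_spread(dim, n_points=500, seed=0):
    """Return (mean, std) of Euclidean distances from points to their sample mean.

    Assumption for illustration only: points are drawn from an isotropic
    standard Gaussian in `dim` dimensions.
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    # Coordinate-wise sample mean of the point cloud.
    center = [sum(col) / n_points for col in zip(*points)]
    dists = [math.dist(p, center) for p in points]
    return statistics.fmean(dists), statistics.stdev(dists)


if __name__ == "__main__":
    for dim in (2, 20, 200):
        mean, std = distance_spread(dim)
        # A shrinking std/mean ratio means the points sit in a thin shell.
        print(f"dim={dim:4d}  mean distance={mean:7.3f}  relative spread={std / mean:.3f}")
```

Under this Gaussian assumption, the mean distance grows roughly like the square root of the dimension while its standard deviation stays of order one, so the relative spread shrinks and the points concentrate in a thin shell around their mean.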
