Probability and Statistics

Maximum Mean Discrepancy and Variable Selection for High-Dimensional Data

by Dr Kensuke Mitsuzawa

Europe/Paris
Room 1 (LJAD)

Description

Maximum Mean Discrepancy (MMD) [1] is a versatile metric for quantifying differences between probability distributions, with applications ranging from two-sample testing to generative model optimization. This presentation focuses on the problem of variable selection: identifying the dimensions that contribute most significantly to the dissimilarity between two distributions. Building upon MMD estimator optimization [2], we introduce a regularization term that enables the identification of these influential variables [3]. Our approach enhances the interpretability of distributional comparisons by highlighting the key features driving observed differences. This methodology is demonstrated through empirical evaluations, showcasing its effectiveness in discerning relevant dimensions.
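
As a rough illustration of the idea described above, the following Python sketch evaluates an unbiased estimate of squared MMD with a Gaussian kernel whose inputs are rescaled by one weight per dimension, and combines it with an L1 penalty on those weights. The function names, the bandwidth, the penalty strength `lam`, the toy data, and the exact form of the objective are all assumptions made for illustration; the abstract does not specify them, and the objective used in [3] may differ.

```python
import numpy as np


def ard_gaussian_kernel(a, b, weights, bandwidth=1.0):
    """Gaussian kernel on inputs rescaled per dimension by `weights`."""
    diff = (a[:, None, :] - b[None, :, :]) * weights  # shape (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, weights, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    k_xx = ard_gaussian_kernel(x, x, weights, bandwidth)
    k_yy = ard_gaussian_kernel(y, y, weights, bandwidth)
    k_xy = ard_gaussian_kernel(x, y, weights, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    np.fill_diagonal(k_xx, 0.0)
    np.fill_diagonal(k_yy, 0.0)
    return k_xx.sum() / (n * (n - 1)) + k_yy.sum() / (m * (m - 1)) - 2.0 * k_xy.mean()


def penalized_objective(x, y, weights, lam=0.1):
    """Hypothetical objective to minimize: -MMD^2 + lam * ||weights||_1."""
    weights = np.asarray(weights, dtype=float)
    return -mmd2_unbiased(x, y, weights) + lam * np.abs(weights).sum()


# Toy data (assumption): the two samples differ only in dimension 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(size=(200, 3))
y[:, 0] += 2.0

on_dim0 = np.array([1.0, 0.0, 0.0])  # weight placed on the shifted dimension
on_dim1 = np.array([0.0, 1.0, 0.0])  # weight placed on an unshifted dimension
print(penalized_objective(x, y, on_dim0), penalized_objective(x, y, on_dim1))
```

In this toy setting, putting the weight on the shifted dimension should give a lower objective than putting it on an unshifted one, which is the behavior a variable-selection procedure of this kind relies on. A full method would optimize the weights, for example by gradient descent, rather than compare hand-picked candidates.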

[1] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13(1), 723-773.

[2] Sutherland, D. J., Tung, H. Y., Strathmann, H., De, S., Ramdas, A., Smola, A., & Gretton, A. (2016). Generative models and model criticism via optimized maximum mean discrepancy. arXiv preprint arXiv:1611.04488.

[3] Mitsuzawa, K., Kanagawa, M., Bortoli, S., Grossi, M., & Papotti, P. (2023). Variable selection in maximum mean discrepancy for interpretable distribution comparison. arXiv preprint arXiv:2311.01537.