Journée Statistique et Informatique pour la Science des Données à Paris Saclay

Le Bois-Marie

35, route de Chartres 91440 Bures-sur-Yvette
Description

The aim of this workshop is to bring together mathematicians and computer scientists around talks on recent results in statistics, machine learning and, more generally, data science. Various topics will be presented, covering machine learning, optimization, deep learning, optimal transport, inverse problems, statistics and scientific reproducibility.

Registration is free and open until February 2, 2021.

Organised by: Thanh Mai PHAM NGOC (LMO) and Charles SOUSSEN (L2S)

Note: The conference will be held entirely by video-conference.

Invited speakers:

Guillaume Charpiat (LRI)
Lenaïc Chizat (LMO)
Emilie Chouzenoux (CVN)
Agnès Desolneux (Centre Borelli)
Gaël Richard (Télécom Paris)
Gaël Varoquaux (INRIA Parietal)

Participants
  • Aayadi Khadija
  • Abbass Gorgi
  • Abdelilah Monir
  • Aboubakar MAITOURNAM
  • Adeline Fermanian
  • Adnane Fouadi
  • Adrien Courtois
  • Ahmed Ben Saad
  • Ajmal Oodally
  • Alain GIBERT
  • Alessandro Leite
  • Alexandre Gramfort
  • Alexandre Hippert-Ferrer
  • Alexis BISMUTH
  • Amin Fehri
  • Anirudh Rayas
  • Anna Kazeykina
  • Antoine Collas
  • Aurélien Decelle
  • Aya Sakite
  • Beatriz Seoane
  • Benjamin Auder
  • Benjamin Guedj
  • Benoit CLAIR
  • Berrenur Saylam
  • Bertrand Maury
  • Bertrand Michel
  • Bousselham GANBOURI
  • caligaris claude
  • Canon Didier
  • Catherine MATIAS
  • Cedric Allain
  • Christian Derquenne
  • Christophe JUILLET
  • César CARDORELLE
  • Daniel Fiorilli
  • Daniel Wagner
  • David Vigouroux
  • Didier Lucor
  • Dieu merci Kimpolo nkokolo
  • djama abdi bachir
  • Désiré Sidibé
  • Elena MAJ
  • Elisabeth Lahalle
  • Elton Rexhepaj
  • Elvire Roblin
  • Emmanuel IDOHOU
  • Emmanuel Menier
  • Estelle Kuhn
  • Fanny Pouyet
  • Fedor Goncharov
  • Flora Jay
  • Florent Bouchard
  • Florian Gosselin
  • FRANCOIS BICHET
  • François Landes
  • François Orieux
  • Frédéric Barbaresco
  • Frédéric Pascal
  • Gabriele Facciolo
  • Gerard Kerkyacharian
  • Gilles Blanchard
  • Hui Yan
  • Huyen Nguyen
  • Héctor Climente
  • Ilias Ghrizi
  • Ines OUKID
  • IOANNIS BARGIOTAS
  • Ismaël Castillo
  • Jean Vidal
  • Jean-Armand Moroni
  • Jean-Loup Loyer
  • Jerome Buzzi
  • Johan Duque
  • Joon Kwon
  • Kai Zheng
  • Kaniav Kamary
  • Kare KAMILA
  • Khalid Akhlil
  • Laura Vuduc
  • Laurent Pierre
  • Lionel Mathelin
  • liu tupikina
  • Lolita Aboa
  • Lorenzo Audibert
  • Léon Faure
  • Malika Kharouf
  • Manon MOTTIER
  • Marc Evrard
  • Marc Glisse
  • Marc Michel
  • Marietta Manolessou
  • Mathilde Jeuland
  • Matthieu Nastorg
  • Michele Alessandro Bucci
  • Miha Srdinšek
  • Milad LEYLI ABADI
  • MOHAMED Alaoui
  • Mohammed Nabil EL KORSO
  • Myrto Limnios
  • Narcicegi Kiran
  • Natalia Rodriguez
  • Nicolas Lermé
  • Nilo Schwencke
  • Olivia Breysse
  • onofrio semeraro
  • Pablo Miralles
  • Pascal Bondon
  • Pegdwende Minoungou
  • Pierluigi Morra
  • Quang Huy Tran
  • Quentin Duchemin
  • Rahmani mostafa
  • Raphael LECLERCQ
  • RIFI Mouna
  • Ruocong Zhang
  • Ryad Belhakem
  • Rémi Ginestiere
  • Saad Balbiyad
  • Sabrine Bendimerad
  • Salvish Goomanee
  • Samy Clementz
  • sanaa zannane
  • Santosh Ballav Sapkota
  • Sara REJEB
  • Sebastien Treguer
  • SENA HERVE DAKO
  • Sohrab Samimi
  • Stefano Fortunati
  • Stephane RUBY
  • Sylvain Arlot
  • Taha Bouziane
  • Tamon Nakano
  • Thanh Mai PHAM NGOC
  • Theo Deladerriere
  • Thibaud Ishacian
  • Thibault Randrianarisoa
  • Théo Lacombe
  • Timothée Mathieu
  • Toumi Bouchentouf
  • Trésor Djonga
  • VAIBHAV ARORA
  • VIANNEY PERCHET
  • Victoria Bourgeais
  • Vivien Goepp
  • YAO SINAN
  • Yassine Mhiri
  • Zacharie Naulet
  • Zakia BENJELLOUN-TOUIMI
  • Zhangyun Tan
  • Zhen Xu
  • Cécile Gourgues
    • 10:20
      Welcome
    • 1
      Supervised Learning with Missing Values

      Some data come with missing values. For instance, a survey participant may not answer some questions. There is an abundant statistical literature on this topic, establishing for instance how to fit models without bias due to the missingness, and imputation strategies that provide practical solutions to the analyst. In machine learning, where the goal is to build models that minimize a prediction risk, most works simply default to these practices. As we will see, these different settings lead to different theoretical and practical solutions.

      I will outline some conditions under which machine-learning models yield the best possible predictions in the presence of missing values. A striking result is that naive imputation strategies can be optimal, as the supervised-learning model does the hard work [1]. A challenge in fitting a machine-learning model is the combinatorial explosion of possible missing-value patterns: even when the output is a linear function of the fully observed data, the optimal predictor is complex [2]. I will show how the same dedicated neural architecture can approximate the optimal predictor well for multiple missing-values mechanisms, including difficult missing-not-at-random settings [3].

      [1] Josse, J., Prost, N., Scornet, E., & Varoquaux, G. (2019). On the consistency of supervised learning with missing values. arXiv preprint arXiv:1902.06931.

      [2] Le Morvan, M., Prost, N., Josse, J., Scornet, E., & Varoquaux, G. (2020). Linear predictor on linearly-generated data with missing values: non consistency and solutions. AISTATS 2020.

      Speaker: Gaël Varoquaux (INRIA Parietal)
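
      A minimal illustrative sketch of one of the strategies above (constant imputation plus a missingness-indicator mask, handed to a flexible supervised learner). The synthetic data, model choice and hyper-parameters are illustrative assumptions, not material from the talk or from [1]:

      import numpy as np
      from sklearn.ensemble import HistGradientBoostingRegressor
      from sklearn.impute import SimpleImputer
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 5))
      y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)
      X[rng.random(X.shape) < 0.2] = np.nan  # inject 20% missing values at random

      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      # Constant imputation plus indicator columns marking which entries were missing;
      # the downstream learner is left to exploit the missingness pattern itself.
      model = make_pipeline(
          SimpleImputer(strategy="constant", fill_value=0.0, add_indicator=True),
          HistGradientBoostingRegressor(),
      )
      model.fit(X_train, y_train)
      print("held-out R^2:", model.score(X_test, y_test))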
    • 11:10
      Coffee break
    • 2
      Analysis of Gradient Descent on Wide Two-Layer Neural Networks

      Artificial neural networks are a class of "prediction" functions, parameterized by a large number of parameters called weights, that are used in various machine learning tasks (classification, regression, etc.). Given a learning task, the weights are adjusted via a gradient-based algorithm so that the corresponding predictor achieves good performance on a given training set. In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks for supervised machine learning tasks, which leads to sharp characterizations of the learned predictor. The main idea is to study the dynamics when the width of the hidden layer goes to infinity, which is a Wasserstein gradient flow. While this dynamics evolves on a non-convex landscape, we show that its limit is a global minimizer if initialized properly. We also study the "implicit bias" of this algorithm when the objective is the unregularized logistic loss: among the many global minimizers, we show that it selects a specific one which is a max-margin classifier in a certain functional space. We finally discuss what these results tell us about the generalization performance of neural networks and their adaptivity to low-dimensional structures. This is based on joint work with Francis Bach.

      Speaker: Lenaïc Chizat (LMO)
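
      For illustration only (not the speaker's code): a toy two-layer ReLU network with mean-field 1/m output scaling, trained by full-batch gradient descent on the unregularized logistic loss, the regime discussed in the talk. The data, width and stepsizes are arbitrary assumptions:

      import torch

      torch.manual_seed(0)
      n, d, m = 256, 2, 2000                     # samples, input dimension, hidden width
      X = torch.randn(n, d)
      y = torch.sign(X[:, 0] * X[:, 1])          # toy labels, not linearly separable

      W = torch.randn(m, d, requires_grad=True)  # hidden-layer weights (one row per unit)
      a = torch.randn(m, requires_grad=True)     # output weights

      lr = 0.5
      for step in range(1000):
          out = torch.relu(X @ W.t()) @ a / m    # mean-field scaling: average over hidden units
          loss = torch.nn.functional.softplus(-y * out).mean()  # logistic loss
          loss.backward()
          with torch.no_grad():
              # with the 1/m scaling, per-unit gradients are O(1/m), so the stepsize is
              # rescaled by m to keep individual units moving at an O(1) speed
              W -= lr * m * W.grad
              a -= lr * m * a.grad
              W.grad.zero_()
              a.grad.zero_()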
    • 3
      Deep Unfolding of a Proximal Interior Point Method for Image Restoration

      Variational methods have become widely used for ill-posed inverse problems because they can embed prior knowledge about the solution. However, the performance of these methods depends significantly on a set of parameters, which can be estimated through computationally expensive and time-consuming processes. In contrast, deep learning offers very generic and efficient architectures, at the expense of explainability, since it is often used as a black box, without fine control over its output. Deep unfolding provides a convenient way to combine variational and deep learning approaches. Starting from a variational formulation for image restoration, we develop iRestNet [1], a neural network architecture obtained by unfolding an interior point proximal algorithm. Hard constraints, encoding desirable properties for the restored image, are incorporated into the network thanks to a logarithmic barrier, while the barrier parameter, the stepsize, and the penalization weight are learned by the network. We derive explicit expressions for the gradient of the proximity operator for various choices of constraints, which allows training iRestNet with gradient descent and backpropagation. In addition, we provide theoretical results regarding the stability of the network. Numerical experiments on image deblurring problems show that the proposed approach outperforms both state-of-the-art variational and machine learning methods in terms of image quality.

      [1] C. Bertocchi, E. Chouzenoux, M.-C. Corbineau, J.-C. Pesquet and M. Prato. Deep Unfolding of a Proximal Interior Point Method for Image Restoration. Inverse Problems, vol. 36, pp. 034005, 2020.

      Speaker: Emilie Chouzenoux (CVN)
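
      To convey the deep-unfolding idea in a few lines, here is a simplified unrolled proximal-gradient (ISTA-style) sketch in which each unrolled iteration has a learnable stepsize and penalization weight. It is not the iRestNet architecture of [1], whose iterations come from an interior point proximal algorithm with a logarithmic barrier:

      import torch
      import torch.nn as nn

      class UnrolledProxGrad(nn.Module):
          """Each layer is one iteration of a proximal-gradient algorithm."""
          def __init__(self, A: torch.Tensor, n_layers: int = 10):
              super().__init__()
              self.A = A                                                # known degradation operator
              self.steps = nn.Parameter(0.1 * torch.ones(n_layers))     # learned stepsizes
              self.weights = nn.Parameter(0.01 * torch.ones(n_layers))  # learned penalization weights

          def forward(self, y: torch.Tensor) -> torch.Tensor:
              x = self.A.t() @ y                                         # crude initialization
              for step, w in zip(self.steps, self.weights):
                  grad = self.A.t() @ (self.A @ x - y)                   # gradient of the data-fidelity term
                  x = x - step * grad                                    # gradient step
                  x = torch.sign(x) * torch.clamp(x.abs() - w, min=0.0)  # proximity operator (soft-thresholding)
              return x

      # The network is trained end-to-end on (degraded, ground-truth) pairs so that a
      # fixed number of unrolled iterations produces a good restoration.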
    • 12:40
      Lunch
    • 4
      Maximum Entropy Distributions for Image Synthesis under Statistical Constraints

      The question of texture synthesis in image processing is a very challenging problem that can be stated as follows: given an exemplar image, sample a new image that has the same statistical features (empirical mean, empirical covariance, filter responses, neural network responses, etc.). Exponential models then arise naturally as the distributions that satisfy these constraints in expectation while being of maximum entropy. The parameters of these exponential models need to be estimated and samples have to be drawn. I will explain how both can be done simultaneously through the SOUL (Stochastic Optimization with Unadjusted Langevin) algorithm. This is based on joint work with Valentin de Bortoli, Alain Durmus, Bruno Galerne and Arthur Leclaire.

      Speaker: Agnès Desolneux (Centre Borelli)
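
      A toy sketch of the SOUL idea (illustrative assumptions, not the authors' code): alternate an unadjusted Langevin step on the sample x with a stochastic gradient step on the exponential-model parameters theta, so that the model statistics E[f(X)] match target statistics. With f(x) = (x, x^2), the maximum-entropy model is a Gaussian:

      import numpy as np

      rng = np.random.default_rng(0)
      f = lambda x: np.array([x, x**2])          # constraint features
      grad_f = lambda x: np.array([1.0, 2 * x])  # their derivatives in x
      f_target = np.array([1.0, 2.0])            # target mean and second moment

      theta = np.zeros(2)                        # exponential-model parameters
      x = 0.0                                    # current Langevin sample
      gamma, delta = 1e-2, 1e-2                  # constant stepsizes, kept small for simplicity

      for k in range(50_000):
          # Unadjusted Langevin step targeting p_theta(x) proportional to exp(-theta . f(x))
          x = x - gamma * theta @ grad_f(x) + np.sqrt(2 * gamma) * rng.normal()
          # Stochastic gradient step on theta (maximum likelihood for the exponential model)
          theta = theta + delta * (f(x) - f_target)

      # The Gaussian with mean 1 and variance 1 corresponds to theta = (-1, 0.5).
      print("estimated theta:", theta)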
    • 14:40
      Coffee break
    • 5
      Input Similarity from the Neural Network Perspective

      Given a trained neural network, we aim at understanding how similar it considers any two samples to be. For this, we give a proper definition of similarity from the neural network perspective (i.e. we quantify how undissociable two inputs A and B are) by taking a machine learning viewpoint: how much would a parameter variation designed to change the output for A also impact the output for B?

      We study the mathematical properties of this similarity measure, and show how to estimate sample density with it, in low complexity, enabling new types of statistical analysis for neural networks. We also propose to use it during training, to enforce that examples known to be similar should also be seen as similar by the network.

      We then study the self-denoising phenomenon encountered in regression tasks when training neural networks on datasets with noisy labels. We exhibit a multimodal image registration task where almost perfect accuracy is reached, far beyond the label-noise variance. Such an impressive self-denoising phenomenon can be explained as a noise-averaging effect over the labels of similar examples. We analyze data by retrieving samples perceived as similar by the network, and are able to quantify the denoising effect without requiring true labels.

      Speaker: Guillaume Charpiat (LRI)
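
      A minimal sketch of the general idea (illustrative, not the talk's exact definition): compare the gradients of the network output with respect to the parameters for two inputs; a parameter update that moves the output for A moves the output for B in proportion to the inner product of these gradients:

      import torch
      import torch.nn as nn

      torch.manual_seed(0)
      net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

      def output_gradient(x: torch.Tensor) -> torch.Tensor:
          """Flattened gradient of the scalar output with respect to all parameters."""
          net.zero_grad()
          net(x).sum().backward()
          return torch.cat([p.grad.flatten() for p in net.parameters()])

      def similarity(xa: torch.Tensor, xb: torch.Tensor) -> float:
          ga, gb = output_gradient(xa), output_gradient(xb)
          return float(ga @ gb / (ga.norm() * gb.norm()))  # cosine of the two gradients

      xa = torch.randn(10)
      xb = xa + 0.01 * torch.randn(10)  # a slightly perturbed copy of xa
      xc = torch.randn(10)              # an unrelated input
      print(similarity(xa, xb), similarity(xa, xc))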
    • 6
      Deep Neural Network for Audio and Music Transformations

      We will first discuss how deep learning techniques can be used for audio signals. To that aim, we will recall some of the important characteristics of an audio signal and review some of the main deep learning architectures and concepts used in audio signal analysis. We will then illustrate some of these concepts in more detail with two applications, namely informed singing voice source separation and music style transfer.

      Speaker: Gaël Richard (Télécom Paris)