Journée Statistique et Informatique pour la Science des Données à Paris Saclay

Centre de Conférences Marilyn et James Simons (Le Bois-Marie)

35, route de Chartres 91440 Bures-sur-Yvette
Description

The aim of this workshop is to bring together mathematicians and computer scientists around talks on recent results from statistics, machine learning and, more generally, data science research. Various topics in machine learning, optimization, deep learning, optimal transport, inverse problems, statistics and scientific reproducibility will be presented.

Registration is free and open until January 20, 2020.

Organised by: Alexandre Gramfort (INRIA) and Thanh Mai Pham Ngoc (LMO Orsay)

Invited speakers:

Sarah Cohen-Boulakia (LRI, Paris-Sud)
Victor-Emmanuel Brunel (ENSAE/CREST)
Steve Oudot (INRIA)
Charles Soussen (CentraleSupélec)
Gilles Blanchard (IHES)
Quentin Merigot (Paris-Sud)

Poster: Cécile Gourgues
    • 09:00
      Welcome coffee
    • 1
      Computational Reproducibility in the Life Sciences and Research in Computer Science: Round Trip

      With the development of new experimental technologies, biologists face an avalanche of data that must be computationally analyzed for scientific advances and discoveries to emerge. Given the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many (if not most) scientific findings will not stand the test of time. Increasing the reproducibility of computed results is therefore of paramount importance.

      While several partial solutions are currently available, ensuring reproducible analyses relies on progress in several areas of computer science research, including fundamental ones.

      After an introduction to the problem of computational reproducibility, we go on to discuss the challenges posed by this domain and describe the remaining research opportunities in computer science. (A minimal seed-and-provenance sketch follows the abstract.)

      Speaker: Prof. Sarah Cohen-Boulakia (LRI, Paris-Sud)
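
      Not from the talk: as a minimal illustration of one low-level ingredient of reproducible computation, the Python sketch below pins random seeds and records the software environment alongside a computed result. The toy "analysis" and all names are assumptions for illustration; full pipelines also need pinned dependencies, archived data and workflow systems.

```python
import hashlib
import json
import platform
import random
import sys

import numpy as np

# Pin the randomness so the computation is repeatable bit-for-bit.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# A toy "analysis": with the seed fixed, this value is identical on every run.
result = float(np.random.normal(size=1000).mean())

# Record the provenance next to the result so the run can be checked later.
provenance = {
    "python": sys.version,
    "numpy": np.__version__,
    "platform": platform.platform(),
    "seed": SEED,
    "result": result,
    "result_sha256": hashlib.sha256(repr(result).encode()).hexdigest(),
}
print(json.dumps(provenance, indent=2))
```
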
    • 10:50
      Coffee break
    • 2
      Determinantal Point Processes in Machine Learning

      Determinantal point processes are a very powerful tool in probability theory, especially for integrable systems, because they yield very concise closed-form formulas and greatly simplify computations. This is one reason why they have become very attractive in machine learning. Another reason is that, when parametrized by a symmetric matrix, they can model repulsive interactions between finitely many items; they were in fact introduced as fermionic point processes by Odile Macchi in statistical physics in the 1970s, to describe particles that tend to repel each other within the same energy state. In this talk, I will define these point processes, give a few examples and properties, and list a few challenges that they pose in machine learning theory. (A toy likelihood computation is sketched below.)

      Speaker: Prof. Victor-Emmanuel Brunel (ENSAE/CREST)
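
      The toy sketch below (an illustration, not material from the talk) evaluates the likelihood of an L-ensemble DPP on a finite ground set, P(X = S) = det(L_S) / det(L + I), and shows the repulsion effect: for a pair {i, j}, det(L_S) = L_ii * L_jj * (1 - c_ij^2), where c_ij is the cosine similarity, so highly similar items are jointly unlikely. The feature matrix is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground set of n items with random feature vectors (an
# assumption for illustration; any symmetric PSD matrix L would do).
n, d = 6, 3
Phi = rng.normal(size=(n, d))
L = Phi @ Phi.T  # symmetric PSD kernel of the L-ensemble

def dpp_prob(L, S):
    """P(X = S) = det(L_S) / det(L + I) for the L-ensemble DPP."""
    S = list(S)
    L_S = L[np.ix_(S, S)]
    return np.linalg.det(L_S) / np.linalg.det(L + np.eye(L.shape[0]))

# Cosine similarities between items; for a pair {i, j},
# det(L_S) = L_ii * L_jj * (1 - c_ij**2), so similar items repel.
c = L / np.sqrt(np.outer(np.diag(L), np.diag(L)))
off = c.copy()
np.fill_diagonal(off, -np.inf)
i, j = np.unravel_index(np.argmax(off), off.shape)   # most similar pair
np.fill_diagonal(off, np.inf)
p, q = np.unravel_index(np.argmin(off), off.shape)   # least similar pair
print("most similar pair :", dpp_prob(L, [i, j]))
print("least similar pair:", dpp_prob(L, [p, q]))
```
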
    • 3
      The Pre-image Problem from a Topological Perspective

      This talk will review the efforts of the Topological Data Analysis (TDA) community to tackle the pre-image problem. After a general introduction to TDA, the main focus will be on recent attempts to invert the TDA operator. While this line of work is still in its infancy, the hope in the long run is to use such inverses for feature interpretation. The mathematical tools involved in the analysis come mainly from metric geometry, spectral theory, and the theory of constructible functions; specific pointers will be given in the course of the exposition. (A self-contained sketch of the forward TDA operator in degree 0 follows below.)

      Speaker: Prof. Steve Oudot (INRIA)
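
      As a self-contained illustration of the forward TDA operator mentioned above (not code from the talk): degree-0 persistence of a point cloud under the Vietoris-Rips filtration reduces to Kruskal's algorithm with union-find, i.e. single-linkage clustering in disguise. The point cloud below is an illustrative assumption.

```python
import numpy as np
from itertools import combinations

def persistence_0d(points):
    """Degree-0 persistence pairs (birth, death) of a point cloud under the
    Vietoris-Rips filtration: every point is born at scale 0; when an edge
    appears, two connected components merge and one of them 'dies'. This is
    Kruskal's algorithm with union-find on the complete distance graph."""
    n = len(points)
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, dist))  # a component born at 0 dies at 'dist'
    return pairs  # n-1 finite pairs; one component lives forever

# Two well-separated clusters: a single long-lived bar (death ~ 5) reveals
# the presence of the second cluster.
rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
print(sorted(persistence_0d(cloud), key=lambda p: -p[1])[:3])
```
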
    • 13:00
      Buffet lunch
    • 4
      Orthogonal Greedy Algorithms for Sparse Reconstruction

      The past decade has witnessed a tremendous interest in the concept of sparse representations in signal and image processing. Inverse problems involving sparsity arise in many application fields such as nondestructive evaluation of materials, electroencephalography for brain activity analysis, biological imaging, or fluid mechanics, to name a few. In this lecture, I will introduce well-known greedy algorithms and show how they can be used to address ill-posed inverse problems regularized by sparsity.

      Orthogonal greedy algorithms are popular iterative schemes for sparse signal reconstruction. Their principle is to sequentially select atoms in a given dictionary and to update the sparse approximation coefficients by solving a least-squares problem whenever a new atom is selected. Two classical greedy algorithms will be put forward: Orthogonal Matching Pursuit (OMP) and Orthogonal Least Squares (OLS). Their popularity stems from the availability of fast solvers, since the least-squares problems can be solved recursively. (A minimal OMP implementation is sketched after this abstract.) I will then introduce stepwise extensions of greedy algorithms, where an early wrong atom selection can be counteracted by its later removal from the active set. I will also address non-negative extensions of greedy algorithms for inverse problems regularized by both sparsity and non-negativity. In the latter algorithms, a series of non-negative least-squares subproblems is solved. I will discuss how orthogonal greedy schemes can be adapted, and show that fast implementations are still possible, based on more involved strategies for recursively solving non-negative least-squares problems.

      The last part of my talk will be dedicated to the theoretical analysis of greedy algorithms, aiming to characterize the performance of sparse algorithms in terms of exact recovery guarantees. I will give a flavor of the main concepts behind classical exact recovery analysis techniques.

      Speaker: Prof. Charles Soussen (CentraleSupélec)
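
      A minimal, textbook-style OMP sketch (an illustration under simplifying assumptions, not the speaker's code): atoms are selected by maximal correlation with the residual, and all active coefficients are refit by least squares after each selection. A fast solver would update this least-squares solution recursively (e.g. via a Cholesky update) rather than refitting from scratch, as noted in the abstract.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms (columns of the
    dictionary D), refitting the coefficients on the active set by least
    squares (the orthogonal projection) after each selection."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Refit all active coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x, support

# Synthetic ill-posed problem: 3-sparse signal, 50 measurements, 200 atoms.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 200))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms, as usual for OMP
x_true = np.zeros(200)
x_true[[5, 80, 140]] = [1.0, -2.0, 1.5]
x_hat, support = omp(D, D @ x_true, k=3)
print(sorted(support))  # recovers {5, 80, 140} with high probability
```
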
    • 15:10
      Coffee break
    • 5
      Simultaneous Adaptation for Several Criteria Using an Extended Lepskii Principle

      In the setting of supervised learning with kernel methods, the least-squares (prediction) error is classically the performance measure of interest. If the true target function is assumed to be an element of a Hilbert space, one can also be interested in the norm of the error of an estimator in that space (the reconstruction error); this is of particular relevance in inverse problems, where the observed signal is the target after passing through a known linear operator. When the regularity (in a certain sense) of the target is known, a common regularization parameter can achieve optimal minimax error rates in both norms. When the regularity is unknown (which is usually the case), we address the question of a data-dependent selection rule for the regularization parameter that is adaptive to the unknown regularity of the target function and is optimal both for the prediction error and for the reproducing kernel Hilbert space (reconstruction) norm error. We do so by proposing a modified Lepskii balancing principle using a varying family of norms. (Based on joint work with P. Mathé and N. Mücke; a schematic version of the classical balancing rule is sketched below.)

      Speaker: Prof. Gilles Blanchard (IHES)
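
      The sketch below implements a schematic version of the classical Lepskii balancing rule for kernel ridge regression, not the modified varying-norm principle of the talk: among a decreasing grid of regularization parameters, keep the largest one whose estimator stays within a constant multiple of the (assumed known) noise level of every rougher estimator. The kernel, constants and noise proxy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.sort(rng.uniform(-1, 1, n))
sigma = 0.3
y = np.sin(3 * X) + sigma * rng.normal(size=n)

# RBF Gram matrix (bandwidth 0.1 is an illustrative choice).
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.1**2)

def krr_fit(lam):
    """In-sample kernel ridge regression predictions for parameter lam."""
    return K @ np.linalg.solve(K + n * lam * np.eye(n), y)

def noise_level(lam):
    """Crude stochastic-error proxy: sigma * sqrt(effective dof / n)."""
    dof = np.trace(np.linalg.solve(K + n * lam * np.eye(n), K))
    return sigma * np.sqrt(dof / n)

lambdas = np.geomspace(1.0, 1e-6, 25)   # from smoothest fit to roughest
fits = [krr_fit(lam) for lam in lambdas]
levels = [noise_level(lam) for lam in lambdas]

# Balancing rule: accept lambda while its estimator is compatible with
# every rougher (smaller-lambda) estimator up to the noise level.
C = 2.0  # balancing constant, a tuning assumption
chosen = lambdas[0]
for i, lam in enumerate(lambdas):
    ok = all(
        np.linalg.norm(fits[i] - fits[j]) / np.sqrt(n) <= C * levels[j]
        for j in range(i + 1, len(lambdas))
    )
    if ok:
        chosen = lam
    else:
        break
print(f"Lepskii-selected lambda: {chosen:.2e}")
```
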
    • 6
      Quantitative Stability of Optimal Transport Maps and Linearization of the $2$-Wasserstein Space

      This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes, to some extent, the $2$-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous when the reference density is uniform over a convex set; this can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps. Joint work with A. Delalande and F. Chazal. (A toy discrete version of the embedding is sketched below.)

      Speaker: Prof. Quentin Merigot (Paris-Sud)
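
      A toy discrete analogue of the embedding (an illustration under strong simplifications, not the paper's construction): sample the uniform reference on a convex set, solve the quadratic-cost assignment problem to each input point cloud, and use the flattened transport map as the Hilbert-space embedding, so that plain Euclidean distance between embeddings reflects the W2 geometry. Sample sizes and data are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
m = 200
ref = rng.uniform(0, 1, (m, 2))  # uniform reference on a convex set (unit square)

def embed(cloud):
    """Transport-map embedding of an m-point cloud w.r.t. the reference:
    solve discrete optimal transport with quadratic cost (an assignment
    problem for equal-size uniform clouds) and flatten the matched points."""
    cost = ((ref[:, None, :] - cloud[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    T = np.empty_like(ref)
    T[rows] = cloud[cols]  # T[i] is where reference point i is transported
    return T.ravel()

# Two Gaussian blobs and a slightly shifted copy of the first: Euclidean
# distance between embeddings reflects how far apart the measures are.
a = rng.normal([0.3, 0.3], 0.05, (m, 2))
b = rng.normal([0.7, 0.7], 0.05, (m, 2))
a_shift = a + 0.05
ea, eb, es = embed(a), embed(b), embed(a_shift)
print(np.linalg.norm(ea - es) / np.sqrt(m))  # small: nearby measures
print(np.linalg.norm(ea - eb) / np.sqrt(m))  # large: distant measures
```
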