8e Journée Statistique et Informatique pour la Science des Données à Paris-Saclay

Centre de Conférences Marilyn et James Simons (Le Bois-Marie)

35, route de Chartres 91440 Bures-sur-Yvette

The aim of this workshop is to bring together mathematicians and computer scientists for talks on recent results in statistics, machine learning, and, more generally, data science research. Talks will cover various topics in machine learning, optimization, deep learning, optimal transport, inverse problems, and statistics, as well as problems of scientific reproducibility. This workshop is particularly intended for doctoral and post-doctoral researchers.

Registration is free and open until March 1, 2023.

Organizers:
Gilles Blanchard (LMO/Université Paris-Saclay)
Florence Tupin (LTCI/Télécom Paris)

Invited speakers:
Isabelle Bloch (LIP6/Sorbonne Université - LTCI/Télécom Paris)
Antoine Cornuéjols (MIA/AgroParisTech)
Aymeric Dieuleveut (CMAP/École polytechnique)
Elisabeth Gassiat (LMO/Université Paris-Saclay)
Mohammed Nabil El Korso (L2S/CentraleSupélec)
Mathilde Mougeot (ENSIIE & Centre Borelli/ENS Paris-Saclay)

Cécile Gourgues
    • 9:00 AM
      Welcome coffee
    • 1
      Manifold Learning with Noisy Data

      It is commonly assumed that high-dimensional data (or features) may lie on a low-dimensional support, which makes learning easier. In this talk, I will present a very general set-up in which it is possible to recover low-dimensional non-linear structures from noisy data, the noise being totally unknown and possibly large.
      Then I will present minimax rates for the estimation of the support in Hausdorff distance.

      Speaker: Prof. Elisabeth GASSIAT (LMO/Université Paris-Saclay)
    • 10:50 AM
      Coffee break
    • 2
      Hybrid AI for Knowledge Representation and Model-based Image Understanding - Towards Explainability

      This presentation will focus on hybrid AI, as a step towards explainability, more specifically in the domain of spatial reasoning and image understanding. Image understanding benefits from the modeling of knowledge about both the scene observed and the objects it contains as well as their relationships. We show in this context the contribution of hybrid artificial intelligence, combining different types of formalisms and methods, and combining knowledge with data. Knowledge representation may rely on symbolic and qualitative approaches, as well as semi-qualitative ones to account for their imprecision or vagueness.
      Structural information can be modeled in several formalisms, such as graphs, ontologies, logical knowledge bases, or neural networks, on which reasoning will be based. Image understanding is then expressed as a problem of spatial reasoning. These approaches will be illustrated with examples in medical imaging that show the usefulness of combining several approaches.

      Speaker: Prof. Isabelle BLOCH (LIP6/Sorbonne Université - LTCI/Télécom Paris)
    • 3
      Federated Learning with Communication Constraints: Challenges in Compression Based Approaches

      In this talk, I will present some results on optimization in the context of federated learning with compression. I will first summarise the main challenges and the types of results the community has obtained, then turn to more recent results on the tradeoffs between convergence rate, compression rate, and user heterogeneity. In particular, I will describe two fundamental phenomena (and the related proof techniques): (1) how user heterogeneity affects the convergence of federated optimization methods in the presence of communication constraints, and (2) the robustness of distributed stochastic algorithms to perturbations of the iterates, and the link with model compression. I will then introduce and discuss a new compression scheme based on random codebooks and unitarily invariant distributions.

      Speaker: Prof. Aymeric DIEULEVEUT (CMAP/École polytechnique)
    • 1:00 PM
    • 4
      Leveraging Knowledge to Design Machine Learning Despite the Lack of Industrial Data

      In recent years, considerable progress has been made in the implementation of decision support procedures based on machine learning methods through the exploitation of very large databases and the use of learning algorithms.
      In the industrial environment, the databases available in research and development or in production are rarely so large, and the question arises whether it is reasonable to use machine learning methods in this context.
      This talk presents research on transfer learning and hybrid models that use knowledge from related application domains or from physics to build efficient models from limited data.
      Several industrial collaborations will be presented in which these models were used successfully to design machine learning for small-data regimes and to develop powerful decision support tools even when the initial data volume is limited.

      Speaker: Prof. Mathilde MOUGEOT (ENSIIE & Centre Borelli/ENS Paris-Saclay)
    • 5
      Covariance & Subspace Inference: Handling Robustness, Variability and Incompleteness

      In this talk, we focus on covariance matrix inference and principal component analysis for non-regular data in heterogeneous environments. First, we briefly introduce mixed-effects models, which are widely used to analyze repeated-measures data arising in signal processing applications that must capture a global behavior common to all individuals along with possible local variations. Then, we will present classical strategies for learning under Gaussian assumptions. In certain situations, when the data set contains outliers, the Gaussian assumption is not valid and leads to a dramatic loss of performance. To overcome this drawback, we will present an expectation-maximization-based algorithm in which the heterogeneous component is treated as part of the complete data. The proposed algorithm is then cast into a scheme that runs in parallel across individuals, in order to reduce the computational cost and avoid overloading a central processor. In addition, we will present extensions that deal with missing data, that is, the situation where part of the individual responses is unobserved. Finally, applications to calibration and imaging for large radio-interferometers will be considered.

      Speaker: Prof. Mohammed Nabil EL KORSO (L2S/CentraleSupélec)
    • 4:10 PM
      Coffee break
    • 6
      Transfer Learning, Covariant Learning and Parallel Transport

      Transfer learning has become increasingly important in recent years, particularly because learning a new model for each task can be much more costly in terms of training examples than adapting a model learned for another task. The standard approach in neural networks is to reuse the learned representation in the first layers and to adapt the decision function performed by the last layers.
      In this talk, we will revisit transfer learning. A dual algorithm of the standard approach, which adapts the representation while keeping the decision function, will be presented, as well as an algorithm for the early classification of time series. This will allow us to question the notion of bias in transfer learning as well as the cost of information and to ask ourselves which a priori assumptions are necessary to obtain guarantees on transfer learning.
      We will note that reasoning by analogy and online learning are instances of transfer learning, and we will see how the notions of parallel transport and covariant physics can provide useful conceptual tools to address transfer learning.

      Speaker: Prof. Antoine CORNUÉJOLS (MIA/AgroParisTech)