Journées Statistiques du Sud 2024

Auditorium J. Herbrand (IRIT, Université Paul Sabatier)


Description

The 10th edition of the Journées Statistiques du Sud will take place in Toulouse from June 19 to 21, 2024, in the Auditorium J. Herbrand (IRIT, Univ. Paul Sabatier).

Conference program and book of abstracts:
   ➤ Journées Statistiques du Sud 2024


These meetings are a series of workshops held in the south of France.

Their aim is to give an overview of recent scientific developments in statistics and to foster exchanges between students and researchers.

On the program:

Students attending the Journées Statistiques du Sud are invited to a pre-conference day on Tuesday, June 18, 2024. This satellite event is co-organized with the Fédération Occitane de Recherche Mathématique (OcciMath). More information here.
 

Registration, free of charge but mandatory, is required to attend the conference.
Registration is now closed.

Submissions for the poster session are made via the contribution form, until Friday, June 14, 2024 (inclusive).

 



Organizing committee:

François Bachoc (IMT-UPS), Juliette Chevallier (IMT-INSA), Emmanuelle Claeys (IRIT-UPS), Nicolas Enjalbert-Courrech (IMT-CNRS), Xiaoyi Mai (IMT-UT2J), Adrien Mazoyer (IMT-UPS), Nathalie Peyrard (INRAE), Tom Rohmer (INRAE) and Sophia Yazzourh (IMT-UPS).

Administrative support:

Céline Rocca et Virginie Raffe.



Previous editions:

Since their creation in 2007, the JSS have mainly been held in the sunny surroundings of Avignon (2022), Barcelona (2014), Marseille (2009), Montpellier (2016), Nice (2007 and 2011) and Toulouse (2008 and 2012).


Sponsors and partners:


Participants
  • AHMED Mohamed El Moktar
  • Andéol Léo
  • BACAVE Hanna
  • Bachoc François
  • Ben Ajmia Lamia
  • BERTHET Philippe
  • Bontemps Dominique
  • Bouasabah Mohammed
  • BOULENC Hugo
  • Briscik Mitja
  • Bruning Victoria
  • Chevallier Juliette
  • Chion Marie
  • Chiron Arthur
  • Chopin Nicolas
  • Claeys Emmanuelle
  • Coeuret Lucas
  • Costa Manon
  • Dalmau Joseba
  • Demange-Chryst Julien
  • Demangeot Marine
  • Fatma Aouissaoui
  • Gabriel Edith
  • Gimenez Rollin
  • GOUTHON Abiodun Jean-Luc
  • Gransard Benjamin
  • Heimida Mohamed
  • Henderson Iain
  • Hovsepyan Lilit
  • jouilil youness
  • Kouakou Hugues
  • Lafargue Valentin
  • LAGNOUX Agnès
  • Lalanne Clément
  • LAURENT-BONNEAU Béatrice
  • LAVANDIER MIROUZE Sylvaine
  • Lequen Arnaud
  • Lounis Christophe
  • MAI Xiaoyi
  • Maugis-Rabusseau Cathy
  • MERCIER Sabine
  • monnier jerome
  • Neuvial Pierre
  • Orntangar Nguenamadji
  • Peyhardi Jean
  • Philippenko Constantin
  • Richard Frédéric
  • RIOUALI Maryam
  • Roustant Olivier
  • Saulières Léo
  • YAZZOURH Sophia
  • … and 53 more
    • 12:30 PM 12:50 PM
      Miscellaneous: Welcome to the pre-conference day Salle Johnson (IMT, Université Paul Sabatier)

    • 12:50 PM 1:00 PM
      Miscellaneous: Introduction Salle Johnson (IMT, Université Paul Sabatier)

      Conveners: Laurent Manivel (CNRS et IMT), Nicolas Enjalbert Courrech (Institut de Mathématiques de Toulouse), Sophia YAZZOURH (Institut de Mathématiques de Toulouse)
    • 1:00 PM 2:15 PM
      Short talks: Bio & Health applications Salle Johnson (IMT, Université Paul Sabatier)

      Convener: Nicolas Enjalbert Courrech (Institut de Mathématiques de Toulouse)
      • 1:00 PM
        Observation-driven HSMM for estimating weed dynamics 25m Salle Johnson (IMT, Université Paul Sabatier)

        Weeds are plants that grow spontaneously in agricultural fields and compete with crops. Their dynamics rely on colonization and dormancy. Since the seed bank is never observed directly, a model of these dynamics has been proposed within the framework of Hidden Markov Models (HMM). This model, called the Observation-Driven HMM (OD-HMM), extends HMMs to the case where the transition probabilities depend on the current observation, in order to account for newly produced seeds entering the seed bank. However, for a more realistic distribution of seed-bank survival, the natural framework would be that of Hidden Semi-Markov Models (HSMM). Yet the notion of sojourn time in the hidden state is no longer suitable once the observation influences the hidden chain at every time step. Building on the OD-HMM and HSMM frameworks, we propose a new general model, the OD-HSMM, which both accounts for the influence of the data on the hidden chain and removes the geometric sojourn-time assumption. We present a parametric version built from the key parameters of the dynamics of a weed species, and we discuss different approaches for their estimation.

        Speaker: Hanna Bacave (INRAE, MIAT)
      • 1:25 PM
        One-Step estimation procedure in univariate and multivariate GLMs with categorical explanatory variables 25m Salle Johnson (IMT, Université Paul Sabatier)

        Generalized linear models are commonly used for modeling relationships in both univariate and multivariate contexts, with parameters traditionally estimated via the maximum likelihood estimator (MLE). The MLE, while efficient, often requires a Newton-Raphson-type algorithm, making it time-intensive, particularly with large datasets or numerous variables. Alternative closed-form estimators are faster but lack this efficiency. In this work, we propose a fast and asymptotically efficient estimation of the parameters of generalized linear models with categorical explanatory variables. It is based on a one-step procedure in which a single gradient step is performed on the log-likelihood function, initialized at the explicit estimators. This talk presents the theoretical results obtained, the simulations carried out, and an application to car insurance pricing.

        Multivariate GLMs are studied in many scientific contexts. In the insurance sector in particular, they allow actuaries and risk managers to assess the joint probabilities of various events occurring simultaneously, such as multiple claims or correlated risks across different insurance policy types (e.g., life, property, and auto). Copula models provide flexible tools for modeling multivariate variables by separating marginal effects from the dependence structure. In this setting, the copula parameter, which quantifies the (non-linear) dependence between coordinates, and the parameters of the marginal distributions are unknown and have to be estimated jointly.

        To infer the parameters, maximum likelihood estimators (MLE) can be used thanks to their asymptotic properties. However, the MLE generally has no closed-form expression and is consequently time-consuming. An alternative procedure, called inference for margins (IFM), was proposed in (Xu 1996; Joe 1997, 2005). In the IFM procedure, the parameters of the marginals are estimated separately and simultaneously, then plugged in to obtain the copula parameter. Since IFM based on the MLE can still be time-consuming, fast and asymptotically efficient one-step closed-form estimators (OS-CFE) are used for the marginal parameters instead, and plugged in to estimate the copula parameter with the IFM method.

        Speaker: Mrs Lilit Hovsepyan (Laboratoire Manceau de Mathématiques - Inrae Toulouse)
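As a toy illustration of the one-step principle (not the estimator of the talk itself), the sketch below fits a Poisson GLM with two additive categorical factors: an explicit initial estimator built from marginal group means, followed by a single Fisher-scoring step on the full log-likelihood. The simulation setup and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Poisson GLM with two additive categorical factors (log link).
# Design: intercept, dummies for levels 2-3 of factor A, dummy for level 2 of B.
n = 20000
A = rng.integers(0, 3, size=n)
B = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), A == 1, A == 2, B == 1]).astype(float)
beta_true = np.array([0.5, 0.8, -0.3, 0.4])
y = rng.poisson(np.exp(X @ beta_true))

# Explicit (closed-form) initial estimator: with balanced, independent factors,
# log ratios of marginal group means are consistent for the dummy coefficients.
b0 = np.log(y[(A == 0) & (B == 0)].mean())
bA1 = np.log(y[A == 1].mean() / y[A == 0].mean())
bA2 = np.log(y[A == 2].mean() / y[A == 0].mean())
bB1 = np.log(y[B == 1].mean() / y[B == 0].mean())
beta_init = np.array([b0, bA1, bA2, bB1])

# One-step estimator: a single Fisher-scoring step on the full log-likelihood.
mu = np.exp(X @ beta_init)
score = X.T @ (y - mu)               # Poisson score at the initial estimator
fisher = X.T @ (mu[:, None] * X)     # Fisher information matrix
beta_os = beta_init + np.linalg.solve(fisher, score)
```

The single step costs one score and one information evaluation, versus many iterations for a full Newton-Raphson fit.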
      • 1:50 PM
        Bayesian Outcome Weighted Learning 25m Salle Johnson (IMT, Université Paul Sabatier)

        One of the main goals of statistical precision medicine is to learn optimal Individualized Treatment Rules (ITRs). The Outcome Weighted Learning (OWL) method was the first to propose a classification-based, machine-learning approach to estimating ITRs. It recasts the problem of learning optimal ITRs as a weighted classification problem, which can be solved with machine-learning methods such as support vector machines. In this work, we introduce a Bayesian formulation of OWL. Starting from the OWL objective function, we derive a pseudo-likelihood that can be expressed as a scale mixture of normal distributions. A Gibbs sampling algorithm is developed to sample from the posterior distribution of the parameters. Beyond providing a strategy for learning an optimal ITR, Bayesian OWL offers (1) a principled approach to generating decision rules learned from sparse data and (2) a natural probabilistic approach to quantifying the uncertainty of the ITR treatment recommendations themselves. We demonstrate the performance of our method through several simulation studies.

        Speaker: Sophia YAZZOURH (Institut de Mathématiques de Toulouse)
    • 2:15 PM 2:45 PM
      Coffee break IMT, Université Paul Sabatier

    • 2:45 PM 4:00 PM
      Short talks: Machine Learning Salle Johnson (IMT, Université Paul Sabatier)

      Convener: Sophia YAZZOURH (Institut de Mathématiques de Toulouse)
      • 2:45 PM
        A general approximation lower bound in Lp norm, with applications to feed-forward neural networks 25m Salle Johnson (IMT, Université Paul Sabatier)

        We study the fundamental limits to the expressive power of neural networks. Given two sets F, G of real-valued functions, we first prove a general lower bound on how well functions in F can be approximated in Lp(μ) norm by functions in G, for any p≥1 and any probability measure μ. The lower bound depends on the packing number of F, the range of F, and the fat-shattering dimension of G. We then instantiate this bound to the case where G corresponds to a piecewise-polynomial feed-forward neural network, and describe in detail the application to two sets F: Hölder balls and multivariate monotonic functions. Besides matching (known or new) upper bounds up to log factors, our lower bounds shed some light on the similarities and differences between approximation in Lp norm and in sup norm, solving an open question of DeVore et al. (2021). Our proof strategy differs from the sup-norm case and uses a key probability result of Mendelson (2002).

        Speaker: Armand Foucault (université de Toulouse, ANITI)
      • 3:10 PM
        Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling 25m Salle Johnson (IMT, Université Paul Sabatier)

        Adaptive importance sampling is a well-known family of algorithms for density approximation, generation, and Monte Carlo integration, including rare event estimation. The common denominator of this family of algorithms is that density estimation is performed with weighted samples at each iteration. However, the classical methods for doing so, such as kernel smoothing or approximation by a Gaussian distribution, suffer from the curse of dimensionality and/or a lack of flexibility. Both are limitations in high dimension and when we have no prior knowledge of the form of the target distribution, such as its number of modes. Variational autoencoders are probabilistic tools able to represent high-dimensional data faithfully in a lower-dimensional space. They constitute a parametric family of distributions robust to the dimension, and since they are based on deep neural networks, they are flexible enough to be considered non-parametric models. In this communication, we propose to use a variational autoencoder as the auxiliary importance sampling distribution by extending the existing framework to weighted samples. We integrate the proposed procedure into existing adaptive importance sampling algorithms and illustrate its practical interest on diverse examples.

        Speaker: Julien Demange-Chryst (ONERA - IMT)
      • 3:35 PM
        Physics-Informed Machine Learning methods applied to inverse problems in river hydraulics 25m Salle Johnson (IMT, Université Paul Sabatier)

        Faced with the socio-economic challenges of flood forecasting, in a context of climate change, multi-scale modeling approaches that take advantage of the maximum amount of information available are needed to enable accurate and rapid flood forecasts.

        In this context, the objective of this work is to develop a Physics-Informed Machine Learning method to efficiently perform flood model calibration, which is crucial for accurate forecasts. More precisely, we propose an approach that infers a spatially distributed friction coefficient (the Manning-Strickler coefficient) from data with a Physics-Informed Neural Network (PINN). The method considers a loss function with a physical-model term, the residual of the 2D Shallow-Water Equations, and a data-discrepancy term. Data are generated with the reference software DassFlow and consist of observations of the free-surface height and mass flow rate at various locations and times in the computational domain. The PINN parameters and the spatially distributed friction parameter are then optimized so that the weighted sum of the physical residual term and the data-discrepancy term is minimized. To tackle multi-scale issues, a multiresolution strategy is employed on the friction parameter: the optimization is initialized with a coarse friction resolution, which is refined iteratively throughout training for regularization purposes.

        To illustrate the efficiency of the method and its sensitivity to the dimension of the friction parameter, two river hydraulics test cases will be discussed. Overall, the proposed method appears efficient and robust. Moreover, it is simple to implement (non-intrusive) compared with more traditional variational data assimilation approaches, making it a viable strategy for rapid flood forecasting.

        Speaker: Hugo Boulenc (IMT - INSA Toulouse)
    • 4:00 PM 4:30 PM
      Coffee break IMT, Université Paul Sabatier

    • 4:30 PM 5:00 PM
      Miscellaneous: Talk on the PhD and post-PhD career Salle Johnson (IMT, Université Paul Sabatier)

      Convener: Pierre Neuvial (Institut de Mathématiques de Toulouse)
    • 5:00 PM 6:00 PM
      Miscellaneous: Round table on post-PhD careers Salle Johnson (IMT, Université Paul Sabatier)

      Conveners: Pierre Neuvial (Institut de Mathématiques de Toulouse), Vincent Baron (Université Paul Sabatier), Juliette Chevallier (Institut de Mathématiques de Toulouse), Chifaa Dahik (Capgemini), Sebastien Déjean (Institut de Mathématiques de Toulouse), Tom Rohmer (Inrae)
    • 6:00 PM 6:30 PM
      Miscellaneous: Discussions and close of the pre-conference day Salle Johnson (IMT, Université Paul Sabatier)

    • 8:45 AM 9:00 AM
      Miscellaneous: Welcome Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

    • 9:00 AM 9:30 AM
      Miscellaneous: Opening of the conference Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

    • 9:30 AM 10:30 AM
      Long talk: Inference techniques for the analysis of Brownian image textures Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Frederic Richard (Aix-Marseille University)
      • 9:30 AM
        Inference techniques for the analysis of Brownian image textures 1h

        In this talk, I will present some techniques for estimating the functional parameters of anisotropic fractional Brownian fields, and their application to the analysis of image textures. I will focus on a first approach based on the resolution of inverse problems which leads to a complete estimation of parameters. The formulation of these inverse problems comes from the fitting of the empirical semi-variogram of an image to the semi-variogram of a turning band field that approximates the anisotropic fractional Brownian field. It takes the form of a separable non-linear least square criterion which can be solved by a variable projection method, and extended to take into account additional penalties. Besides, I will also describe an alternate approach which uses neural networks to obtain accurate estimation of field features such as the field degree of regularity.

        Speaker: Frédéric Richard (Aix-Marseille University)
    • 10:30 AM 11:00 AM
      Coffee break
    • 11:00 AM 12:30 PM
      Mini-course: Introduction to reinforcement learning Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Emmanuel Rachelson (ISAE-SUPAERO)
      • 11:00 AM
        Introduction to reinforcement learning - 1/2 1h 30m
        Speaker: Emmanuel Rachelson (ISAE-SUPAERO)
    • 12:30 PM 2:00 PM
      Lunch Brasserie L'esplanade (Université Paul Sabatier)

    • 2:00 PM 3:00 PM
      Long talk: Unbiased estimation of smooth functions. Applications in statistics and machine learning Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Nicolas Chopin (ENSAE, Institut Polytechnique de Paris)
      • 2:00 PM
        Unbiased estimation of smooth functions. Applications in statistics and machine learning 1h

        Given a smooth function f, we develop a general approach to turn Monte Carlo samples with expectation m into an unbiased estimate of f(m). Specifically, we develop estimators based on randomly truncating the Taylor series expansion of f and estimating the coefficients of the truncated series. We derive their properties and propose a strategy to set their tuning parameters -- which depend on m -- automatically, with a view to making the whole approach simple to use. We develop our methods for the specific functions f(x)=log(x) and f(x)=1/x, as they arise in several statistical applications such as maximum likelihood estimation of latent variable models and Bayesian inference for un-normalised models. Detailed numerical studies are performed for a range of applications to determine how competitive and reliable the proposed approach is.

        Speaker: Nicolas Chopin (ENSAE, Institut Polytechnique de Paris)
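The random-truncation idea can be illustrated with a classic "Russian roulette" series estimator. The sketch below is a simplified stand-in for the talk's estimators (it truncates the known Taylor series of log directly and does not implement coefficient estimation or automatic tuning); all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unbiased estimation of an infinite series S = sum_k a_k by random truncation:
# draw K with P(K >= k) = q**(k-1) and return sum_{k<=K} a_k / P(K >= k).
# Here the series is the expansion log(m) = sum_{k>=1} (-1)**(k+1) (m-1)**k / k.
def truncated_log(m, q=0.6, n_rep=200000, rng=rng):
    K = rng.geometric(1.0 - q, size=n_rep)      # K >= 1, P(K >= k) = q**(k-1)
    ks = np.arange(1, K.max() + 1)
    terms = (-1.0) ** (ks + 1) * (m - 1.0) ** ks / ks
    weighted = terms / q ** (ks - 1)            # a_k / P(K >= k): keeps E exact
    partial = np.cumsum(weighted)
    return partial[K - 1].mean()                # average of unbiased single draws

est = truncated_log(1.5)                        # should be close to log(1.5)
```

Each single draw is unbiased for log(m); averaging many draws only reduces variance, which is the property exploited when such estimators are plugged into larger Monte Carlo schemes.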
    • 3:00 PM 4:00 PM
      Long talk: A new preconditioned stochastic gradient algorithm for estimation in latent variable models Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Maud Delattre (INRAe MaIAGE Jouy-en-Jossas)
      • 3:00 PM
        A new preconditioned stochastic gradient algorithm for estimation in latent variable models 1h

        Latent variable models are powerful tools for modeling complex phenomena involving, in particular, partially observed data, unobserved variables, or underlying complex unknown structures. Inference is often difficult due to the latent structure of the model. To deal with parameter estimation in the presence of latent variables, well-known efficient methods exist, such as gradient-based and EM-type algorithms, but they have practical and theoretical limitations. As an alternative, we propose an efficient preconditioned stochastic gradient algorithm for parameter estimation. Our method includes a preconditioning step based on a positive-definite estimate of the Fisher information matrix. We prove convergence results for the proposed algorithm under mild assumptions for very general latent variable models. We illustrate through relevant simulations the performance of the proposed methodology in a nonlinear mixed effects model and in a stochastic block model.

        Speaker: Maud Delattre (INRAe MaIAGE Jouy-en-Jossas)
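A minimal sketch of the preconditioning idea, on a toy model with no latent variables: fitting a Gaussian by stochastic gradient ascent, preconditioned by the per-observation Fisher information (here known exactly; the talk's algorithm uses a positive-definite estimate of this matrix in far more general models). The setup is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit N(mu, sigma^2) by preconditioned stochastic gradient ascent.
x = rng.normal(2.0, 1.5, size=20000)

mu, s = 0.0, 0.0                                     # parametrize s = log sigma^2
batches = np.array_split(rng.permutation(x), 400)
for t, batch in enumerate(batches, start=1):
    sig2 = np.exp(s)
    g_mu = np.mean(batch - mu) / sig2                # score for mu
    g_s = 0.5 * np.mean((batch - mu) ** 2 / sig2 - 1.0)  # score for s
    gamma = 0.5 / np.sqrt(t)                         # decreasing step size
    # Fisher information per observation is diag(1/sig2, 1/2);
    # preconditioning multiplies the gradient by its inverse.
    mu += gamma * sig2 * g_mu
    s += gamma * 2.0 * g_s
```

The Fisher preconditioner rescales the two coordinates so that a single step size works for both, which is the practical benefit the algorithm targets in models where plain gradient steps are badly conditioned.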
    • 4:00 PM 4:30 PM
      Coffee break
    • 4:30 PM 5:30 PM
      Long talk: Gaussian processes with inequality constraints: Theory and computation Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: François Bachoc (Institut de Mathématiques de Toulouse)
      • 4:30 PM
        Gaussian processes with inequality constraints: Theory and computation 1h

        In Gaussian process modeling, inequality constraints make it possible to take expert knowledge into account and thus to improve prediction and uncertainty quantification. Typical examples arise when a black-box function is bounded or monotonic with respect to some of its input variables. We will show how inequality constraints impact the Gaussian process model, the computation of its posterior distribution, and the estimation of its covariance parameters. An example will be presented in which a numerical flooding model is monotonic with respect to two input variables, tide and surge.
        The talk is in three parts. (1) An introduction to (constrained) Gaussian processes and their motivations in the field of computer experiments. (2) Theoretical results on the impact of the constraints on maximum likelihood estimation. (3) Numerical computations, including an algorithm called MaxMod.

        Speaker: François Bachoc (Institut de Mathématiques de Toulouse)
    • 5:30 PM 8:30 PM
      Poster session
      • 5:30 PM
        A general approximation lower bound in Lp norm, with applications to feed-forward neural networks 3h

        We study the fundamental limits to the expressive power of neural networks. Given two sets F, G of real-valued functions, we first prove a general lower bound on how well functions in F can be approximated in Lp(μ) norm by functions in G, for any p≥1 and any probability measure μ. The lower bound depends on the packing number of F, the range of F, and the fat-shattering dimension of G. We then instantiate this bound to the case where G corresponds to a piecewise-polynomial feed-forward neural network, and describe in detail the application to two sets F: Hölder balls and multivariate monotonic functions. Besides matching (known or new) upper bounds up to log factors, our lower bounds shed some light on the similarities and differences between approximation in Lp norm and in sup norm, solving an open question of DeVore et al. (2021). Our proof strategy differs from the sup-norm case and uses a key probability result of Mendelson (2002).

        Speaker: Armand Foucault (université de Toulouse, ANITI)
      • 5:30 PM
        Copula Integration for Genetic Selection Parameter Estimation in Bivariate Linear Mixed Models 3h

        In animal genetics, linear mixed models are pivotal in determining the genetic and environmental impacts on animal traits, which is critical for designing effective breeding strategies. Traditional approaches, such as restricted maximum likelihood, rely on the assumption of multi-normality in trait distributions. However, this assumption often fails in practice due to the non-Gaussian nature of the joint distribution of multiple traits, resulting in biased estimations.

        In this study, a novel method was introduced by incorporating functions known as copulas, which describe the specific dependence structures. This was achieved using a stochastic gradient algorithm, where genetic parameters were estimated by maximizing a likelihood function that includes copulas.

        Following validation with simulated Gaussian data, the method was applied to other dependence structures, such as the Clayton copula, demonstrating its functionality.

        The findings indicate that accounting for the actual joint distribution of traits can lead to more precise estimations of genetic parameters, thereby enhancing the effectiveness of breeding programs.

        Speaker: Victoria Bruning (INRAE)
      • 5:30 PM
        Global sensitivity analysis with weighted Poincaré inequalities 3h

        Recently, one-dimensional Poincaré inequalities were used in Global Sensitivity Analysis (GSA) to provide upper bounds and chaos-type approximations of Sobol indices with derivative-based global sensitivity measures. As a new proposal, we develop the use of one-dimensional weighted Poincaré inequalities. The use of weights provides an additional degree of freedom that can be manipulated to enhance the precision of the upper bounds and approximations. In this context, we propose a way to construct weights that guarantee the existence of an orthonormal system of eigenfunctions, as well as a data-driven method based on a monotonic approximation of the main effects. Finally, we illustrate the benefits of using weights in a GSA study of a real flooding application.

        Speaker: David Heredia (Institut de Mathématiques de Toulouse)
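For intuition, the classical (unweighted) Poincaré upper bound on Sobol indices can be checked numerically for independent standard normal inputs, where the Poincaré constant is 1. The poster's contribution, weighted Poincaré inequalities, is not implemented here; the toy model and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Poincaré upper bound for independent standard normal inputs (constant C = 1):
#   S_i <= E[(df/dx_i)^2] / Var(f)   (right side: derivative-based measure).
# Toy additive model f(x1, x2) = x1 + 0.5*sin(x2), derivative known analytically.
n = 200000
x1, x2 = rng.normal(size=(2, n))
f = x1 + 0.5 * np.sin(x2)

var_f = f.var()
nu2 = np.mean((0.5 * np.cos(x2)) ** 2)   # derivative-based measure for x2
bound = nu2 / var_f                      # Poincaré upper bound on S_2

# The model is additive, so the first-order Sobol index of x2 is explicit.
S2 = (0.5 * np.sin(x2)).var() / var_f
```

For this model the bound is not tight (roughly 0.13 against a Sobol index near 0.10), which is exactly the gap that sharper, weighted inequalities aim to close.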
      • 5:30 PM
        Integration of Medical Knowledge into Reinforcement Learning for Dynamic Treatment Regimes 3h

        Precision medicine allows patients with chronic diseases (diabetes, cancer, ...) to put their own information at the center of care in order to improve their health. Adaptive treatment strategies, or Dynamic Treatment Regimes, are a branch of this medical field. They establish a decision rule at each stage of the care process, based on the medical history and the evolution of physiological data. Here, the objective is to optimize the patient's long-term positive response to the sequence of treatment decisions. Reinforcement learning methods learn these optimal strategies directly from the data. Introducing medical expertise into this process helps improve learning performance.

        Speaker: Sophia YAZZOURH (Institut de Mathématiques de Toulouse)
      • 5:30 PM
        Statistics and machine learning for the prediction of complex outputs, with application to nuclear safety 3h

        Presentation of metamodels for the DRACCAR simulation code, which models the reflooding of a nuclear reactor core.

        Speaker: Florian Gossard (IMT)
    • 8:45 AM 9:00 AM
      Miscellaneous: Welcome Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

    • 9:00 AM 9:30 AM
      Short talk: Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning Salle Johnson (IMT, Université Paul Sabatier)

      Convener: Constantin Philippenko (Ecole Polytechnique)
      • 9:00 AM
        Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning 30m

        We investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning.
        We underline differences in convergence rates between several unbiased compression operators that all satisfy the same variance condition, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. More specifically, we highlight the impact on convergence of the covariance of the additive noise induced by the algorithm. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected Hölder regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning.

        Speaker: Constantin Philippenko (Ecole Polytechnique)
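One standard example of an unbiased compression operator of the kind compared in such analyses is rand-k sparsification. A minimal sketch (illustrative only, not the talk's algorithm; names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# Rand-k sparsification: keep k coordinates at random, rescale by d/k so that
# E[C(x)] = x; the variance grows with d/k - 1, the usual variance parameter.
def rand_k(x, k, rng):
    d = x.size
    out = np.zeros_like(x)
    kept = rng.choice(d, size=k, replace=False)   # random subset of coordinates
    out[kept] = x[kept] * (d / k)                 # rescaling keeps it unbiased
    return out

x = np.array([1.0, -2.0, 3.0, 0.5])
# Averaging many compressed copies recovers x, confirming unbiasedness.
avg = np.mean([rand_k(x, 2, rng) for _ in range(100000)], axis=0)
```

Operators like this transmit only k of the d coordinates per round, which is the communication saving that motivates compressed distributed SGD.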
    • 9:30 AM 10:30 AM
      Long talk: Building explainable and robust neural networks by using Lipschitz constraints and Optimal Transport Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Mathieu Serrurier (IRIT Toulouse)
      • 9:30 AM
        Building explainable and robust neural networks by using Lipschitz constraints and Optimal Transport 1h

        The lack of robustness and explainability in neural networks is directly linked to the arbitrarily high Lipschitz constant of deep models. Although constraining the Lipschitz constant has been shown to improve these properties, it can make it challenging to learn with classical loss functions. In this presentation, we explain how to control this constant, and demonstrate that training such networks requires defining specific loss functions and optimization processes. To this end, we propose a loss function based on optimal transport that not only certifies robustness but also converts adversarial examples into provable counterfactual examples.

        Speaker: Mathieu Serrurier (IRIT Toulouse)
    • 10:30 AM 11:00 AM
      Coffee break
    • 11:00 AM 12:30 PM
      Mini-course: Introduction to reinforcement learning Amphi Fermat 1A (Université Paul Sabatier)

      Convener: Emmanuel Rachelson (ISAE-SUPAERO)
      • 11:00 AM
        Introduction to reinforcement learning - 2/2 1h 30m
        Speaker: Emmanuel Rachelson (ISAE-SUPAERO)
    • 12:30 PM 2:00 PM
      Lunch Brasserie L'esplanade (Université Paul Sabatier)

    • 2:00 PM 3:00 PM
      Long talk: Nonparametric Bayesian mixture models for identifying clusters from longitudinal and cross-sectional data Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Anaïs Rouanet (ISPED, Université de Bordeaux)
      • 2:00 PM
        Nonparametric Bayesian mixture models for identifying clusters from longitudinal and cross-sectional data 1h

        The identification of sets of co-regulated genes that share a common function is a key question of modern genomics. Bayesian profile regression is a semi-supervised mixture modelling approach that makes use of a response to guide inference toward relevant clusterings. Previous applications of profile regression have considered univariate continuous, categorical, and count outcomes. In this work, we extend Bayesian profile regression to cases where the outcome is longitudinal (or multivariate continuous), using multivariate normal and Gaussian process regression response models. The model is applied to budding-yeast data to identify groups of genes co-regulated during the Saccharomyces cerevisiae cell cycle. We identify four distinct groups of genes associated with specific patterns of gene expression trajectories, together with the bound transcription factors likely involved in their co-regulation.

        Speaker: Anaïs Rouanet (ISPED, Université de Bordeaux)
    • 3:00 PM 4:00 PM
      Long talk: Polya urn models for multivariate species abundance data and neutral theory of biodiversity Auditorium J. Herbrand (IRIT, Université Paul Sabatier)

      Convener: Jean Peyhardi (Institut Montpelliérain Alexander Grothendieck (IMAG))
      • 3:00 PM
        Polya urn models for multivariate species abundance data: Properties and application 1h

        This talk focuses on models for multivariate count data, with emphasis on species abundance data. Two approaches emerge in this framework: the Poisson log-normal (PLN) and the Tree Dirichlet multinomial (TDM) models. The first uses a latent Gaussian vector to model dependencies between species, whereas the second models dependencies directly on the observed abundances. The TDM model assumes that the total abundance is fixed, and is therefore often used for microbiome datasets, since the sequencing depth (in RNA-seq) varies from one observation to another, leading to a total abundance that is not really interpretable. We propose to generalize TDM models in two ways: by relaxing the fixed-total-abundance assumption and by using the Polya distribution instead of the Dirichlet multinomial. This family of models corresponds to Polya urn models with a random number of draws and will be called Polya splitting distributions. In the first part, I will present the probabilistic properties of such models, with a focus on marginals and the probabilistic graphical model. It will then be shown that these models emerge as stationary distributions of multivariate birth-death processes under simple parametric assumptions on the birth-death rates. These assumptions are related to the neutral theory of biodiversity, which assumes no biological interaction between species. Finally, the statistical aspects of Polya splitting models will be presented: the regression framework, the inference, the consideration of a partition tree structure, and two applications on real data.

        Speaker: Jean Peyhardi (Institut Montpelliérain Alexander Grothendieck (IMAG))
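        As a minimal sketch of the Dirichlet-multinomial building block discussed above (illustrative only, not from the talk), species counts with a fixed total abundance can be simulated by compounding a Dirichlet draw with a multinomial draw; Polya splitting models relax the fixed total by randomizing the number of draws:

```python
import numpy as np

def dirichlet_multinomial(alpha, total, rng):
    """Sample species counts from a Dirichlet-multinomial:
    p ~ Dirichlet(alpha), counts ~ Multinomial(total, p).
    This is the fixed-total-abundance (TDM-style) case.
    """
    p = rng.dirichlet(alpha)
    return rng.multinomial(total, p)

rng = np.random.default_rng(1)
# Five species, symmetric prior, sequencing depth of 100 reads.
counts = dirichlet_multinomial(alpha=np.ones(5), total=100, rng=rng)
```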
    • 4:00 PM 4:30 PM
      Coffee break
    • 4:30 PM 6:00 PM
      Mini-course: A primer on diffusion-based generative models

      Amphi Fermat 1A, Université Paul Sabatier

      Convener: Claire Boyer (Sorbonne Université)
      • 4:30 PM
        A primer on diffusion-based generative models - 1/2 1h 30m

        Speaker: Claire Boyer (Sorbonne Université)
    • 8:00 PM 10:00 PM
      Conference dinner: Les Caves de la Maréchale

      3 rue Jules Chalande, 31000 Toulouse
      www.lescavesdelamarechale.com

    • 9:30 AM 10:30 AM
      Long talk: Conformal prediction for object detection

      Auditorium J. Herbrand, IRIT, Université Paul Sabatier

      Convener: Sébastien Gerchinovitz (IRT Saint Exupéry)
      • 9:30 AM
        Conformal prediction for object detection 1h

        We address the problem of constructing reliable uncertainty estimates for object detection. We build upon classical tools from conformal prediction, which offer (marginal) risk guarantees when the predictive uncertainty can be reduced to a one-dimensional parameter. In this talk, we will first recall standard algorithms and theoretical guarantees in conformal prediction and beyond. We will then address the problem of tuning a two-dimensional uncertainty parameter, and will illustrate our method on an object detection task. This is joint work with Léo Andéol, Luca Mossina, and Adrien Mazoyer.

        Speaker: Sébastien Gerchinovitz (IRT Saint Exupéry)
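        As a minimal sketch of the classical split-conformal step this line of work builds on (illustrative only, not the speaker's two-dimensional method), the finite-sample-corrected quantile of calibration nonconformity scores yields a threshold with marginal coverage at least 1 - alpha:

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split conformal prediction: given nonconformity scores on a
    held-out calibration set, return the threshold q such that
    prediction sets {y : score(x, y) <= q} achieve marginal
    coverage >= 1 - alpha on exchangeable data.
    """
    n = len(scores)
    # Finite-sample corrected quantile level ceil((n+1)(1-alpha))/n.
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(level, 1.0), method="higher")

rng = np.random.default_rng(2)
cal_scores = rng.normal(size=1000)
q = conformal_quantile(cal_scores, alpha=0.1)
# New scores drawn from the same distribution fall below q
# roughly 90% of the time.
```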
    • 10:30 AM 11:00 AM
      Coffee break
    • 11:00 AM 12:30 PM
      Mini-course: A primer on diffusion-based generative models

      Amphi Grignard 2A, Université Paul Sabatier

      Convener: Claire Boyer (Sorbonne Université)
      • 11:00 AM
        A primer on diffusion-based generative models - 2/2 1h 30m

        Speaker: Claire Boyer (Sorbonne Université)
    • 12:30 PM 1:00 PM
      Miscellaneous: Closing remarks

      Auditorium J. Herbrand, IRIT, Université Paul Sabatier