Biostatistics Seminar

Justine Remiat — Random forests using longitudinal predictors

by Justine Remiat

Europe/Paris
Amphi Louis (ISPED)

Description

Speaker: Justine Remiat, Bordeaux Population Health
Title: Random forests using longitudinal predictors

This seminar will be in English

Abstract: Random Forests (Breiman, 2001) are an effective predictive tool, particularly in high-dimensional settings. However, they are not well suited to longitudinal data collected over time. To address this limitation, Fréchet Random Forests (Capitaine et al., 2020) were proposed. They can handle any type of data in a metric space by using a distance tailored to each data type (e.g., images, trajectories). This work aimed to (1) implement the Fréchet Random Forest for trajectory data, fully exploiting the flexibility of the generalized discrete Fréchet distance, and (2) evaluate its performance in predicting a continuous outcome from longitudinal inputs. The generalized discrete Fréchet distance depends on a time-shifting parameter, called the timescale, which modifies its behavior. We proposed two implementations: the timescale is either treated as a hyperparameter or drawn at random at each tree node to explore all time-sensitivity behaviors. A simulation study was conducted to illustrate the flexibility of the Fréchet Random Forest in capturing different scenarios of association: (i) time-sensitive association, (ii) shape-sensitive association, and (iii) a mix of both. We then applied the method to data from a population-based cohort to predict the risk of dementia from clinical marker trajectories. The simulations illustrated the ability of Fréchet Random Forests to adapt to different types of association through timescale tuning. Fréchet Random Forests also showed better predictive performance (MSE) than classical Random Forests with pre-determined features across all three scenarios. On the application data, the Fréchet forests outperformed classical forests, even with more irregular and sparse data, while similarly identifying predictive markers.
Thanks to its tunable timescale parameter that can adapt to different structures of association, the Fréchet Random Forest constitutes a flexible tool for prediction based on longitudinal data.
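
To make the role of the timescale concrete, here is a minimal Python sketch of a discrete Fréchet distance between two trajectories. It is an illustration only, not the speaker's implementation. It assumes that the timescale simply multiplies the time axis before the point-to-point Euclidean distance is computed; the exact definition of the generalized distance is not given in this abstract. The function name `discrete_frechet` and the toy trajectories are hypothetical.

```python
import math


def discrete_frechet(times_p, values_p, times_q, values_q, timescale=1.0):
    """Discrete Fréchet distance between two trajectories.

    Assumption (not from the abstract): `timescale` multiplies time gaps,
    so a large value penalizes time shifts and a value of 0 ignores time.
    """
    n, m = len(times_p), len(times_q)

    def point_dist(i, j):
        # Euclidean distance between (scaled time, value) points.
        return math.hypot(
            timescale * (times_p[i] - times_q[j]),
            values_p[i] - values_q[j],
        )

    # ca[i][j] = Fréchet distance between the prefixes p[0..i] and q[0..j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = point_dist(i, j)
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], cost)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, cost)
    return ca[n - 1][m - 1]


# Toy example: the same peak, observed one time step later.
t = [0, 1, 2, 3, 4]
peak_early = [0, 0, 1, 0, 0]
peak_late = [0, 0, 0, 1, 0]

shape_only = discrete_frechet(t, peak_early, t, peak_late, timescale=0.0)
time_strict = discrete_frechet(t, peak_early, t, peak_late, timescale=10.0)
```

Under this assumption, `shape_only` is 0 because the two peaks can be aligned freely in time, whereas `time_strict` is 1 because shifting in time is too costly and the curves are compared almost point by point. This is the contrast between shape-sensitive and time-sensitive behavior that the timescale is meant to control.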

Calendar subscription link for the complete seminar series:
https://indico.math.cnrs.fr/category/711/events.ics

Program of the Biostatistics seminars:
https://indico.math.cnrs.fr/category/711/

Subscribe to the seminar mailing list:
https://diff.u-bordeaux.fr/sympa/subscribe/seminaire.biostat.bph

Former e-seminars on our YouTube channel (mostly in French): https://www.youtube.com/channel/UCURp-hEQL7k23UzGfqgEurA/videos

Biostatistics seminar series of the Department of Public Health at the University of Bordeaux and the Bordeaux Population Health research center (UMR 1219)

Organized by

Boris Hejblum