Biostatistics Seminar

Pierre Catoire - Missing values in prediction: do we need "Missing at Random"?

by Lise Gamboa (Université de Bordeaux, Inserm Bordeaux Population Health U1219, BIOSTAT)

Europe/Paris
amphi Louis (ISPED)

Description

Speaker: Pierre Catoire (Bordeaux Population Health, BPH)
Title: Missing values in prediction: do we need "Missing at Random"?

Abstract: 

Missing values affect all statistical work. In inference, where the analysis aims to describe the relationships between variables, missing data may distort the results, and the "Missing at Random" (MAR) assumption is required to guarantee that these missing values do not bias parameter estimates. It is commonly assumed that this condition is also necessary in the prediction context, where the analysis aims to estimate the probability of an outcome given the observed values of the predictors. However, recent results suggest that the MAR assumption is a sufficient, but not necessary, condition for obtaining unbiased predictions. We propose to resolve this apparent conflict by noting that, when missing values may be present, two distinct quantities can be targeted for prediction: the probability of the outcome given the values of the observed predictors (the "Missingness-unconditional" probability, MU) and the probability of the outcome given the values of the observed predictors and their observation pattern (the "Missingness-conditional" probability, MC). We identify the conditions under which these two quantities are equal, and the conditions required to estimate each of them without bias, both being weaker than MAR. This results in a new nomenclature of missingness mechanisms, dedicated to prediction and distinct from the MAR classification that is essential in inference. We illustrate these results with applications to simulated data, assessing the performance of various methods for handling missing data, and discuss their implications for the estimation, validation and deployment of prediction models.
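
As an informal illustration of the MU/MC distinction, the sketch below computes both quantities exactly in a toy model. Everything in it is an assumption made for illustration and is not taken from the talk: two binary predictors x1 (always observed) and x2 (sometimes missing), a binary outcome y, the numeric probabilities, the function names `joint` and `prob_y1`, and the two missingness mechanisms. MU is read here as P(Y=1 | X1=x1) and MC as P(Y=1 | X1=x1, x2 missing).

```python
from itertools import product


def joint(p_miss):
    """Exact joint distribution {(x1, x2, y, m): probability} of a toy model.

    p_miss(x1, x2, y) is a hypothetical missingness mechanism: the
    probability that x2 is missing (m = 1). x1 is always observed.
    """
    table = {}
    for x1, x2, y, m in product((0, 1), repeat=4):
        p_x = 0.25  # x1 and x2 are independent fair coins (assumption)
        p_y1 = 0.2 + 0.3 * x1 + 0.4 * x2  # P(Y = 1 | x1, x2) (assumption)
        p_y = p_y1 if y == 1 else 1 - p_y1
        p_m1 = p_miss(x1, x2, y)
        p_m = p_m1 if m == 1 else 1 - p_m1
        table[(x1, x2, y, m)] = p_x * p_y * p_m
    return table


def prob_y1(table, x1, m=None):
    """P(Y=1 | X1=x1) when m is None, else P(Y=1 | X1=x1, M=m)."""
    cells = [(k, p) for k, p in table.items()
             if k[0] == x1 and (m is None or k[3] == m)]
    total = sum(p for _, p in cells)
    return sum(p for k, p in cells if k[2] == 1) / total


# Two hypothetical mechanisms for the missingness of x2.
mechanisms = {
    "independent of everything": lambda x1, x2, y: 0.3,
    "depends on the outcome": lambda x1, x2, y: 0.6 if y == 1 else 0.2,
}

results = {}
for name, mech in mechanisms.items():
    table = joint(mech)
    mu = prob_y1(table, x1=0)        # ignores the observation pattern
    mc = prob_y1(table, x1=0, m=1)   # conditions on x2 being missing
    results[name] = (mu, mc)
    print(f"{name}: MU = {mu:.3f}, MC = {mc:.3f}")
```

In this toy model the two quantities coincide (0.400) when missingness is independent of all variables, and differ (0.400 versus 0.667) when missingness depends on the outcome. The sketch only shows that MU and MC can be different prediction targets; it does not reproduce the conditions or the nomenclature presented in the talk.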

Calendar subscription link for the complete seminar series:
https://indico.math.cnrs.fr/category/711/events.ics

Program of the Biostatistics seminars:
https://indico.math.cnrs.fr/category/711/

Subscribe to the seminar mailing list:
https://diff.u-bordeaux.fr/sympa/subscribe/seminaire.biostat.bph

Past e-seminars on our YouTube channel (mostly in French):
https://www.youtube.com/channel/UCURp-hEQL7k23UzGfqgEurA/videos

Biostatistics seminar series of the Department of Public Health of the University of Bordeaux and the Bordeaux Population Health research center (UMR 1219)

Organized by

Denis Rustand