Speaker: Valérie Garès from INSA Rennes
Title: Record linkage and analysis of linked data with application in French national health data system
Abstract: The French National Health Data System (Système National des Données de Santé, SNDS) collects the longitudinal health records and insurance information of most of the French population. These data can be used to enrich other existing databases (cohorts, health registries...), which provides more comprehensive medical information on each patient and thus improves the subsequent statistical analysis. However, patients in the SNDS and in health databases are usually anonymised, and no unique patient identifier is available to match the databases. Fellegi and Sunter (1969) proposed a probabilistic record linkage method, based on the fact that we usually have access to some "matching variables" which serve as partial identifiers common to both databases (e.g., gender, postal codes, dates of treatment…). These variables make it possible to calculate "matching probabilities" for each pair of patients taken from the SNDS and the health registry of interest. The Fellegi and Sunter model is limited to simple binary comparisons between matching variables. In our first work, we proposed an extension of this model to mixed-type comparison vectors. We developed a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. In a second work, we proposed models for survival analysis with matched data. Indeed, perfect matching is never achieved, and neglecting the associated errors can lead to biased estimates. In this work, we proposed an adjusted estimating equation for secondary Cox regression analysis, where the linked data have been prepared by someone else and no information on the matching variables is available to the analyst.
Finally, we may have access to the matching probabilities, which convey the uncertainty of the matching process, and this uncertainty must be taken into account in any subsequent statistical analysis. We proposed a new method to account for these errors in a survival analysis based on the Cox model. This method relies on the well-known EM algorithm for estimation in a missing-data context. The proposed models are applied to a survival analysis of linked data between a registry of patients suffering from venous thromboembolism in the Brest area and the SNDS.
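For readers unfamiliar with the classical Fellegi and Sunter model mentioned in the abstract, here is a minimal Python sketch of how a matching probability can be computed from binary comparisons. It assumes conditional independence between the matching variables. The function name `match_probability`, the parameters `m`, `u` and `prior`, and all numerical values are hypothetical illustrations. They are not taken from the talk or from the papers, and the sketch does not cover the mixed-type extension presented by the speaker.

```python
from math import prod


def match_probability(gamma, m, u, prior):
    """Posterior probability that a record pair is a true match.

    gamma : list of 0/1 agreement indicators, one per matching variable
    m     : P(agreement | pair is a match), one value per variable
    u     : P(agreement | pair is not a match), one value per variable
    prior : prior proportion of true matches among all candidate pairs
    """
    # Likelihood of the comparison vector under each hypothesis,
    # assuming the matching variables are conditionally independent.
    like_match = prod(mk if g else 1 - mk for g, mk in zip(gamma, m))
    like_non_match = prod(uk if g else 1 - uk for g, uk in zip(gamma, u))
    numerator = prior * like_match
    return numerator / (numerator + (1 - prior) * like_non_match)


# Hypothetical values for three matching variables:
# gender, postal code, date of treatment.
m = [0.95, 0.90, 0.85]
u = [0.50, 0.01, 0.005]
prior = 0.001

p_all_agree = match_probability([1, 1, 1], m, u, prior)
p_gender_only = match_probability([1, 0, 0], m, u, prior)
print(f"all variables agree: {p_all_agree:.4f}")
print(f"only gender agrees:  {p_gender_only:.6f}")
```

With these illustrative values, agreement on discriminating variables such as postal code and treatment date raises the matching probability far more than agreement on gender alone, which is the intuition behind using partial identifiers.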
Joint work with Vanessa Chezeu, Huan Vo Tanh, Guillaume Chauvet and Jean-François Dupuy.
Vo T.H., Gares V., Zhang L.-C., Happe A., Oger E., Paquelet S. and Chauvet G. Cox regression with linked data. Statistics in Medicine. 43(2), pp. 296-314, 2023.
Vo T.H., Chauvet G., Happe A., Oger E., Paquelet S. and Gares V. Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. Computational Statistics and Data Analysis. 79, 107656, 2023.
Calendar subscription link for the complete seminar series:
https://indico.math.cnrs.fr/category/711/events.ics
Program of the Biostatistics seminars:
https://indico.math.cnrs.fr/category/711/
Subscribe to the seminar mailing list:
https://diff.u-bordeaux.fr/sympa/subscribe/seminaire.biostat.bph
Former e-seminars on our YouTube channel (mostly in French): https://www.youtube.com/channel/UCURp-hEQL7k23UzGfqgEurA/videos
Biostatistics seminar series of the Department of Public Health of the University of Bordeaux and the Bordeaux Population Health UMR 1219 research center
Boris Hejblum