Statistics and Optimization Seminar

Record linkage and analysis of linked data with application in French national health data system

by Valérie Garès (INSA Rennes)

Room K. Johnson, 1st floor (1R3)

In this work, we extend the Fellegi-Sunter probabilistic record linkage model to mixed-type data. Probabilistic record linkage is the process of combining data from different sources when such data refer to common entities and identifying information is not available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes into account multiple pieces of non-identifying information, but it is limited to simple binary comparisons between matching variables. We propose an extension of this model to mixed-type comparison vectors. We develop a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. The proposed model is applied to link a registry of patients suffering from venous thromboembolism in Brest with the French national health data system.

In a second work, we propose a model for Cox regression with linked data. Linked data can bring analysts novel and valuable knowledge that cannot be obtained from a single database. However, linkage errors are usually unavoidable regardless of the record linkage method, and ignoring these errors may lead to biased estimates. We propose an adjusted estimating equation for secondary Cox regression analysis, where the linked data have been prepared by someone else and no information on the matching variables is available to the analyst. An asymptotically unbiased variance estimator is also proposed. The proposed model is applied to a linked database from the Brest stroke registry.
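
For readers unfamiliar with the classical framework that the abstract takes as its starting point, here is a minimal sketch of Fellegi-Sunter match weights under simple binary (agree/disagree) comparisons. It is not the speaker's extended model: the function name `match_weight` and the m- and u-probabilities below are hypothetical values chosen for illustration, and the matching variables are assumed conditionally independent.

```python
import math

def match_weight(agreements, m_probs, u_probs):
    """Total log2 likelihood-ratio weight for one record pair.

    agreements: booleans, True if the pair agrees on that matching variable.
    m_probs:    P(agree | pair is a true match), one per variable.
    u_probs:    P(agree | pair is a non-match), one per variable.
    Assumes conditional independence across variables.
    """
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        if agree:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

# Hypothetical probabilities for three matching variables (illustration only)
m_probs = [0.95, 0.90, 0.85]
u_probs = [0.01, 0.10, 0.05]

w_all_agree = match_weight([True, True, True], m_probs, u_probs)
w_all_disagree = match_weight([False, False, False], m_probs, u_probs)
```

Pairs with a high total weight are declared links and pairs with a low weight non-links. The binary agree/disagree coding used here is the limitation that the mixed-type extension described in the abstract addresses.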