Biostatistics Seminar (Séminaire de Biostatistique)

Meta-clustering of Gene Expression Data

by Dr Yingying Wei (The Chinese University of Hong Kong)

Time zone: Europe/Paris
https://u-bordeaux-fr.zoom.us/j/83760818301?pwd=Vu4YXwdEsyG5SgKjRLajmczYzMbHpY.1 (Zoom)

Description

Speaker: Yingying Wei, The Chinese University of Hong Kong

Abstract: Traditional meta-analyses pool effect sizes across studies to improve statistical power. Likewise, there is growing interest in clustering jointly across datasets, both to identify disease subtypes in bulk gene expression data and to discover cell types in single-cell RNA-sequencing (scRNA-seq) data. Unfortunately, because technical batch effects are prevalent, directly clustering samples pooled from multiple gene expression datasets can produce incorrect results. Research on integrating multiple gene expression datasets has therefore been very active over the past several years. However, the question of when multiple gene expression datasets can be integrated for joint clustering has received little attention. Obviously, if each subtype is assayed in a different batch, meta-clustering is impossible regardless of the machine learning or statistical method used. In this talk, I will present our Batch-effects-correction-with-Unknown-Subtypes (BUS) framework. BUS explicitly adjusts for batch effects, groups samples that share similar characteristics into subtypes, identifies genes that distinguish subtypes, and has linear computational complexity. The BUS framework can be adapted to perform meta-clustering of bulk gene expression data, of scRNA-seq data collected under a single biological condition, and of scRNA-seq data collected under multiple biological conditions. The identifiability proofs for the corresponding models provide insight into when multiple gene expression datasets can be integrated for meta-clustering. Simulation studies and real data analyses demonstrate the advantages of BUS over state-of-the-art methods.

This seminar will be given in English.

Calendar subscription link for the complete seminar series:
https://indico.math.cnrs.fr/category/711/events.ics

Program of the Biostatistics seminars:
https://indico.math.cnrs.fr/category/711/

Subscribe to the seminar mailing list:
https://diff.u-bordeaux.fr/sympa/subscribe/seminaire.biostat.bph

Former e-seminars on our YouTube channel (mostly in French):
https://www.youtube.com/channel/UCURp-hEQL7k23UzGfqgEurA/videos

Biostatistics seminar series of the Department of Public Health of the University of Bordeaux and the Bordeaux Population Health research center (UMR 1219)

Organized by

Boris Hejblum