We aim to develop a machine-learning approach that improves disease gene discovery by incorporating gene and disease similarity into the graph encoding. We hypothesize that combining a multi-modal graph neural network, a novel similarity concept for data integration, effective rank-refinement methods, and an adapted pre-training strategy can reveal novel genetic causes of rare human diseases. In a proof-of-principle study, we will use a large cohort of patients with neurodevelopmental disorders, as well as other pediatric genetic disease cohorts. Specifically, we will develop an end-to-end multi-modal graph neural network for disease gene prioritization. The model will be evaluated on these disease cohorts, and novel candidate genes will be experimentally validated in cell and animal models.
(supervised by PI Backofen)
Data collection & analysis, similarity definitions, graph encoding, pre-training, model development, and evaluation.
(supervised by PI Schmidts)
Experimental generation of additional cell biology data, unbiased evaluation of model predictions, and functional workup of novel disease-associated genes. This covers cell biology workup, including gene editing, transcriptomics, and protein expression studies, as well as in vivo validation in model systems.