Linking cohorts and expert knowledge through categorical representations for improved knowledge extraction from longitudinal data

Project summary

In clinical cohorts, for example of patients with Epidermolysis Bullosa (EB), longitudinal patterns are complex, and comprehensive longitudinal characterization is available for some sub-cohorts while only limited information is available for others. We hypothesize that simplified helper models based on medically meaningful categorical representations of measurements can aid information transfer. In particular, we will investigate transfer from less well-characterized sub-cohorts and from medical expert knowledge into well-characterized sub-cohorts. Specifically, we will learn the much simpler joint distribution of the categorical data in a pre-training step, and subsequently train the model again, this time at the level of the quantitative measurements. Such simpler distributions will also be used to formalize and incorporate medical expert knowledge.
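
As a rough illustration of what a categorical representation and its joint distribution could look like, the following minimal sketch maps quantitative measurements to ordinal categories and estimates the joint distribution of categories at consecutive visits. It is not the project's method: the cut-offs, the toy patient values, and the helper names `to_categories` and `joint_distribution` are assumptions made for this example only, and a simple smoothed count table stands in for the generative models the project envisions.

```python
import numpy as np

# Hypothetical cut-offs; in practice, categories would be defined by medical experts.
THRESHOLDS = np.array([10.0, 12.0])  # yields categories 0 (low), 1 (mid), 2 (high)


def to_categories(values, thresholds=THRESHOLDS):
    """Map quantitative measurements to ordinal category indices."""
    return np.digitize(values, thresholds)


def joint_distribution(cat_series, n_categories, pseudo_count=1.0):
    """Estimate P(category at visit t, category at visit t+1).

    The pseudo-count smooths the estimate so that unseen category
    pairs still receive non-zero probability.
    """
    counts = np.full((n_categories, n_categories), pseudo_count)
    for series in cat_series:
        for current, following in zip(series[:-1], series[1:]):
            counts[current, following] += 1
    return counts / counts.sum()


# Toy longitudinal measurements for three hypothetical patients.
patients = [
    np.array([9.1, 9.8, 11.0, 12.5]),
    np.array([12.2, 11.5, 11.9]),
    np.array([8.7, 9.0]),
]

cats = [to_categories(p) for p in patients]
joint = joint_distribution(cats, n_categories=len(THRESHOLDS) + 1)
print(joint)
```

A table of this kind is far simpler than a model of the raw measurements. In principle, expert knowledge could enter through the pseudo-counts, for example by raising them for category transitions that clinicians consider plausible; this is one possible design choice, not something the project description specifies.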

Our methods

  • Combining knowledge- and data-driven modeling
  • Neural networks
  • Pre-training

Principal investigator 1


Principal investigator 2

Doctoral researcher position

(supervised by PI Hess)


Developing approaches for learning the joint distribution from longitudinal categorical data with generative models. Investigating pre-training of target models as well as domain adaptation.


  • Master’s degree or equivalent in mathematics, (bio-)statistics, computer science or similar
  • Programming skills in, e.g., Python, R, or Julia
  • Interest in modeling clinical data
  • Ideally, experience with deep learning

Administrative Manager

Marc Schumacher

Institute of Medical Biometry and Statistics,
Faculty of Medicine and Medical Center –
University of Freiburg