Scientific Framework

In SmallData, we address data analysis and modeling in small data settings, i.e., settings where the dataset at hand carries only little information because the number of observations that carry relevant information is small relative to the complexity of the novel patterns to be uncovered or to the level of heterogeneity across observations.

We focus on

Similarity for pulling in additional data of the same type (Project Area A),

Transfer for transferring additional information to the dataset at hand, such as from data of different type (Project Area B),

Uncertainty for quantifying and reducing uncertainty, in particular in similarity and transfer (Project Area C).

This is enabled by a joint methods framework, with a focus on combining knowledge-driven and data-driven modeling.

See the research projects

Knowledge-driven modeling

Using mathematical/statistical models with strong structural assumptions. The aim of modeling then is to better understand the system, i.e., to add to the body of knowledge.

Data-driven modeling

Avoiding strong structural assumptions in order to flexibly pick up patterns in the data. Frequently, this corresponds to applications where the primary aim is prediction.

Combining knowledge-driven and data-driven modeling

Addressing challenges of data-driven modeling, e.g., by combining neural networks with differential equations, which allows structural assumptions to be imposed while still maintaining flexibility.
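
As a minimal sketch of such a combination, the following Python example keeps a structural term of a differential equation fixed and fits only the unexplained remainder with a small data-driven component. Everything specific here is an assumption made for illustration and not taken from the SmallData projects: the simulated system, the decay rate K, the function names, and the simplification of using one hidden tanh layer with fixed random weights, where only the output weights are estimated by ridge regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Knowledge-driven part: decay rate assumed to be known (hypothetical value).
K = 1.0

def true_rhs(x):
    # Hypothetical "true" system, used only to simulate data.
    return -K * x + 0.5 * np.sin(3.0 * x)

# Simulate a short trajectory with explicit Euler steps (few observations).
dt, n_obs = 0.1, 20
x = np.empty(n_obs)
x[0] = 2.0
for i in range(1, n_obs):
    x[i] = x[i - 1] + dt * true_rhs(x[i - 1])

# Finite-difference estimate of dx/dt from the observed trajectory.
dxdt = (x[1:] - x[:-1]) / dt
x_in = x[:-1]

# Part of the dynamics not explained by the structural model dx/dt = -K * x.
residual = dxdt + K * x_in

# Data-driven part: hidden tanh layer with fixed random weights;
# only the output weights beta are fitted, via ridge regression.
n_hidden, ridge = 10, 1e-3
W = rng.normal(size=n_hidden)
b = rng.normal(size=n_hidden)

def hidden(x):
    return np.tanh(np.outer(np.atleast_1d(x), W) + b)

H = hidden(x_in)
beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ residual)

def hybrid_rhs(x):
    # Structural term plus learned correction.
    x = np.atleast_1d(x)
    return -K * x + hidden(x) @ beta

mse_knowledge = np.mean((dxdt - (-K * x_in)) ** 2)
mse_hybrid = np.mean((dxdt - hybrid_rhs(x_in)) ** 2)
print(f"knowledge-only MSE: {mse_knowledge:.4f}, hybrid MSE: {mse_hybrid:.4f}")
```

The structural term constrains the overall behavior, so the flexible component only has to describe a correction, which is one reason why such hybrids can be attractive when observations are scarce. The comparison printed here is on the training points only and says nothing about generalization.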

Differential equations

Example of knowledge-driven modeling with strong structural assumptions. We will also consider settings where the process that is to be described by such a dynamic model can only be indirectly observed.

Neural networks

Data-driven approach, where it is commonly assumed that a large number of observations is required. We will investigate how this can still be feasible with a small number of observations.


A specific approach for sharing information between datasets, e.g., on model parameters or tuning parameters.


Asymmetric approach for transferring parameters between datasets. Typically, large datasets are used only for model initialization, i.e., pre-training of parameters, with subsequent fine-tuning on a small dataset.
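
The pre-training and fine-tuning pattern described above can be sketched with a linear model in Python. This is an illustration under stated assumptions, not a method from the SmallData projects: the simulated source and target datasets, the dimensions, the number of gradient steps, and the names simulate, loss, and fine_tune are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5

# Hypothetical setting: target parameters are related to, but not the same
# as, the source parameters.
theta_source = rng.normal(size=p)
theta_target = theta_source + 0.1 * rng.normal(size=p)

def simulate(theta, n):
    X = rng.normal(size=(n, p))
    y = X @ theta + 0.5 * rng.normal(size=n)
    return X, y

X_large, y_large = simulate(theta_source, 2000)  # large source dataset
X_small, y_small = simulate(theta_target, 8)     # small target dataset

# Pre-training: the large dataset is used only to initialize the parameters.
theta_pre, *_ = np.linalg.lstsq(X_large, y_large, rcond=None)

def loss(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

def fine_tune(theta_init, X, y, n_steps=10):
    # A few gradient steps on the small dataset, starting from theta_init.
    # Step size 1/L, with L the largest eigenvalue of the Hessian of the
    # mean squared error, so each step cannot increase the loss.
    L = 2.0 * np.linalg.eigvalsh(X.T @ X / len(y)).max()
    theta = theta_init.copy()
    for _ in range(n_steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta -= grad / L
    return theta

theta_ft = fine_tune(theta_pre, X_small, y_small)
print("target loss before fine-tuning:", loss(theta_pre, X_small, y_small))
print("target loss after fine-tuning: ", loss(theta_ft, X_small, y_small))
```

Limiting the number of fine-tuning steps keeps the parameters close to their pre-trained values, which acts as an implicit form of regularization when the small dataset alone could not determine them reliably.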

Local perspective

Useful for mathematically analyzing the local characteristics of a modeling problem, such as uncertainty. Furthermore, problems like the choice of the numerical optimization strategy might best be addressed locally in a small data setting.

Administrative Manager

Marc Schumacher

Institute of Medical Biometry and Statistics,
Faculty of Medicine and Medical Center –
University of Freiburg