The task of assigning a human DNA trace from a crime scene to a (sub-continental) region is usually referred to as the analysis of biogeographical ancestry. This analysis is complicated by admixture, i.e., the possibility that every genetic marker may originate from a different region. The resulting uncertainty can be expected to vary with the trace and with the quality of the reference database. Most importantly, test data may follow a different distribution than the reference database. To address this small data challenge, we aim for a general theoretical framework for local uncertainty quantification in classification when the distributions of training and test data differ but are suitably similar. To this end, we will formalize the notion of local uncertainty of decision boundaries, and then develop approaches for obtaining uncertainty estimates and analyze their properties.
(supervised by PI Lutz-Bonengel)
Work on admixed samples and all data applications of the developed methods.
(supervised by PI Pfaffelhuber)
Developing mathematical theory in a filtering framework to guide method development; this theory will provide, e.g., error bounds and goodness-of-fit approaches for model selection.
(supervised by PI Rohde)
Fundamental statistical questions, such as transfer learning and nonparametric classifiers.