SmallData Symposium 2024

C01

Uncertainty quantification in classification with applications in forensic genetics

Project summary

The task of assigning a human DNA trace from a crime scene to a (sub-continental) region is usually referred to as the analysis of biogeographical ancestry. This analysis is complicated by admixture, i.e., the possibility that each genetic marker originates from a different region. The resulting uncertainty can be expected to vary with the trace and with the quality of the reference database. Most importantly, test data may follow a different distribution than the data in the reference database. To address this small data challenge, we aim for a general theoretical framework for local uncertainty quantification in classification in situations where the distributions of training and test data differ but are suitably similar. To this end, we will formalize the notion of local uncertainty of decision boundaries, and subsequently develop approaches for obtaining uncertainty estimates and analyze their properties.