Sub-Population Identification of Multi-morbidity in Sub-Saharan African Populations

  • Skyler Speakman, IBM Research Africa

This work bridges the medical literature on multimorbidity and data science approaches to exploratory health data analysis. We define multimorbidity as the co-occurrence of at least two disease diagnoses from a pre-determined list, a working definition offered as a contribution to the field's still-evolving ones. We apply this definition to two sub-Saharan African populations (Nairobi, Kenya and Agincourt, South Africa) using data from the Africa Wits-INDEPTH Partnership for Genomic Studies, and explore patterns of disease co-occurrence across diverse demographic groups.
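The definition above reduces to a simple count per participant. A minimal sketch follows, assuming each record is a dictionary of diagnosis flags; the condition names in `CONDITIONS` and the helper names are hypothetical placeholders, since the abstract does not enumerate the pre-determined list.

```python
# Hypothetical condition list: the abstract does not say which diagnoses
# are on the pre-determined list, so these names are placeholders only.
CONDITIONS = ("diabetes", "hypertension", "hiv", "asthma")


def is_multimorbid(record, conditions=CONDITIONS, threshold=2):
    """Return True if the record has at least `threshold` diagnoses
    from `conditions`. Diagnoses not on the list are ignored."""
    count = sum(1 for c in conditions if record.get(c))
    return count >= threshold


def multimorbidity_rate(records, conditions=CONDITIONS):
    """Fraction of records that are multimorbid (0.0 for empty input)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_multimorbid(r, conditions) for r in records) / len(records)
```

Keeping the list and the threshold as parameters makes it easy to see how the measured rate shifts when the definition changes, which matters because the definition itself is still debated.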

We automatically stratify the data to identify sub-populations with unusually high or low multimorbidity rates, offering a scalable alternative to traditional confirmatory methods. Notably, the high-risk groups identified at one site often mirror those at the other. We also uncover risk profiles that depend on more than age and sex, which challenges the common practice of stratifying by those two variables alone. This work demonstrates how data science methods can enhance public health research by revealing complex patterns in large-scale datasets.
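The abstract does not describe the search strategy or scoring function used for the automatic stratification. As an illustration of the general idea only, the sketch below enumerates every subgroup defined by up to `max_depth` attribute values and ranks them by a normal-approximation z-score against the overall rate. The function name, the `multimorbid` field, and the choice of score are all assumptions, not the method of this work.

```python
from itertools import combinations
from math import sqrt


def scan_subgroups(records, attributes, outcome="multimorbid",
                   max_depth=2, min_size=30):
    """Brute-force subgroup enumeration (illustrative stand-in only).

    Returns (overall_rate, ranked), where ranked is a list of
    (subgroup_definition, size, rate, z) sorted by |z| descending.
    """
    n = len(records)
    if n == 0:
        raise ValueError("no records to scan")
    overall = sum(bool(r[outcome]) for r in records) / n

    results = []
    for depth in range(1, max_depth + 1):
        for attrs in combinations(attributes, depth):
            # Partition the records by their values on this attribute set.
            groups = {}
            for r in records:
                key = tuple(r[a] for a in attrs)
                groups.setdefault(key, []).append(bool(r[outcome]))
            for key, outcomes in groups.items():
                size = len(outcomes)
                if size < min_size:
                    continue  # skip subgroups too small to estimate a rate
                rate = sum(outcomes) / size
                # Standard error of the subgroup rate under the overall rate.
                se = sqrt(overall * (1 - overall) / size)
                z = (rate - overall) / se if se > 0 else 0.0
                results.append((dict(zip(attrs, key)), size, rate, z))

    results.sort(key=lambda item: abs(item[3]), reverse=True)
    return overall, results
```

Positive scores flag unusually high rates and negative scores unusually low ones. Because many subgroups are examined, the scores are only a ranking device and should not be read as significance tests without a correction for multiple comparisons.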