Towards a data-driven system for personalized cervical cancer risk stratification

Geir Severin R E Langberg; Jan F Nygård; Vinay Chakravarthi Gogineni; Mari Nygård; Markus Grasmair; Valeriya Naumova

doi:10.1038/s41598-022-16361-6


Sci Rep. 2022 Jul 15;12(1):12083. doi: 10.1038/s41598-022-16361-6.

Authors

Geir Severin R E Langberg¹, Jan F Nygård², Vinay Chakravarthi Gogineni³, Mari Nygård⁴, Markus Grasmair⁵, Valeriya Naumova⁶

Affiliations

¹ Department of Research, Cancer Registry of Norway (CRN), Oslo, 0379, Norway. langberg91@gmail.com.
² Department of Registry Informatics, CRN, Oslo, 0379, Norway.
³ Department of Electronic Systems, Norwegian University of Science and Technology (NTNU), Trondheim, 7491, Norway.
⁴ Department of Research, Cancer Registry of Norway (CRN), Oslo, 0379, Norway.
⁵ Department of Mathematical Sciences, NTNU, Trondheim, 7491, Norway.
⁶ Machine Intelligence Department, Simula Research Laboratory, Oslo, 0164, Norway.

Abstract

Mass-screening programs for cervical cancer prevention in the Nordic countries have been effective in reducing cancer incidence and mortality at the population level. Women with a history of regular normal screening exams represent a low-risk sub-population, and distinctive screening strategies that avoid over-screening while still identifying women with high-grade lesions are needed to improve on the existing one-size-fits-all approach. Machine learning methods for more personalized cervical cancer risk estimation may be of great utility to screening programs that are shifting towards more targeted screening. However, deriving personalized risk prediction models is challenging because effective screening has made cervical cancer rare and exam results are strongly skewed towards normal findings. Moreover, changes in women's lifestyle and screening habits over time can cause a non-stationary data distribution. In this paper, we treat cervical cancer risk prediction as a longitudinal forecasting problem. We define risk estimators by extending existing frameworks developed on cervical cancer screening data to incremental learning for longitudinal risk predictions, and we compare these estimators to machine learning methods popular in biomedical applications. As input to the prediction models, we use all the available data from the individual screening histories. In numerical experiments on data from the Cancer Registry of Norway, we find that the models are strongly biased towards normal results because of the imbalanced data. To identify women at risk of cancer development, we adapt an imbalanced classification strategy to non-stationary data. Using this strategy, we estimate the absolute risk from longitudinal model predictions and a hold-out set of screening data. A comparison of absolute risk curves demonstrates that the prediction models can closely reflect the absolute risk observed in the hold-out set. Such models have great potential for improving cervical cancer risk stratification for more personalized screening recommendations.
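The abstract does not spell out how absolute risk curves are computed from the longitudinal predictions and the hold-out set. As a minimal sketch, assuming each screening history is reduced to (age, positive) pairs with positive = 1 for a high-grade-or-worse result, an absolute risk curve can be approximated as the cumulative proportion of women whose first positive result occurs by each age. The observed and predicted curves can then be compared by their largest gap. The encoding and the helper names (`first_positive_age`, `absolute_risk_curve`, `max_curve_gap`) are hypothetical illustrations, not the method used in the paper.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding (not from the paper): a screening history is a list of
# (age, positive) pairs, where positive = 1 marks a high-grade-or-worse result.
History = Sequence[Tuple[float, int]]


def first_positive_age(history: History) -> Optional[float]:
    """Age at the first positive result in a history, or None if there is none."""
    for age, positive in sorted(history):
        if positive == 1:
            return age
    return None


def absolute_risk_curve(histories: Sequence[History], ages: Sequence[float]) -> List[float]:
    """Fraction of women whose first positive result occurs at or before each age.

    Naive cumulative proportion: it does not adjust for censoring
    (women leaving follow-up before reaching a given age).
    """
    onsets = [first_positive_age(h) for h in histories]
    n = len(onsets)
    return [sum(1 for t in onsets if t is not None and t <= a) / n for a in ages]


def max_curve_gap(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Largest absolute difference between two risk curves on the same age grid."""
    return max(abs(o - p) for o, p in zip(observed, predicted))


if __name__ == "__main__":
    ages = [25, 30, 35]
    # Toy hold-out histories and toy model predictions for the same four women.
    observed = [[(25, 0), (28, 0), (31, 1)], [(30, 0), (33, 0)],
                [(27, 0), (30, 1), (33, 1)], [(40, 0)]]
    predicted = [[(25, 0), (28, 0), (31, 0)], [(30, 0), (33, 0)],
                 [(27, 0), (30, 1), (33, 1)], [(40, 0)]]
    obs_curve = absolute_risk_curve(observed, ages)    # [0.0, 0.25, 0.5]
    pred_curve = absolute_risk_curve(predicted, ages)  # [0.0, 0.25, 0.25]
    print(max_curve_gap(obs_curve, pred_curve))        # 0.25
```

This naive proportion ignores differences in follow-up length; a survival-style estimator such as Kaplan-Meier would be the natural refinement when women are censored at different ages.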

MeSH terms

Cervix Uteri / pathology
Early Detection of Cancer
Female
Humans
Mass Screening / methods
Papillomavirus Infections* / pathology
Risk Assessment
Uterine Cervical Neoplasms* / diagnosis
Uterine Cervical Neoplasms* / epidemiology
Uterine Cervical Neoplasms* / pathology