Development and Validation of a Colorectal Cancer Prediction Model: A Nationwide Cohort-Based Study

Dig Dis Sci. 2024 Apr 25. doi: 10.1007/s10620-024-08427-4. Online ahead of print.

Abstract

Background: Early diagnosis of colorectal cancer (CRC) is critical to increasing survival rates. Computerized risk prediction models hold great promise for identifying individuals at high risk for CRC. In order to utilize such models effectively in a population-wide screening setting, development and validation should be based on cohorts that are similar to the target population.

Aim: Establish a risk prediction model for CRC diagnosis based on electronic health records (EHR) from subjects eligible for CRC screening.

Methods: We conducted a retrospective cohort study using the EHR data of Clalit Health Services (CHS). The study included CHS members aged 50-74 who were eligible for CRC screening from January 2013 to January 2019. The model was trained to predict a CRC diagnosis within 2 years of the index date. Approximately 20,000 EHR demographic and clinical features were considered.
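The prediction target described in the Methods (a CRC diagnosis within 2 years of the index date) can be illustrated with a minimal labeling sketch. This is not the study's code: the function name `crc_label`, the 730-day approximation of "2 years", and the choice to count only diagnoses strictly after the index date are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the 2-year horizon is approximated as 730 days.
HORIZON = timedelta(days=730)


def crc_label(index_date: date, diagnosis_date: Optional[date]) -> int:
    """Return 1 if a CRC diagnosis falls within the horizon after the index date.

    Subjects with no recorded diagnosis (None) are labeled 0.
    Assumption: a diagnosis on or before the index date is not a positive label.
    """
    if diagnosis_date is None:
        return 0
    return int(index_date < diagnosis_date <= index_date + HORIZON)


# Hypothetical subjects, for illustration only.
print(crc_label(date(2013, 1, 1), date(2014, 6, 1)))  # diagnosed inside the window
print(crc_label(date(2013, 1, 1), date(2016, 1, 1)))  # diagnosed after the window
print(crc_label(date(2013, 1, 1), None))              # never diagnosed
```

How the boundary cases are handled (diagnosis on the index date, leap years) is a design choice the abstract does not specify.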

Results: The study included 2,935 subjects with a CRC diagnosis and 1,133,457 subjects without a CRC diagnosis. CRC incidence among subjects in the top 1% of risk scores was higher than baseline (2.3% vs 0.3%; lift 8.38; P value < 0.001). Cumulative event probabilities increased with higher model scores. Among subjects with a positive fecal occult blood test (FOBT), model-based risk stratification identified subjects with more than twice the risk for CRC compared with FOBT alone.
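The lift reported in the Results is the event rate in the highest-scoring group divided by the event rate in the whole cohort. The sketch below shows that calculation on synthetic data; the function name `lift_at_top_fraction` and the toy cohort are assumptions, not the study's implementation. Note that the rounded percentages give 2.3 / 0.3 ≈ 7.67, so the reported lift of 8.38 was presumably computed from unrounded incidence values.

```python
def lift_at_top_fraction(scores, labels, fraction=0.01):
    """Lift = event rate among the top-scoring fraction / overall event rate."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_top = max(1, round(len(scores) * fraction))
    # Rank subjects from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no events")
    return top_rate / base_rate


# Synthetic cohort (illustration only): 1,000 subjects, 10 events,
# 2 of which fall among the 10 highest-scoring subjects.
scores = [i / 1000 for i in range(1000)]
labels = [0] * 1000
for i in (999, 998, 0, 1, 2, 3, 4, 5, 6, 7):
    labels[i] = 1

print(lift_at_top_fraction(scores, labels, fraction=0.01))
```

In this toy cohort the top 1% has a 20% event rate against a 1% baseline, so the lift is about 20.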

Conclusions: We developed an individualized risk prediction model for CRC that can be utilized as a complementary decision support tool for healthcare providers to precisely identify subjects at high risk for CRC and refer them for confirmatory testing.

Keywords: Colorectal cancer; Colorectal cancer screening; Machine learning.