Development and Validation of a Colorectal Cancer Prediction Model: A Nationwide Cohort-Based Study

Ofer Isakov; Dan Riesel; Michael Leshchinsky; Galit Shaham; Ben Y Reis; Dan Keret; Zohar Levi; Baruch Brener; Ran Balicer; Noa Dagan; Samah Hayek

doi:10.1007/s10620-024-08427-4

Dig Dis Sci. 2024 Apr 25. doi: 10.1007/s10620-024-08427-4. Online ahead of print.

Authors

Ofer Isakov¹ ² ³, Dan Riesel¹, Michael Leshchinsky¹, Galit Shaham¹, Ben Y Reis² ³ ⁴ ⁵, Dan Keret⁶, Zohar Levi⁷, Baruch Brener⁸ ⁹, Ran Balicer¹ ³ ¹⁰, Noa Dagan¹ ³ ¹¹, Samah Hayek¹² ¹³

Affiliations

¹ Innovation Division, Clalit Research Institute, Clalit Health Services, Tel Aviv, Israel.
² Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
³ The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA.
⁴ Predictive Medicine Group, Boston Children's Hospital, Boston, MA, USA.
⁵ Harvard Medical School, Boston, MA, USA.
⁶ Gastroenterology and Hepatology Department, Clalit Health Services, Jerusalem, Israel.
⁷ Department of Gastroenterology, Beilinson Medical Center, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
⁸ Institute of Oncology, Davidoff Cancer Center, Rabin Medical Center, Beilinson Campus, Petah Tikva, Israel.
⁹ Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
¹⁰ School of Public Health, Faculty of Health Sciences, Ben Gurion University of the Negev, Be'er Sheva, Israel.
¹¹ Software and Information Systems Engineering, Ben Gurion University of the Negev, Be'er Sheva, Israel.
¹² Innovation Division, Clalit Research Institute, Clalit Health Services, Tel Aviv, Israel. Samahhayek@tauex.tau.ac.il.
¹³ Department of Epidemiology and Preventive Medicine, School of Public Health, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel. Samahhayek@tauex.tau.ac.il.

PMID: 38662163
DOI: 10.1007/s10620-024-08427-4

Abstract

Background: Early diagnosis of colorectal cancer (CRC) is critical to increasing survival rates. Computerized risk prediction models hold great promise for identifying individuals at high risk for CRC. To use such models effectively in a population-wide screening setting, they should be developed and validated on cohorts that are similar to the target population.

Aim: Establish a risk prediction model for CRC diagnosis based on electronic health records (EHR) from subjects eligible for CRC screening.

Methods: A retrospective cohort study utilizing the EHR data of Clalit Health Services (CHS). The study includes CHS members aged 50-74 who were eligible for CRC screening from January 2013 to January 2019. The model was trained to predict receiving a CRC diagnosis within 2 years of the index date. Approximately 20,000 EHR demographic and clinical features were considered.

Results: The study includes 2935 subjects with a CRC diagnosis and 1,133,457 subjects without a CRC diagnosis. CRC incidence among subjects with risk scores in the top 1% was higher than baseline (2.3% vs 0.3%; lift 8.38; P value < 0.001). Cumulative event probabilities increased with higher model scores. Among subjects with a positive fecal occult blood test (FOBT), model-based risk stratification identified subjects with more than twice the risk for CRC compared to FOBT alone.
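The lift reported above is the ratio of CRC incidence among the highest-scoring subjects to the incidence in the whole cohort. The following is a minimal sketch of that calculation, not the authors' code: the function name `top_fraction_lift` and the toy scores and labels are hypothetical, and only the two cohort counts are taken from the abstract. Those counts imply a baseline incidence of roughly 0.26%, consistent with the rounded 0.3% quoted above.

```python
from typing import Sequence


def top_fraction_lift(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01
) -> float:
    """Incidence among the top `fraction` of scores divided by overall incidence.

    Hypothetical helper for illustration; labels are 1 (case) or 0 (non-case).
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_top = max(1, round(len(scores) * fraction))
    # Rank subjects from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_incidence = sum(label for _, label in ranked[:n_top]) / n_top
    baseline = sum(labels) / len(labels)
    if baseline == 0:
        raise ValueError("lift is undefined when there are no cases")
    return top_incidence / baseline


# Baseline incidence implied by the cohort counts stated in the abstract.
cases, non_cases = 2935, 1_133_457
baseline_incidence = cases / (cases + non_cases)

# Toy data (invented for illustration): 10 subjects, 2 cases.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.05, 0.15, 0.25, 0.35]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
toy_lift = top_fraction_lift(toy_scores, toy_labels, fraction=0.1)

print(f"baseline incidence: {baseline_incidence:.4%}")
print(f"toy lift in top 10%: {toy_lift:.2f}")
```

On real data the scores would come from the fitted model and the labels from observed 2-year CRC diagnoses; the toy arrays only show the mechanics of ranking and dividing.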

Conclusions: We developed an individualized risk prediction model for CRC that can be utilized as a complementary decision support tool for healthcare providers to precisely identify subjects at high risk for CRC and refer them for confirmatory testing.

Keywords: Colorectal cancer; Colorectal cancer screening; Machine learning.