A causal inference framework for cancer cluster investigations using publicly available data

J R Stat Soc Ser A Stat Soc. 2020 Jun;183(3):1253-1272. doi: 10.1111/rssa.12567. Epub 2020 Apr 25.

Abstract

Often, a community becomes alarmed when high rates of cancer are noticed, and residents suspect that the cancer cases could be caused by a known source of hazard. In response, the US Centers for Disease Control and Prevention recommends that departments of health perform a standardized incidence ratio (SIR) analysis to determine whether the observed cancer incidence is higher than expected. This approach has several limitations that are well documented in the existing literature. In this paper we propose a novel causal inference framework for cancer cluster investigations, rooted in the potential outcomes framework. Assuming that a source of hazard representing a potential cause of increased cancer rates in the community is identified a priori, we focus our approach on a causal inference estimand which we call the causal SIR (cSIR). The cSIR is a ratio defined as the expected cancer incidence in the exposed population divided by the expected cancer incidence for the same population under the (counterfactual) scenario of no exposure. To estimate the cSIR we need to overcome two main challenges: 1) unexposed populations that are as similar as possible to the exposed one must be identified to inform estimation of the expected cancer incidence under the counterfactual scenario of no exposure, and 2) publicly available data on cancer incidence for these unexposed populations are often available at a much higher level of spatial aggregation (e.g. county) than what is desired (e.g. census block group). We overcome the first challenge by relying on matching. We overcome the second challenge by building a Bayesian hierarchical model that borrows information from other sources to impute cancer incidence at the desired level of spatial aggregation. In simulations, our statistical approach yielded markedly less bias and better coverage than the current approach to SIR analyses.
We apply our proposed approach to investigate whether trichloroethylene vapor exposure has caused increased cancer incidence in Endicott, New York.
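
As an illustration of the ratio structure described above, the following minimal Python sketch computes an observed-over-expected incidence ratio by summing over strata. It is not the authors' method: the paper estimates the cSIR with matching and a Bayesian hierarchical model, whereas this sketch only shows the plug-in arithmetic. The names `Stratum` and `incidence_ratio` and all numbers are hypothetical. With reference rates taken from a general population the result corresponds to a conventional SIR; with rates taken from matched unexposed populations it would be a naive plug-in analogue of the cSIR.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stratum (e.g. an age-sex group) of the exposed community. Hypothetical structure."""
    observed_cases: int     # cancer cases observed in the exposed community
    person_years: float     # person-years at risk in the exposed community
    reference_rate: float   # cases per person-year in the comparison population


def incidence_ratio(strata):
    """Return observed cases divided by expected cases, summed over strata."""
    observed = sum(s.observed_cases for s in strata)
    # Expected count: apply each comparison rate to the exposed person-years.
    expected = sum(s.person_years * s.reference_rate for s in strata)
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Hypothetical numbers, for illustration only.
strata = [
    Stratum(observed_cases=4, person_years=2000.0, reference_rate=0.001),
    Stratum(observed_cases=11, person_years=1000.0, reference_rate=0.005),
]
ratio = incidence_ratio(strata)  # 15 observed cases over 7 expected cases
```

A ratio above 1 indicates more observed cases than the comparison rates would predict; on its own it carries no uncertainty quantification, which in the paper's approach comes from the Bayesian model.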

Keywords: Bayesian; Cancer cluster investigation; Causal inference; Endicott; Matching; Spatial over-aggregation; Trichloroethylene vapor.