Improving efficiency of fitting Cox proportional hazards models for time-to-event outcomes in genome-wide association studies (GWAS)

Bioinform Adv. 2023 Oct 13;3(1):vbad148. doi: 10.1093/bioadv/vbad148. eCollection 2023.

Abstract

Summary: Technologies identifying single nucleotide polymorphisms (SNPs) in DNA sequencing yield an avalanche of data requiring analysis and interpretation. Standard methods may require many weeks of processing time. Statistical methods that require data sorting, high-dimensional matrix inversions and replication in subsets of the data on multiple outcomes exacerbate these times. A method is proposed which reduces the computational time in problems with time-to-event outcomes and hundreds of thousands or millions of SNPs. It uses Cox-Snell residuals obtained after fitting the Cox proportional hazards (PH) model to a fixed set of concomitant variables. This yields coefficients for the SNP effect from a Cox-Snell adjusted Poisson model, which show high concordance with those from the adjusted PH model. The method is illustrated with a sample of 10 000 SNPs from a genome-wide association study in a diabetic population. The gain in processing efficiency using the proposed method based on Poisson modelling can be as high as 62%. This could result in a saving of over three weeks of processing time if 5 million SNPs require analysis. The method involves only a single predictor variable (the SNP), offering a simpler, computationally more stable approach to examining and identifying SNP patterns associated with the outcome(s) and allowing faster development of genetic signatures. Use of deviance residuals from the PH model to screen SNPs demonstrates a large discordance rate at a 0.2% threshold of concordance. This rate is 15 times larger than that based on the Cox-Snell residuals from the Cox-Snell adjusted Poisson model.
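The abstract gives no formulas, so the following is only a sketch, in notation of our own choosing, of what a "Cox-Snell adjusted Poisson model" would look like under the standard definitions. Here t_i is the follow-up time of subject i, d_i the event indicator, x_i the concomitant variables, g_i the SNP genotype, and \alpha and \gamma the Poisson intercept and SNP coefficient; all of these symbols are assumptions introduced for illustration.

```latex
% Step 1: Cox-Snell residual from the PH model fitted to the
% concomitant variables only (no SNP term)
\hat{r}_i = \hat{H}_0(t_i)\,\exp\!\left(x_i^{\top}\hat{\beta}\right)

% Step 2: per-SNP Poisson model with the residual as exposure,
% i.e. \log \hat{r}_i enters as an offset with coefficient fixed at 1
d_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \log \hat{r}_i + \alpha + \gamma\, g_i
```

Under this reading, step 1 is carried out once, and only the two-parameter model in step 2 is refitted for each SNP, which would be consistent with the reported reduction in processing time.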

Availability and implementation: The method is simple to implement as the procedures are available in most statistical packages. The approach involves obtaining Cox-Snell residuals from a PH model fitted to a binary time-to-event outcome, using the factors which need to be common when assessing each SNP. Each SNP is then fitted as a predictor of the outcome of interest using a Poisson model with the Cox-Snell residuals as the exposure variable.
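The two steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names and toy data are hypothetical, the SNP is assumed to be coded additively as 0/1/2, and the residual step uses a covariate-free Nelson-Aalen cumulative hazard as a stand-in for the Cox-Snell residuals, which in the method described would come from a PH model fitted to the common covariates in a statistical package.

```python
import math


def nelson_aalen_residuals(times, events):
    """Cumulative hazard at each subject's own time.

    Assumption: stand-in for Cox-Snell residuals from a PH model
    with no covariates; a real analysis would use the fitted model.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum_hazard, running = [], 0.0
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        n_events = sum(1 for s, d in zip(times, events) if s == t and d == 1)
        running += n_events / at_risk
        cum_hazard.append(running)
    residuals = []
    for t in times:
        h = 0.0
        for et, ch in zip(event_times, cum_hazard):
            if et > t:
                break
            h = ch  # hazard accumulated up to the latest event time <= t
        residuals.append(h)
    return residuals


def _information(mu, snp):
    """Entries of the 2x2 Fisher information for (intercept, slope)."""
    i00 = sum(mu)
    i01 = sum(m * g for m, g in zip(mu, snp))
    i11 = sum(m * g * g for m, g in zip(mu, snp))
    return i00, i01, i11


def fit_poisson_snp(events, snp, exposure, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of E[d_i] = exposure_i * exp(a + b * snp_i).

    Returns (a, b, standard error of b).
    """
    a = math.log(sum(events) / sum(exposure))  # null-model starting value
    b = 0.0
    for _ in range(max_iter):
        mu = [r * math.exp(a + b * g) for r, g in zip(exposure, snp)]
        # Score vector of the Poisson log-likelihood
        u0 = sum(d - m for d, m in zip(events, mu))
        u1 = sum((d - m) * g for d, m, g in zip(events, mu, snp))
        i00, i01, i11 = _information(mu, snp)
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Standard error of b from the information at the final estimates
    mu = [r * math.exp(a + b * g) for r, g in zip(exposure, snp)]
    i00, i01, i11 = _information(mu, snp)
    se_b = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return a, b, se_b


# Toy data, invented purely for illustration
times = [5, 8, 12, 3, 9, 15, 7, 11, 2, 14]
events = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
snp = [0, 1, 2, 0, 1, 2, 1, 0, 2, 1]

resid = nelson_aalen_residuals(times, events)   # step 1: done once
a_hat, b_hat, se_hat = fit_poisson_snp(events, snp, resid)  # step 2: per SNP
print(f"SNP coefficient {b_hat:.4f}, standard error {se_hat:.4f}")
```

In a genome-wide scan, step 1 would run once and step 2 would be repeated for every SNP with the same residual vector. Because each per-SNP fit has only two parameters, the matrix inversion reduces to a closed-form 2x2 solve.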