Studying the effects of haplotype partitioning methods on the RA-associated genomic results from the North American Rheumatoid Arthritis Consortium (NARAC) dataset

J Adv Res. 2019 Jan 18:18:113-126. doi: 10.1016/j.jare.2019.01.006. eCollection 2019 Jul.

Abstract

The human genome, which includes thousands of genes, represents a big data challenge. Rheumatoid arthritis (RA) is a complex autoimmune disease with a genetic basis. Many single-nucleotide polymorphism (SNP) association methods partition a genome into haplotype blocks. The aim of this genome wide association study (GWAS) was to select the most appropriate haplotype block partitioning method for the North American Rheumatoid Arthritis Consortium (NARAC) dataset. The methods used for the NARAC dataset were the individual SNP approach and the following haplotype block methods: the four-gamete test (FGT), confidence interval test (CIT), and solid spine of linkage disequilibrium (SSLD). The measured parameters that reflect the strength of the association between the biomarker and RA were the P-value after Bonferroni correction and other parameters used to compare the output of each haplotype block method. This work presents a comparison among the individual SNP approach and the three haplotype block methods to select the method that can detect all the significant SNPs when applied alone. The GWAS results from the NARAC dataset obtained with the different methods are presented. The individual SNP, CIT, FGT, and SSLD methods detected 541, 1516, 1551, and 1831 RA-associated SNPs respectively, and the individual SNP, FGT, CIT, and SSLD methods detected 65, 156, 159, and 450 significant SNPs respectively, that were not detected by the other methods. Three hundred eighty-three SNPs were discovered by the haplotype block methods and the individual SNP approach, while 1021 SNPs were discovered by all three haplotype block methods. The 383 SNPs detected by all the methods are promising candidates for studying RA susceptibility. A hybrid technique involving all four methods should be applied to detect the significant SNPs associated with RA in the NARAC dataset, but the SSLD method may be preferred because of its advantages when only one method was used.

Keywords: Confidence interval test; Four-gamete test; Genome-wide association study; NARAC; Rheumatoid arthritis; Solid spine of linkage disequilibrium.