Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection

Bioinformatics. 2009 May 1;25(9):1145-51. doi: 10.1093/bioinformatics/btp019. Epub 2009 Jan 25.

Abstract

Motivation: Pathway and gene set-based approaches for the analysis of gene expression profiling experiments have become increasingly popular for addressing problems associated with individual gene analysis. Since most genes are not differently expressed, existing gene set tests, which consider all the genes within a gene set, are subject to considerable noise and power loss, a concern exacerbated in studies in which the degree of differential expression is moderate for truly differentially expressed genes. For a significantly differentially expressed pathway, it is also of substantial interest to select important genes that drive the differential expression of the pathway.

Methods: We develop a unified framework to jointly test the significance of a pathway and to select a subset of genes that drive the significant pathway effect. To achieve dimension reduction and gene selection, we decompose each gene pathway into a single score by using a regularized form of linear discriminant analysis, called sparse linear discriminant analysis (sLDA). Testing for the significance of the pathway effect proceeds via permutation of the sLDA score. The sLDA-based test is compared with competing approaches with simulations and two applications: a study on the effect of metal fume exposure on immune response and a study of gene expression profiles among Type II Diabetes patients.

Results: Our results show that sLDA-based testing provides a powerful approach to test for the significance of a differentially expressed pathway and gene selection.

Availability: An implementation of the proposed sLDA-based pathway test in the R statistical computing environment is available at http://www.hsph.harvard.edu/~mwu/software/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Databases, Protein
  • Diabetes Mellitus / genetics
  • Discriminant Analysis
  • Gene Expression Profiling / methods*
  • Humans