Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients

Bioinformatics. 2003 Jan;19(1):71-8. doi: 10.1093/bioinformatics/19.1.71.

Abstract

Motivations and results: For classifying gene expression profiles or other types of medical data, simple rules are preferable to non-linear distance or kernel functions, because rules can help us understand the application in addition to giving an accurate classification. In this paper, we discover novel rules that describe the gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients. We also introduce a new classifier, named PCL, to make effective use of the rules. PCL is accurate and can handle multiple parallel classifications. We evaluate this method by classifying 327 heterogeneous ALL samples. Our test error rate is competitive with that of support vector machines, and it is 71% better than C4.5, 50% better than Naive Bayes, and 43% better than k-nearest neighbour. Experimental results on another independent data set are also presented to show the strength of our method.
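The abstract does not say how the "X% better" figures are computed. A common reading is relative reduction in test errors, and the minimal sketch below illustrates that reading only. The function name and the error counts are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_errors: int, new_errors: int) -> float:
    """Return the fraction by which new_errors is lower than baseline_errors.

    Assumption: "X% better" means (baseline - new) / baseline.
    """
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - new_errors) / baseline_errors


# Hypothetical counts for illustration only, not results from the paper.
hypothetical_baseline_errors = 10
hypothetical_new_errors = 4

reduction = relative_error_reduction(
    hypothetical_baseline_errors, hypothetical_new_errors
)
print(f"{reduction:.0%} fewer test errors than the baseline")
```

Under this reading, a classifier that makes the same number of errors as the baseline scores 0, and one that makes no errors scores 1.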

Availability: Under http://sdmc.lit.org.sg/GEDatasets/, click on Supplementary Information.

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor / classification
  • Biomarkers, Tumor / genetics
  • Cluster Analysis
  • DNA, Neoplasm / classification*
  • DNA, Neoplasm / genetics*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / genetics
  • Genetic Markers / genetics
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Pattern Recognition, Automated
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics*

Substances

  • Biomarkers, Tumor
  • DNA, Neoplasm
  • Genetic Markers