Pattern recognition in bioinformatics

Brief Bioinform. 2013 Sep;14(5):633-47. doi: 10.1093/bib/bbt020. Epub 2013 Apr 4.

Abstract

Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. Such problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, the combination or selection of features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools, such as predictors of genes, protein function and functional or genetic interactions, and is used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in the interpretation of the results obtained.

Keywords: bioinformatics; classification; clustering; dimensionality reduction; pattern recognition.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Cluster Analysis
  • Computational Biology / education*
  • Curriculum
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Pattern Recognition, Automated / methods*
  • Pattern Recognition, Automated / statistics & numerical data