Pattern recognition in bioinformatics

Dick de Ridder; Jeroen de Ridder; Marcel J T Reinders

doi:10.1093/bib/bbt020

Brief Bioinform. 2013 Sep;14(5):633-47. doi: 10.1093/bib/bbt020. Epub 2013 Apr 4.

Authors

Dick de Ridder¹, Jeroen de Ridder, Marcel J T Reinders

Affiliation

¹ Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands. Tel.: +31 15 2785114; Fax: +31 15 2787022; d.deridder@tudelft.nl.

PMID: 23559637
DOI: 10.1093/bib/bbt020

Abstract

Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained.

Keywords: bioinformatics; classification; clustering; dimensionality reduction; pattern recognition.

Publication types

Review

MeSH terms

Algorithms
Artificial Intelligence
Cluster Analysis
Computational Biology / education*
Curriculum
High-Throughput Nucleotide Sequencing / statistics & numerical data
Pattern Recognition, Automated / methods*
Pattern Recognition, Automated / statistics & numerical data