Exploration, visualization, and preprocessing of high-dimensional data

Methods Mol Biol. 2010:620:267-84. doi: 10.1007/978-1-60761-580-4_8.

Abstract

Rapid advances in biotechnology have given rise to a variety of high-dimensional data. Many of these data, including DNA microarray data, mass spectrometry protein data, and high-throughput screening (HTS) assay data, are generated by complex experimental procedures that involve multiple steps such as sample extraction, purification and/or amplification, labeling, fragmentation, and detection. As a result, the quantity of interest is not measured directly, and a number of preprocessing procedures are needed to convert the raw data into a biologically meaningful form. This also makes exploratory data analysis and visualization essential for detecting possible defects, anomalies, or distortions in the data, for testing underlying assumptions, and thus for ensuring data quality. The characteristics of the data structure revealed by exploratory analysis often motivate the choice of preprocessing procedures used to produce data suitable for downstream analysis. In this chapter we review common techniques for exploring and visualizing high-dimensional data and introduce the basic preprocessing procedures.
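As a rough illustration of how exploratory summaries can motivate a preprocessing step, the sketch below log-transforms a simulated intensity matrix, computes the per-sample quartiles that a boxplot would display, and then applies quantile normalization. The abstract does not name specific methods, so the choice of a log2 transform and quantile normalization is an assumption made for illustration, as are the function names, the offset parameter, and the simulated data.

```python
import numpy as np


def log2_transform(raw, offset=1.0):
    """Log2-transform non-negative raw intensities; the offset avoids log(0)."""
    return np.log2(np.asarray(raw, dtype=float) + offset)


def column_summary(x):
    """Per-sample quartiles (25th, 50th, 75th percentiles), as shown in a boxplot."""
    return np.percentile(x, [25, 50, 75], axis=0)


def quantile_normalize(x):
    """Give every column (sample) the same empirical distribution.

    Simplification: ties are broken by order of appearance rather than averaged.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each entry within its column
    reference = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values across samples
    return reference[ranks]


# Simulated data (hypothetical): 200 features by 3 samples,
# with each sample on a different overall intensity scale.
rng = np.random.default_rng(0)
raw = rng.lognormal(6.0, 1.0, size=(200, 3)) * np.array([1.0, 2.0, 4.0])

logged = log2_transform(raw)
print("quartiles before normalization:\n", column_summary(logged))

normalized = quantile_normalize(logged)
print("quartiles after normalization:\n", column_summary(normalized))
```

In this sketch, the quartiles before normalization differ between samples because of the simulated scale factors, which is the kind of distortion an exploratory boxplot would reveal. After quantile normalization the per-sample quartiles coincide by construction. Whether such a step is appropriate for a given data set depends on assumptions about the samples that should be checked first.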

Publication types

  • Review

MeSH terms

  • Computer Graphics*
  • Data Interpretation, Statistical*
  • Mass Spectrometry
  • Microarray Analysis
  • Proteomics