Exploration, visualization, and preprocessing of high-dimensional data

Zhijin Wu; Zhiqiang Wu

doi:10.1007/978-1-60761-580-4_8

Methods Mol Biol. 2010;620:267-84. doi: 10.1007/978-1-60761-580-4_8.

Authors

Zhijin Wu¹, Zhiqiang Wu

Affiliation

¹ Center for Statistical Sciences, Brown University, Providence, RI, USA.

PMID: 20652508
DOI: 10.1007/978-1-60761-580-4_8

Abstract

Rapid advances in biotechnology have given rise to a variety of high-dimensional data. Many of these data, including DNA microarray data, mass spectrometry protein data, and high-throughput screening (HTS) assay data, are generated by complex experimental procedures that involve multiple steps such as sample extraction, purification and/or amplification, labeling, fragmentation, and detection. As a result, the quantity of interest is not observed directly, and a number of preprocessing procedures are needed to convert the raw data into a biologically relevant form. This also makes exploratory data analysis and visualization essential for detecting possible defects, anomalies, or distortions in the data and for testing underlying assumptions, and thus for ensuring data quality. The characteristics of the data structure revealed in exploratory analysis often motivate the choice of preprocessing procedures that produce data suitable for downstream analysis. In this chapter we review common techniques for exploring and visualizing high-dimensional data and introduce the basic preprocessing procedures.

Publication types

Review

MeSH terms

Computer Graphics*
Data Interpretation, Statistical*
Mass Spectrometry
Microarray Analysis
Proteomics