TOBMI: trans-omics block missing data imputation using a k-nearest neighbor weighted approach

Xuesi Dong; Lijuan Lin; Ruyang Zhang; Yang Zhao; David C Christiani; Yongyue Wei; Feng Chen

doi:10.1093/bioinformatics/bty796

Bioinformatics. 2019 Apr 15;35(8):1278-1283. doi: 10.1093/bioinformatics/bty796.

Authors

Xuesi Dong¹,², Lijuan Lin¹, Ruyang Zhang¹,³, Yang Zhao¹,³, David C Christiani³,⁴, Yongyue Wei¹,³, Feng Chen¹,³

Affiliations

¹ Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
² Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing, China.
³ China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing, China.
⁴ Department of Environmental Health, Harvard School of Public Health, Boston, MA, USA.

PMID: 30202885

Abstract

Motivation: Stitching together trans-omics data is a powerful approach to assess the complex mechanisms of cancer occurrence, progression and treatment. However, the integration process suffers from the 'block missing' phenomenon, in which some individuals lack data for one or more omics types.

Results: We proposed a k-nearest neighbor (kNN) weighted imputation method for trans-omics block missing data (TOBMIkNN) that imputes RNA-seq data for individuals lacking gene expression measurements, using external information obtained from DNA methylation probe datasets. With multi-hot deck imputation, mean imputation and missing-case deletion as reference methods, we assessed relative error, absolute error, change in inter-omics correlation structure and variable selection. TOBMIkNN reliably imputed RNA-seq data by borrowing information from DNA methylation data, and outperformed the other three methods in imputation error and stability of correlation structure. Our study indicates that TOBMIkNN is a suitable method for trans-omics block missing data imputation.
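The general idea of kNN-weighted block imputation can be illustrated with a minimal sketch. This is not the TOBMI implementation: the function name `knn_weighted_impute`, the Euclidean distance, the inverse-distance weights and the default `k` are all assumptions chosen for illustration, since the abstract does not specify them. The sketch finds, for each individual without RNA-seq data, the k most similar individuals in the methylation data and averages their expression profiles with weights that decrease with distance.

```python
import numpy as np


def knn_weighted_impute(meth, expr, k=3):
    """Illustrative kNN-weighted block imputation (hypothetical helper).

    meth: (n_samples, n_probes) methylation matrix, complete for all samples.
    expr: (n_samples, n_genes) expression matrix; block-missing samples
          are rows consisting entirely of NaN.
    Returns a copy of `expr` with the block-missing rows filled in.
    """
    meth = np.asarray(meth, dtype=float)
    expr = np.asarray(expr, dtype=float)

    # A sample is "block missing" when its whole expression row is NaN.
    missing = np.isnan(expr).all(axis=1)
    donors = np.flatnonzero(~missing)
    if donors.size == 0:
        raise ValueError("no samples with observed expression data")
    k = min(k, donors.size)

    imputed = expr.copy()
    for i in np.flatnonzero(missing):
        # Assumption: Euclidean distance in methylation space.
        dist = np.linalg.norm(meth[donors] - meth[i], axis=1)
        nearest = np.argsort(dist)[:k]
        # Assumption: inverse-distance weights, normalized to sum to 1.
        weights = 1.0 / (dist[nearest] + 1e-12)
        weights /= weights.sum()
        imputed[i] = weights @ expr[donors[nearest]]
    return imputed


# Toy data: sample 2 has methylation data but no expression data.
meth_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
expr_toy = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, np.nan], [100.0, 200.0]])
filled = knn_weighted_impute(meth_toy, expr_toy, k=2)
```

In the toy data, the distant fourth sample is excluded from the two nearest neighbors, so the imputed row is a weighted mix of the first two expression profiles only.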

Availability and implementation: TOBMIkNN is freely available at https://github.com/XuesiDong/TOBMI.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis*
Humans
Research Design