CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data

Comput Biol Med. 2024 Mar:170:108058. doi: 10.1016/j.compbiomed.2024.108058. Epub 2024 Jan 28.

Authors

Chen Zhao¹, Anqi Liu², Xiao Zhang², Xuewei Cao³, Zhengming Ding⁴, Qiuying Sha³, Hui Shen², Hong-Wen Deng⁵, Weihua Zhou⁶

Affiliations

¹ Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA.
² Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.
³ Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA.
⁴ Department of Computer Science, Tulane University, New Orleans, LA, 70118, USA.
⁵ Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA. Electronic address: hdeng2@tulane.edu.
⁶ Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA; Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI, 49931, USA. Electronic address: whzhou@mtu.edu.

Abstract

Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important for understanding the etiology of complex genetic diseases. Each omics technique provides only a limited view of the underlying biological process, and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle to multi-omics data integration is the existence of unpaired multi-omics data, caused by instrument sensitivity and cost. Studies may fail if certain aspects of the subjects' data are missing or incomplete. In this paper, we propose CLCLSA (Cross-omics Linked unified embedding with Contrastive Learning and Self Attention), a deep learning method for multi-omics integration with incomplete data. Using complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn feature representations across different types of biological data. Multi-omics contrastive learning is employed to maximize the mutual information between different types of omics. In addition, feature-level and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier performs multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that the proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.

Keywords: Autoencoders; Contrastive learning; Deep learning; Incomplete omics data; Multi-omics integration.
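Two of the ingredients named in the abstract can be made concrete: a contrastive objective that pulls together the embeddings of different omics types for the same subject, and an omics-level attention step that weights each omics embedding before a Softmax classifier. The Python sketch below is illustrative only and is not the authors' implementation. It assumes an InfoNCE-style loss (a standard contrastive objective that lower-bounds mutual information; the abstract does not state CLCLSA's exact loss), a temperature of 0.5, and a simplified attention pooling with a hypothetical learned query vector standing in for omics-level self-attention. The function names info_nce, omics_attention_fuse and softmax_classify are hypothetical. The cross-omics autoencoders and feature-level attention are not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in v]


def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE loss between two omics embeddings of the same N subjects.

    Row i of z_a and row i of z_b come from the same subject (positive pair);
    the other rows of z_b act as negatives. Assumed form, not CLCLSA's exact loss.
    """
    za = [l2_normalize(v) for v in z_a]
    zb = [l2_normalize(v) for v in z_b]
    loss = 0.0
    for i, anchor in enumerate(za):
        # cosine similarity of subject i's omics-A view to every omics-B view
        logits = [dot(anchor, other) / temperature for other in zb]
        loss -= math.log(softmax(logits)[i])
    return loss / len(za)


def omics_attention_fuse(embeddings, query):
    """Weight one subject's per-omics embeddings by scaled dot-product scores
    against a hypothetical learned query vector, then sum them."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(e, query) / scale for e in embeddings])
    dim = len(embeddings[0])
    fused = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    return fused, weights


def softmax_classify(fused, weight_rows, biases):
    """Linear layer followed by softmax: one probability per class."""
    return softmax([dot(w, fused) + b for w, b in zip(weight_rows, biases)])


# Toy usage: two subjects, two omics types (e.g. mRNA and methylation), 2-D embeddings.
z_mrna = [[1.0, 0.0], [0.0, 1.0]]
z_meth = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(z_mrna, z_meth)
shuffled = info_nce(z_mrna, z_meth[::-1])
print(aligned < shuffled)  # True: matched subjects give the lower loss

fused, weights = omics_attention_fuse([z_mrna[0], z_meth[0]], query=[1.0, 0.0])
probs = softmax_classify(fused, weight_rows=[[1.0, -1.0], [-1.0, 1.0]], biases=[0.0, 0.0])
print(weights, probs)
```

With aligned embeddings the loss is lower than when subjects are mismatched, which is the signal that pushes different omics views of the same subject together. In a trained model the embeddings, the query vector and the classifier weights would be learned by gradient descent rather than fixed by hand as in this toy example.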

MeSH terms

Head*
Humans
Multiomics*
Phenotype
