Biclustering analysis on tree-shaped time-series single cell gene expression data of Caenorhabditis elegans

BMC Bioinformatics. 2024 May 9;25(1):183. doi: 10.1186/s12859-024-05800-y.

Abstract

Background: In recent years, gene clustering analysis has become a widely used tool for studying gene functions, efficiently categorizing genes with similar expression patterns to aid in identifying gene functions. Caenorhabditis elegans is commonly used in embryonic research due to its consistent cell lineage from fertilized egg to adulthood. Biologists use 4D confocal imaging to observe gene expression dynamics at the single-cell level. However, on one hand, the observed tree-shaped time-series datasets have characteristics such as non-pairwise data points between different individuals. On the other hand, the influence of cell type heterogeneity should also be considered during clustering, aiming to obtain more biologically significant clustering results.

Results: A biclustering model is proposed for tree-shaped single-cell gene expression data of Caenorhabditis elegans. Detailedly, a tree-shaped piecewise polynomial function is first employed to fit non-pairwise gene expression time series data. Then, four factors are considered in the objective function, including Pearson correlation coefficients capturing gene correlations, p-values from the Kolmogorov-Smirnov test measuring the similarity between cells, as well as gene expression size and bicluster overlapping size. After that, Genetic Algorithm is utilized to optimize the function.

Conclusion: The results on the small-scale dataset analysis validate the feasibility and effectiveness of our model and are superior to existing classical biclustering models. Besides, gene enrichment analysis is employed to assess the results on the complete real dataset analysis, confirming that the discovered biclustering results hold significant biological relevance.

Keywords: Biclustering; Genetic algorithm; Single-cell gene expression; Tree-shaped dataset.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Caenorhabditis elegans* / genetics
  • Caenorhabditis elegans* / metabolism
  • Cluster Analysis
  • Gene Expression Profiling / methods
  • Single-Cell Analysis* / methods