GIDB: a knowledge database for the automated curation and multidimensional analysis of molecular signatures in gastrointestinal cancer

Database (Oxford). 2019 Jan 1:2019:baz051. doi: 10.1093/database/baz051.

Abstract

Gastrointestinal (GI) cancer is common, characterized by high mortality, and includes oesophagus, gastric, liver, bile duct, pancreas, rectal and colon cancers. The insufficient specificity and sensitivity of biomarkers is still a key clinical hindrance for GI cancer diagnosis and successful treatment. The emergence of `precision medicine', `basket trial' and `field cancerization' concepts calls for an urgent need and importance for the understanding of how organ system cancers occur at the molecular levels. Knowledge from both the literature and data available in public databases is informative in elucidating the molecular alterations underlying GI cancer. Currently, most available cancer databases have not offered a comprehensive discovery of gene-disease associations, molecular alterations and clinical information by integrated text mining and data mining in GI cancer. We develop GIDB, a panoptic knowledge database that attempts to automate the curation of molecular signatures using natural language processing approaches and multidimensional analyses. GIDB covers information on 8730 genes with both literature and data supporting evidence, 248 miRNAs, 58 lncRNAs, 320 copy number variations, 49 fusion genes and 2381 semantic networks. It presents a comprehensive database, not only in parallelizing supporting evidence and data integration for signatures associated with GI cancer but also in providing the timeline feature of major molecular discoveries. It highlights the most comprehensive overview, research hotspots and the development of historical knowledge of genes in GI cancer. Furthermore, GIDB characterizes genomic abnormalities in multilevel analysis, including simple somatic mutations, gene expression, DNA methylation and prognosis. GIDB offers a user-friendly interface and two customizable online tools (Heatmap and Network) for experimental researchers and clinicians to explore data and help them shorten the learning curve and broaden the scope of knowledge. More importantly, GIDB is an ongoing research project that will continue to be updated and improve the automated method for reducing manual work.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor* / genetics
  • Biomarkers, Tumor* / metabolism
  • Data Curation*
  • Data Mining*
  • Gastrointestinal Neoplasms* / genetics
  • Gastrointestinal Neoplasms* / metabolism
  • Humans
  • Natural Language Processing*
  • Precision Medicine

Substances

  • Biomarkers, Tumor