Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts

AMIA Annu Symp Proc. 2007 Oct 11;2007:831-5.

Abstract

Tools that automatically summarize gene information from the literature could help genomics researchers better interpret gene expression data and investigate biological pathways. Finding information on sets of genes is a common task for genomics researchers, and PubMed remains the first choice because the most recent and original information can be found only in the unstructured, free-text biomedical literature. However, manually searching and scanning the literature for information on a set of genes is a time-consuming and daunting task. We built and evaluated a query-based automatic summarizer of information on mouse genes studied in microarray experiments. The system clusters a set of genes by MeSH, GO, and free-text features and summarizes each gene with ranked sentences extracted from MEDLINE abstracts. Evaluation suggested that the system provides meaningful clusters and that the algorithm ranks informative sentences higher.
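The abstract does not specify the clustering algorithm or the sentence-ranking function. The following is a minimal sketch of the two steps it describes, under stated assumptions: each gene is represented as a sparse count vector over prefixed MeSH, GO, and free-text features; genes are grouped by single-link clustering at a cosine-similarity threshold; and sentences are ranked by how many distinct query terms they mention. The function names, feature prefixes (`MESH:`, `GO:`, `TXT:`), the 0.5 threshold, and the toy genes `geneA` to `geneC` are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_genes(features: dict[str, Counter], threshold: float) -> list[list[str]]:
    """Hypothetical single-link clustering: any two genes with similarity >= threshold share a cluster."""
    genes = sorted(features)
    parent = {g: g for g in genes}  # union-find forest

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if cosine(features[g1], features[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    clusters: dict[str, list[str]] = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())

def rank_sentences(sentences: list[str], query_terms: set[str]) -> list[str]:
    """Rank sentences by the number of distinct (lowercase) query terms they contain.
    Ties keep their original order because sorted() is stable."""
    def score(sentence: str) -> int:
        tokens = {t.strip(".,;:()").lower() for t in sentence.split()}
        return len(tokens & query_terms)
    return sorted(sentences, key=score, reverse=True)

# Toy, made-up feature vectors for illustration only.
features = {
    "geneA": Counter({"MESH:Apoptosis": 2, "GO:0006915": 1, "TXT:caspase": 3}),
    "geneB": Counter({"MESH:Apoptosis": 1, "GO:0006915": 1, "TXT:caspase": 2}),
    "geneC": Counter({"MESH:Insulin": 2, "GO:0006006": 1, "TXT:glucose": 4}),
}
clusters = cluster_genes(features, threshold=0.5)

sentences = [
    "The gene is expressed in liver.",
    "geneA activates caspase during apoptosis.",
    "Apoptosis was not observed.",
]
ranked = rank_sentences(sentences, {"apoptosis", "caspase"})
print(clusters)
print(ranked)
```

With these toy vectors, `geneA` and `geneB` share all their features and fall into one cluster, while `geneC` forms its own. Single-link grouping is used here only because it needs no preset number of clusters; a real system might instead weight features (for example by TF-IDF) or use a different clustering method.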

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Abstracting and Indexing
  • Algorithms
  • Animals
  • Gene Expression
  • Genes*
  • Genome
  • Genomics / methods
  • MEDLINE
  • Medical Subject Headings
  • Mice
  • Natural Language Processing*
  • Oligonucleotide Array Sequence Analysis
  • Software