Extraction of gene-disease relations from Medline using domain dictionaries and machine learning

Pac Symp Biocomput. 2006:4-15.

Abstract

We describe a system that extracts disease-gene relations from Medline. We constructed a dictionary for disease and gene names from six public databases and extracted relation candidates by dictionary matching. Since dictionary matching produces a large number of false positives, we developed a method of machine learning-based named entity recognition (NER) to filter out false recognitions of disease/gene names. We found that the performance of relation extraction is heavily dependent upon the performance of NER filtering and that the filtering improves the precision of relation extraction by 26.7% at the cost of a small reduction in recall.
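The two stages described above, dictionary matching followed by filtering of false recognitions, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The dictionary entries, the function names, and the sentence-level co-occurrence criterion are all assumptions. In particular, `keep_mention` is a hypothetical hand-written rule standing in for the trained machine learning NER filter, whose features and learning algorithm the abstract does not specify.

```python
import re
from itertools import product

# Toy dictionaries (hypothetical entries). The paper builds its dictionary
# from six public databases, which are not reproduced here.
GENE_DICT = {"BRCA1", "TP53", "CAT"}
DISEASE_DICT = {"breast cancer", "leukemia"}


def find_mentions(sentence, dictionary):
    """Return sorted (start, end, term) tuples for every case-insensitive,
    whole-word dictionary hit in the sentence."""
    mentions = []
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            mentions.append((m.start(), m.end(), term))
    return sorted(mentions)


def candidate_relations(sentence):
    """Pair every gene hit with every disease hit in the same sentence.
    Assumption: co-occurrence within a sentence defines a candidate."""
    genes = find_mentions(sentence, GENE_DICT)
    diseases = find_mentions(sentence, DISEASE_DICT)
    return list(product(genes, diseases))


def keep_mention(sentence, mention):
    """Placeholder for a trained NER filter (assumption, not the paper's
    classifier): reject a gene hit whose surface form is all lowercase,
    e.g. the common word 'cat' matching the gene symbol 'CAT'."""
    start, end, _term = mention
    return not sentence[start:end].islower()


def filtered_relations(sentence):
    """Keep only candidates whose gene mention passes the filter."""
    return [(g, d) for g, d in candidate_relations(sentence)
            if keep_mention(sentence, g)]


if __name__ == "__main__":
    text = "The cat model showed that BRCA1 mutations raise breast cancer risk."
    print("candidates:", candidate_relations(text))
    print("filtered:  ", filtered_relations(text))
```

In this toy sentence, dictionary matching alone pairs both "cat" and "BRCA1" with "breast cancer", and the filter drops the spurious "cat" hit. That is the kind of false positive the abstract says the NER filtering step targets, trading a small amount of recall for precision.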

MeSH terms

  • Animals
  • Artificial Intelligence*
  • Computing Methodologies
  • Dictionaries, Medical as Topic
  • Disease*
  • Genes*
  • Humans
  • MEDLINE*
  • Terminology as Topic
  • Unified Medical Language System