Anatomical entity mention recognition at literature scale

Bioinformatics. 2014 Mar 15;30(6):868-75. doi: 10.1093/bioinformatics/btt580. Epub 2013 Oct 25.

Abstract

Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, only a few studies have focused on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced.

Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends previously available resources, and apply the resulting tagger to automatically annotate the entire open access scientific literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database.
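
To make the idea of lexical-resource features concrete, the following is a minimal sketch of how dictionary matches might be turned into per-token features for a sequence tagger. It is not AnatomyTagger's implementation: the function names (lexicon_match_tags, token_features), the feature set, and the three-entry toy lexicon are assumptions made for illustration, and a real system would draw its term list from resources such as UMLS or OBO ontologies.

```python
from typing import Dict, List, Set, Tuple

Lexicon = Set[Tuple[str, ...]]


def lexicon_match_tags(tokens: List[str], lexicon: Lexicon, max_len: int = 4) -> List[str]:
    """Greedy longest-match dictionary lookup; returns one B/I/O tag per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: List[str], lexicon: Lexicon) -> List[Dict[str, object]]:
    """Build a simple feature dict per token (hypothetical feature set)."""
    lex_tags = lexicon_match_tags(tokens, lexicon)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "lower": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            "suffix3": tok[-3:].lower(),
            "lexicon_tag": lex_tags[i],  # dictionary-match feature
            "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        })
    return features


# Toy lexicon, assumed for illustration only.
TOY_LEXICON: Lexicon = {("spinal", "cord"), ("cord",), ("liver",)}
SENTENCE = ["Damage", "to", "the", "spinal", "cord", "and", "liver"]
```

Calling token_features(SENTENCE, TOY_LEXICON) yields one feature dictionary per token, which could then be passed to any sequence-labeling learner. The greedy longest-match choice means "spinal cord" is marked as a single two-token match rather than as the shorter entry "cord".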

Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger.

Contact: sophia.ananiadou@manchester.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Anatomy*
  • Animals
  • Artificial Intelligence*
  • Databases, Factual*
  • Humans
  • Publications
  • Unified Medical Language System