Building a common pipeline for rule-based document classification

Stud Health Technol Inform. 2013:192:1211.

Abstract

Instance-based classification of clinical text is a widely used natural language processing task, employed as a step in patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS-based pipeline for classification. Our proposed methodology, coupled with the general-purpose solution, gives researchers access to the data locked in clinical text when human resources are limited and timelines are tight.
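The general idea of rule-based classification through concept identification and context analysis can be illustrated with a minimal Python sketch. It is not the pipeline described in the abstract and does not use UIMA AS; the concept lexicon, the negation cues, and the function names are all hypothetical and chosen only for illustration. Context analysis is reduced here to a crude check for a negation cue earlier in the same sentence.

```python
import re

# Hypothetical concept lexicon and negation cues (assumptions for illustration,
# not taken from the paper).
CONCEPT_PATTERN = re.compile(r"\b(pneumonia|infiltrate)\b", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def find_affirmed_mentions(text):
    """Return concept mentions that have no negation cue earlier in their sentence."""
    mentions = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CONCEPT_PATTERN.finditer(sentence):
            preceding = sentence[:match.start()]
            # Context analysis: skip the mention if a negation cue precedes it.
            if not NEGATION_CUES.search(preceding):
                mentions.append(match.group(0).lower())
    return mentions


def classify_document(text):
    """Label a document 'positive' if it has at least one affirmed concept mention."""
    return "positive" if find_affirmed_mentions(text) else "negative"
```

For example, under these assumed rules, `classify_document("No evidence of pneumonia or infiltrate.")` yields `"negative"`, while a note stating "Chest X-ray shows right lower lobe pneumonia." yields `"positive"`. A real system would need a richer lexicon and more careful handling of negation scope, uncertainty, and historical mentions.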

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Data Mining / methods*
  • Documentation / classification*
  • Electronic Health Records / classification
  • Natural Language Processing*
  • Software*
  • Vocabulary, Controlled*