Building a common pipeline for rule-based document classification

Olga V Patterson; Thomas Ginter; Scott L DuVall


Stud Health Technol Inform. 2013;192:1211.

Authors

Olga V Patterson¹, Thomas Ginter, Scott L DuVall

Affiliation

¹ VA Salt Lake City Health Care System, Salt Lake City, UT, USA.

PMID: 23920985

Abstract

Instance-based classification of clinical text is a widely used natural language processing (NLP) task, employed as a step in patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS-based classification pipeline. Coupled with this general-purpose solution, our methodology gives researchers access to the data locked in clinical text when human resources are limited and timelines are compact.
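The abstract does not spell out the five steps or the rules themselves. As a rough illustration of the general pattern it names (concept identification, then context analysis, then a class decision), here is a minimal sketch in plain Python. It is not the authors' UIMA AS pipeline. The one-concept dictionary, the NegEx-style negation cues, the five-token window, and the document-level decision rule are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical concept dictionary: concept name -> regex of surface forms.
CONCEPTS = {
    "pneumonia": re.compile(r"\bpneumonia\b", re.IGNORECASE),
}

# Hypothetical pre-negation cues, loosely in the spirit of NegEx.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)
WINDOW = 5  # assumed number of preceding tokens scanned for a cue


@dataclass
class Mention:
    concept: str
    start: int
    negated: bool


def is_negated(text: str, start: int) -> bool:
    """Context analysis: look for a negation cue among the tokens that
    precede the mention, limited to the mention's own sentence."""
    sentence_prefix = re.split(r"[.!?]", text[:start])[-1]
    preceding = sentence_prefix.split()[-WINDOW:]
    return bool(NEGATION_CUES.search(" ".join(preceding)))


def find_mentions(text: str) -> list[Mention]:
    """Concept identification: dictionary (regex) lookup over the text."""
    mentions = []
    for name, pattern in CONCEPTS.items():
        for m in pattern.finditer(text):
            mentions.append(Mention(name, m.start(), is_negated(text, m.start())))
    return mentions


def classify(text: str) -> str:
    """Hypothetical document-level rule: any affirmed mention makes the
    document positive; only negated mentions make it negative."""
    mentions = find_mentions(text)
    if any(not m.negated for m in mentions):
        return "positive"
    if mentions:
        return "negative"
    return "no_mention"


if __name__ == "__main__":
    for note in [
        "Chest X-ray shows right lower lobe pneumonia.",
        "Patient denies cough. No evidence of pneumonia.",
        "Routine follow-up visit.",
    ]:
        print(classify(note), "|", note)
```

Keeping the decision rule separate from mention finding is one way to let the same concept and context steps be reused with different class definitions; a production system would also need sentence splitting that handles abbreviations, plus richer context such as uncertainty or historical mentions.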

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Artificial Intelligence*
Data Mining / methods*
Documentation / classification*
Electronic Health Records / classification
Natural Language Processing*
Software*
Vocabulary, Controlled*