Evaluating ChatGPT text mining of clinical records for companion animal obesity monitoring

Vet Rec. 2024 Feb 3;194(3):e3669. doi: 10.1002/vetr.3669. Epub 2023 Dec 6.

Abstract

Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives pertaining to companion animals.

Methods: BCS values were extracted from 4415 anonymised clinical narratives either with RegexT or by appending each narrative to a prompt that asked ChatGPT to return the BCS information. The extracted data were manually reviewed for comparison.
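
As a minimal sketch of the two extraction routes described in Methods, the Python below pairs a regular expression with a prompt builder. Neither is the study's RegexT or its actual prompt: the pattern, the prompt wording and the names BCS_PATTERN, PROMPT, extract_bcs and build_prompt are assumptions made for illustration, and no request is sent to ChatGPT.

```python
import re

# Hypothetical pattern (NOT the study's RegexT): matches mentions such as
# "BCS 6/9", "bcs: 7.5 / 9" or "BCS=4/5" on a 5- or 9-point scale.
BCS_PATTERN = re.compile(
    r"\bBCS\s*[:=]?\s*(\d+(?:\.\d+)?)\s*/\s*(5|9)\b",
    re.IGNORECASE,
)

# Hypothetical prompt wording (NOT the prompt used in the study).
PROMPT = (
    "Extract the body condition score (BCS) from the veterinary clinical "
    "narrative below. Reply with the score and its scale, or 'none' if no "
    "BCS is recorded."
)


def extract_bcs(narrative):
    """Return (score, scale) for the first BCS mention, or None if absent."""
    match = BCS_PATTERN.search(narrative)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def build_prompt(narrative):
    """Append the narrative to the fixed prompt, as described in Methods."""
    return PROMPT + "\n\n" + narrative


if __name__ == "__main__":
    print(extract_bcs("Bright and alert. BCS 7/9, diet advised."))
    print(extract_bcs("Body condition 6 out of 9, slightly overweight."))
    print(build_prompt("BCS 7/9, diet advised."))
```

In this sketch a free-text phrasing such as "body condition 6 out of 9" is not matched, which is the kind of miss that would lower the recall of a fixed pattern while leaving its precision untouched.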

Results: The precision of RegexT was higher (100%, 95% confidence interval [CI] 94.81%-100%) than that of ChatGPT (89.3%, 95% CI 82.75%-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18%-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92%-79.94%).
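
For readers who want to see how figures of this kind are derived, the sketch below computes precision, recall and a 95% confidence interval for a proportion. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the counts in the example are invented rather than taken from the study.

```python
import math


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%).

    Assumed method: the abstract does not say how its intervals were computed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Invented counts for illustration only (not the study's data).
    tp, fp, fn = 8, 2, 0
    precision, recall = precision_recall(tp, fp, fn)
    print(precision, wilson_interval(tp, tp + fp))
    print(recall, wilson_interval(tp, tp + fn))
```

Unlike the simple normal approximation, the Wilson interval still has a non-zero width when the observed proportion is 100%, which is why a perfect precision or recall can be reported with a lower bound below 100%.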

Limitations: Prior anonymisation and subtle prompt engineering are needed to improve ChatGPT output.

Conclusions: Large language models create diverse opportunities and, while complex, present an intuitive interface to information. However, they require careful implementation to avoid unpredictable errors.

MeSH terms

  • Animals
  • Data Mining*
  • Language
  • Narration
  • Obesity / veterinary
  • Pets*