Registered multi-device/staining histology image dataset for domain-agnostic machine learning models

Sci Data. 2024 Apr 3;11(1):330. doi: 10.1038/s41597-024-03122-5.

Abstract

Variations in the color and texture of histopathology images arise from differences in staining conditions and imaging devices between hospitals. These biases reduce the robustness of machine learning models exposed to out-of-domain data. To address this issue, we introduce a comprehensive histopathology image dataset named PathoLogy Images of Scanners and Mobile phones (PLISM). The dataset consists of 46 human tissue types stained under 13 hematoxylin and eosin conditions and captured with 13 imaging devices. Precisely aligned image patches from different domains allow an accurate evaluation of the color and texture properties of each domain. Color and texture in PLISM were assessed and found to vary considerably across domains, particularly between whole-slide images and smartphone images. Furthermore, we assessed how pre-training a convolutional neural network on PLISM improves robustness to domain shift. PLISM is a valuable resource that facilitates the precise evaluation of domain shifts in digital pathology and contributes to the development of robust machine learning models that can effectively address domain shift in histological image analysis.

Publication types

  • Dataset

MeSH terms

  • Eosine Yellowish-(YS)
  • Histological Techniques*
  • Histology
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Machine Learning*
  • Neural Networks, Computer*
  • Staining and Labeling*

Substances

  • Eosine Yellowish-(YS)