A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents

PeerJ Comput Sci. 2024 Feb 23:10:e1769. doi: 10.7717/peerj-cs.1769. eCollection 2024.

Abstract

Object detection methods based on deep learning have been used in a variety of sectors, including banking, healthcare, e-governance, and academia. In recent years, considerable research attention has been paid to text detection and recognition from scene images and from images of unstructured documents. The novelty of this article lies in its detailed discussion and implementation of various transfer learning-based backbone architectures for printed text recognition. The authors compared the ResNet50, ResNet50V2, ResNet152V2, Inception, Xception, and VGG19 backbone architectures, with data resizing, normalization, and noise removal as preprocessing techniques, on a standard OCR Kaggle dataset. The top three backbone architectures were then selected based on the accuracy achieved, and hyperparameter tuning was performed to obtain more accurate results. Xception performed well compared with the ResNet, Inception, VGG19, and MobileNet architectures, achieving an accuracy of 98.90% and a minimum loss of 0.19. According to existing research in this domain, transfer learning-based backbone architectures applied to printed or handwritten data recognition are not yet well represented in the literature. We split the dataset into 80 percent for training and 20 percent for testing, passed it to the different backbone architecture models trained for the same number of epochs, and found that the Xception architecture achieved higher accuracy than the others. In addition, the ResNet50V2 model achieved higher accuracy (96.92%) than the ResNet152V2 model (96.34%).
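
The 80/20 split and the normalization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names `split_80_20` and `normalize`, the fixed seed, and the toy data are assumptions made only for illustration, and the abstract does not say how the split was shuffled or which normalization range was used (scaling 8-bit pixels to [0, 1] is assumed here).

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle a copy of `samples` and split it into 80% train / 20% test.

    The seed is an assumption; it only makes the sketch reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def normalize(pixels):
    """Scale 8-bit pixel intensities (0-255) to the range [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


# Hypothetical stand-in data: 10 tiny "images", each a flat list of
# 4 pixel values, paired with a dummy class label.
dataset = [([i, 2 * i, 255 - i, 128], i % 2) for i in range(10)]

train, test = split_80_20(dataset)
train = [(normalize(x), y) for x, y in train]
test = [(normalize(x), y) for x, y in test]

print(len(train), len(test))
```

In an actual experiment, the normalized training portion would be passed to each pretrained backbone and the held-out 20 percent used only for evaluation, so that all architectures are compared on the same unseen data.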

Keywords: Backbone architectures; Inception; OCR (Optical Character Recognition); Object detection; Object recognition; Printed text recognition; ResNet50V2; Transfer learning; VGG19; Xception.

Grants and funding

This work was supported by the Analytical Center for the Government of the Russian Federation, on 1 November 2021, under Grant 70-2021-00143 and Grant IGK 000000D730321P5Q0002. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.