Sign language recognition based on dual-path background erasure convolutional neural network

Junming Zhang; Xiaolong Bu; Yushuai Wang; Hao Dong; Yu Zhang; Haitao Wu

doi:10.1038/s41598-024-62008-z

Sci Rep. 2024 May 18;14(1):11360. doi: 10.1038/s41598-024-62008-z.

Authors

Junming Zhang^{1,2}, Xiaolong Bu^{1,2}, Yushuai Wang^{1,2,3}, Hao Dong^{1,2,3}, Yu Zhang^{1,2}, Haitao Wu^{4,5}

Affiliations

¹ School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China.
² Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China.
³ School of Computer Science, Zhongyuan University of Technology, Xinzheng, 450007, Henan, China.
⁴ School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China. whtstu@163.com.
⁵ Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China. whtstu@163.com.

Abstract

Sign language is an important means of expression for people with hearing and speech disabilities, so sign language recognition has long been an important research topic. However, many current sign language recognition systems require complex deep models and rely on expensive sensors, which limits their application scenarios. To address this issue, this study proposes a lightweight, computer-vision-based dual-path background erasure deep convolutional neural network (DPCNN) model for sign language recognition. The DPCNN consists of two paths: one learns the overall features, while the other learns the background features. The background features are gradually subtracted from the overall features to obtain an effective representation of the hand features. These features are then flattened into a one-dimensional vector and passed through a fully connected layer with 128 output units. Finally, a fully connected layer with 24 output units serves as the output layer. On the ASL Finger Spelling dataset, the proposed method achieves an overall accuracy of 99.52% and a Macro-F1 score of 0.997. More importantly, the proposed method can be deployed on small terminals, thereby broadening the application scenarios of sign language recognition. Experimental comparisons indicate that the proposed dual-path background erasure network model generalizes better than the compared models.
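
The data flow described in the abstract (two convolutional paths, stage-wise subtraction of the background features, flattening, a 128-unit fully connected layer, and a 24-unit output layer) can be illustrated with a minimal pure-Python sketch. The abstract does not state the number of convolutional stages, kernel sizes, activations, input resolution, or the exact subtraction schedule, so the two 3x3 stages, the ReLU activations, the 12x12 grayscale input, the random weights, and all function names below (`conv3x3`, `dense`, `softmax`, `init_params`, `dpcnn_forward`) are assumptions made for illustration and not the authors' implementation.

```python
import math
import random


def conv3x3(image, kernel):
    """Valid 3x3 cross-correlation followed by ReLU (assumed activation)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(3) for dj in range(3))
            row.append(max(0.0, s))
        out.append(row)
    return out


def dense(vector, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(size=12, stages=2, seed=0):
    """Random weights for a toy model; all sizes here are assumptions."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    side = size - 2 * stages  # each valid 3x3 convolution trims 2 pixels
    flat = side * side
    return {
        "stages": [(rand_matrix(3, 3), rand_matrix(3, 3))
                   for _ in range(stages)],
        "w1": rand_matrix(128, flat), "b1": [0.0] * 128,  # 128-unit layer
        "w2": rand_matrix(24, 128), "b2": [0.0] * 24,     # 24-unit output
    }


def dpcnn_forward(image, params):
    overall = image      # path 1: overall features
    background = image   # path 2: background features
    for k_overall, k_background in params["stages"]:
        overall = conv3x3(overall, k_overall)
        background = conv3x3(background, k_background)
        # Assumed "gradual" erasure: subtract the background map at every stage.
        overall = [[o - b for o, b in zip(row_o, row_b)]
                   for row_o, row_b in zip(overall, background)]
    flat = [v for row in overall for v in row]  # flatten to one dimension
    hidden = [max(0.0, h) for h in dense(flat, params["w1"], params["b1"])]
    return softmax(dense(hidden, params["w2"], params["b2"]))


# Usage: a random 12x12 "image" yields a probability vector over 24 classes.
params = init_params(size=12, stages=2, seed=0)
rng = random.Random(1)
image = [[rng.random() for _ in range(12)] for _ in range(12)]
probs = dpcnn_forward(image, params)
print(len(probs), round(sum(probs), 6))
```

With untrained random weights the output is meaningless as a prediction; the sketch only shows how the erased feature map feeds the two fully connected layers and how the 24 output units line up with the 24 classes mentioned in the abstract.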