A comparative analysis of Indian sign language recognition using deep learning models
Abstract
Sign language is a form of communication in which people use bodily gestures, particularly of the hands and arms. It is used when spoken communication is not possible or not preferred. Few people can translate sign language or readily understand it, so a platform that translates sign language easily would be of great convenience to the hearing-impaired. Hence, through this study, we compare how various widely used deep learning architectures perform at translating Indian Sign Language (ISL) for the native audience, with the aim of streamlining the development of software tools that can accurately recognize or translate ISL. To understand the training process and establish a baseline without any optimizations, a plain convolutional neural network (CNN) architecture was implemented first. Several pre-trained transfer-learning models were then implemented and yielded promising results. The research contrasts how various convolutional neural networks perform when translating Indian sign gestures on a custom dataset that varies illumination, camera angle, and background to provide a balanced and distinctive set of images. The goal of this study is to make clear comparisons between the various deep learning frameworks; hence, a fresh Indian Sign Language dataset is introduced. Since every dataset in deep learning has distinctive properties that can be exploited to improve existing models, the creation of a new dataset can itself be viewed as a contribution to the field. For our task of classifying these gestures, the best-performing model was ResNet-50 (accuracy = 98.25%, F1-score = 99.34%), and the least favorable was Inception V3 (accuracy = 66.75%, F1-score = 70.89%).
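The accuracy and F1-score figures reported above can be computed as sketched below. This is a minimal, self-contained illustration of the metric definitions; the gesture labels used here are hypothetical and are not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_per_class(y_true, y_pred, cls):
    """F1 = harmonic mean of precision and recall for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores across all gesture classes."""
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)

# Hypothetical labels for three gesture classes (illustration only).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]
print(round(accuracy(y_true, y_pred), 4))  # 0.8333
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```

In practice a library implementation such as scikit-learn's `f1_score` would typically be used; the macro average shown here is one common choice when classes are balanced, as in the custom dataset described above.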
Copyright (c) 2023 author(s)
This work is licensed under a Creative Commons Attribution 4.0 International License.