Abstract
Deep learning underpins speech recognition in many applications, including voice assistants, voice authentication, and audio transcription. Spoken digit recognition can benefit children with dyslexia, blind people, and people with other impairments. The goal of this paper is to build a spoken digit recognition system that classifies the digits 0 to 9 using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) neural networks. The CNN model is additionally evaluated with autoencoders added. Finally, the models are compared on the basis of their performance metrics.
