Audio Tagging Using CNN Based Audio Neural Networks for Massive Data Processing

Keywords

Transfer learning
pretrained audio neural networks
audio pattern recognition
audio tagging
machine learning

How to Cite

Manoharan, J. Samuel. 2021. “Audio Tagging Using CNN Based Audio Neural Networks for Massive Data Processing”. Journal of Artificial Intelligence and Capsule Networks 3 (4): 365-74. https://doi.org/10.36548/jaicn.2021.4.008.

Abstract

Sound event detection, speech emotion classification, music classification, acoustic scene classification, audio tagging, and several other audio pattern recognition applications depend largely on advances in machine learning, and neural networks have recently been applied to these problems. Existing systems, however, operate on specific datasets of limited duration. In natural language processing and computer vision, systems pretrained on large datasets have performed well across several tasks in recent years, but audio pattern recognition research with large-scale datasets remains limited. In this paper, a pretrained audio neural network is trained on a large-scale audio dataset and then transferred to several audio-related tasks. The proposed audio neural network is modeled using several convolutional neural networks, and the computational complexity and performance of the system are analyzed. The waveform and log-mel spectrogram are used as input features in this architecture. In audio tagging, the proposed system outperforms existing systems with a mean average precision of 0.45. The performance of the proposed model is demonstrated by applying the audio neural network to five specific audio pattern recognition tasks.
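
As an illustration of the spectrogram input feature mentioned in the abstract, the following is a minimal NumPy sketch of how a mel-scaled log spectrogram can be computed from a raw waveform. It is not the paper's implementation: the function names and every parameter value (32 kHz sampling rate, 1024-point FFT, hop of 320 samples, 64 mel bands) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(wave, sr=32000, n_fft=1024, hop=320, n_mels=64, eps=1e-10):
    """Return an array of shape (n_frames, n_mels) of log mel-band energies.

    All default values are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # Power spectrum of each frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the mel bands, then compress with a logarithm.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel_energy + eps)


if __name__ == "__main__":
    sr = 32000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    features = log_mel_spectrogram(tone, sr=sr)
    print(features.shape)
```

A convolutional network of the kind the abstract describes would take the resulting time-by-frequency array as a single-channel image, while the waveform branch would consume the raw samples directly.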
