Abstract
This article develops a deep-learning model for speech synthesis, a core task in natural language processing. Speech synthesis has traditionally been carried out by articulatory, formant, and concatenative techniques; these approaches introduce substantial aperiodic distortion, and their error rates grow rapidly during processing. Recent advances in speech synthesis have moved strongly toward deep learning, since training on large-scale data yields effective feature representations for the task. The main objective of this article is to apply deep learning techniques to speech synthesis and to compare their performance, in terms of aperiodic distortion, with earlier algorithms used in natural language processing.
