Integration of Sentiment Analysis and Speech Text Processing in Phonetic Flow System

How to Cite

Vandana, Ritik Rana, Akhilendra Khare, and Subash Harizan. 2025. “Integration of Sentiment Analysis and Speech Text Processing in Phonetic Flow System”. Journal of Ubiquitous Computing and Communication Technologies 7 (4): 327-38. https://doi.org/10.36548/jucct.2025.4.001.

Keywords

— HCI
— STT
— TTS
— Bi-LSTM.
Published: 04-11-2025

Abstract

Human-computer interaction (HCI) applications increasingly rely on recognizing speech, understanding emotional context, and generating natural language. Existing solutions use heterogeneous approaches for sentiment analysis, speech synthesis, and speech recognition, which results in an inconsistent user experience. This paper addresses the problem of designing an integrated system that performs speech-to-text (STT) and text-to-speech (TTS) processing, sentiment analysis of input text, and context-aware speech generation. A Bi-LSTM is selected to handle complexities such as sarcasm and negation because it learns context from both the preceding and the following words in a sentence. Even under data sparsity, GloVe embeddings improve model generalisation by providing rich semantic representations learned from large corpora. In experimental evaluation, our Bi-LSTM with GloVe embeddings achieves 90% sentiment classification accuracy, reported as 7-10% higher than standard baselines such as SVM (82%) and Naïve Bayes (75%). With true positive rates above 88%, the model performs evenly across the positive, neutral, and negative classes. With low latency and about 87% accuracy during live testing, the system is well suited to interactive applications. Our Phonetic Flow System combines these features into an extensible system that supports faster, more natural, and emotionally intelligent human-machine interaction.
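The integrated flow described in the abstract (speech-to-text, then sentiment analysis, then speech generation) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `run_pipeline`, `SpokenReply`, the `PROSODY` table and its values, and the two stand-in functions are hypothetical and are not taken from the paper. In particular, mapping a sentiment label to speaking rate and volume is only one plausible way to make the spoken output depend on sentiment, and the keyword classifier is a placeholder for the Bi-LSTM model, not an implementation of it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prosody settings per sentiment label. The numbers are
# illustrative assumptions, not values reported in the paper.
PROSODY = {
    "positive": {"rate": 180, "volume": 1.0},
    "neutral": {"rate": 160, "volume": 0.9},
    "negative": {"rate": 140, "volume": 0.8},
}


@dataclass
class SpokenReply:
    """Text plus the settings a TTS engine could be configured with."""
    text: str
    sentiment: str
    rate: int
    volume: float


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    classify: Callable[[str], str],
) -> SpokenReply:
    """Chain STT -> sentiment analysis -> TTS settings."""
    text = transcribe(audio)        # speech-to-text stage
    sentiment = classify(text)      # sentiment stage (Bi-LSTM in the paper)
    # Unknown labels fall back to the neutral settings.
    prosody = PROSODY.get(sentiment, PROSODY["neutral"])
    return SpokenReply(text, sentiment, prosody["rate"], prosody["volume"])


# Stand-in components so the sketch runs without audio hardware or a model.
def fake_transcribe(audio: bytes) -> str:
    # Pretends the audio bytes already contain the transcript.
    return audio.decode("utf-8")


def keyword_classify(text: str) -> str:
    # Placeholder keyword lookup, not the Bi-LSTM classifier.
    words = set(text.lower().split())
    if words & {"great", "love", "good"}:
        return "positive"
    if words & {"bad", "hate", "terrible"}:
        return "negative"
    return "neutral"


reply = run_pipeline(b"I love this", fake_transcribe, keyword_classify)
print(reply)
```

Passing the stages in as callables keeps the sketch independent of any particular STT, classifier, or TTS library, so a real recognizer or a trained model could be swapped in without changing `run_pipeline`.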
