Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

Vinotha R.; Hepsiba D.; Vijay Anand L D.

doi:10.36548/jtcsst.2025.1.003

Open Access

Volume 7 • Issue 1 • March 2025

https://doi.org/10.36548/jtcsst.2025.1.003

Pages 29-52

Abstract

Neural text-to-speech (TTS) synthesis generates speech using neural networks, and one of its most remarkable capabilities is producing speech in the voices of different speakers. This research builds on neural TTS synthesis, focusing on voice cloning and speech synthesis for Indian accents, in contrast to most existing systems, which are trained predominantly on Western accents. First, an LSTM-based speaker verification system identifies distinctive speaker traits. Next, a synthesizer, a sequence-to-sequence model, translates text into Mel spectrograms representing speech acoustics. A WaveRNN vocoder then transforms these spectrograms into the corresponding audio waveforms. Finally, noise reduction algorithms refine the generated speech for enhanced clarity and naturalness. Training on a diverse multi-accent dataset (80% Indian-accented) significantly improved the system's cloning performance. The improvement is attributed to the model being exposed to 600 hours of speech from 3000 speakers. This research offers an open-source Python package designed for professionals seeking to integrate voice cloning and speech synthesis into their devices. The package aims to generate synthetic speech that sounds like an individual's natural voice, but it does not replace the natural human voice.
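The four stages named in the abstract (speaker encoder, synthesizer, vocoder, noise reduction) can be read as a simple data flow. The following is a minimal sketch of that flow only, not the authors' package: the class `VoiceCloningPipeline`, the `toy_*` functions, and the constants `N_MELS` and `HOP` are hypothetical names introduced for illustration, and each toy function is a trivial numeric stand-in for the corresponding neural model (LSTM encoder, sequence-to-sequence synthesizer, WaveRNN vocoder).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Embedding = List[float]
MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]

N_MELS = 4  # toy value (assumption); real systems use many more mel bins
HOP = 3     # toy number of waveform samples produced per mel frame (assumption)


@dataclass
class VoiceCloningPipeline:
    """Chains the four stages described in the abstract."""
    encode_speaker: Callable[[Sequence[float]], Embedding]
    synthesize: Callable[[str, Embedding], MelSpectrogram]
    vocode: Callable[[MelSpectrogram], Waveform]
    denoise: Callable[[Waveform], Waveform]

    def clone(self, reference_audio: Sequence[float], text: str) -> Waveform:
        embedding = self.encode_speaker(reference_audio)  # speaker traits
        mel = self.synthesize(text, embedding)            # text -> mel frames
        waveform = self.vocode(mel)                       # mel frames -> audio
        return self.denoise(waveform)                     # final clean-up


def toy_encoder(audio: Sequence[float]) -> Embedding:
    # Stand-in for the LSTM speaker encoder: a unit-length summary vector.
    mean = sum(audio) / len(audio)
    energy = sum(x * x for x in audio) / len(audio)
    raw = [mean, energy]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def toy_synthesizer(text: str, embedding: Embedding) -> MelSpectrogram:
    # Stand-in for the sequence-to-sequence model: one frame per character,
    # scaled by the speaker embedding so the output depends on the speaker.
    return [[(ord(ch) % 10) / 10.0 * embedding[1]] * N_MELS for ch in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    # Stand-in for WaveRNN: emit HOP samples per frame at the frame's mean level.
    samples: Waveform = []
    for frame in mel:
        samples.extend([sum(frame) / len(frame)] * HOP)
    return samples


def toy_denoise(waveform: Waveform, threshold: float = 0.05) -> Waveform:
    # Stand-in for noise reduction: a simple noise gate that zeroes quiet samples.
    return [x if abs(x) >= threshold else 0.0 for x in waveform]


pipeline = VoiceCloningPipeline(toy_encoder, toy_synthesizer, toy_vocoder, toy_denoise)
audio_out = pipeline.clone([0.1, -0.2, 0.3, 0.05], "namaste")
print(len(audio_out))  # one HOP-sized chunk of samples per input character
```

Passing the stages in as callables keeps the data flow visible and would let any toy stage be swapped for a trained model with the same input and output types.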

Cite this article

Chicago: R., Vinotha, Hepsiba D., and Vijay Anand L D. "Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis." Journal of Trends in Computer Science and Smart Technology 7, no. 1 (2025): 29-52. doi: 10.36548/jtcsst.2025.1.003

APA: R., V., D., H., & D., V. A. L. (2025). Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis. Journal of Trends in Computer Science and Smart Technology, 7(1), 29-52. https://doi.org/10.36548/jtcsst.2025.1.003

MLA: R., Vinotha, et al. "Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis." Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 1, 2025, pp. 29-52. DOI: 10.36548/jtcsst.2025.1.003.

Vancouver: R. V, D. H, D. VAL. Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis. Journal of Trends in Computer Science and Smart Technology. 2025;7(1):29-52. doi: 10.36548/jtcsst.2025.1.003

IEEE: V. R., H. D., and V. A. L. D., "Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis," Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 1, pp. 29-52, Mar. 2025, doi: 10.36548/jtcsst.2025.1.003.

Harvard: R., V., D., H. and D., V.A.L. (2025) 'Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis', Journal of Trends in Computer Science and Smart Technology, vol. 7, no. 1, pp. 29-52. Available at: https://doi.org/10.36548/jtcsst.2025.1.003.

BibTeX:

@article{r.2025,
  author    = {Vinotha R. and Hepsiba D. and Vijay Anand L D.},
  title     = {{Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis}},
  journal   = {Journal of Trends in Computer Science and Smart Technology},
  volume    = {7},
  number    = {1},
  pages     = {29-52},
  year      = {2025},
  publisher = {IRO Journals},
  doi       = {10.36548/jtcsst.2025.1.003},
  url       = {https://doi.org/10.36548/jtcsst.2025.1.003}
}

Keywords

Speech Synthesis; Voice Cloning; Speaker Characteristics; Mean Opinion Score (MOS); Speech Disorders

Published

24 April, 2025

e-ISSN: 2582-4104
4 issues per year
DOI: https://doi.org/10.36548/jtcsst

Indexing
Scopus | GoogleScholar | Crossref | MicrosoftAcademic | ScienceGate | J-Gate

Publisher

Inventive Research Organization

Open Access Journal
