
Volume - 7 | Issue - 1 | March 2025

Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
Open Access
Vinotha R., Hepsiba D., Vijay Anand L D.
Pages: 29-52
Cite this article
R., Vinotha, Hepsiba D., and Vijay Anand L D. "Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis." Journal of Trends in Computer Science and Smart Technology 7, no. 1 (2025): 29-52.
Published
24 April, 2025
Abstract

Neural text-to-speech (TTS) synthesis generates speech using neural networks, and one of its most notable capabilities is producing speech in the voices of different speakers. This research builds on neural TTS synthesis, focusing on voice cloning and speech synthesis for Indian accents, in contrast to most existing systems, which are trained predominantly on Western accents. First, an LSTM-based speaker verification system identifies distinctive speaker traits. Next, a synthesizer, a sequence-to-sequence model, translates text into Mel spectrograms representing speech acoustics. A WaveRNN vocoder then transforms these spectrograms into the corresponding audio waveforms. Finally, noise reduction algorithms refine the generated speech for enhanced clarity and naturalness. Training on a diverse multi-accent dataset (80% Indian-accented speech) significantly enhanced the system's cloning process; the improvement is attributed to the model's exposure to 600 hours of speech from 3000 speakers. This research offers an open-source Python package designed for professionals seeking to integrate voice cloning and speech synthesis capabilities into their devices. The package aims to generate synthetic speech that sounds like an individual's natural voice, but it does not replace the natural human voice.
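
The four stages described in the abstract (speaker encoder, synthesizer, vocoder, noise reduction) can be pictured as a simple chain of functions. The following is a minimal sketch of that data flow only, not the authors' package: the class name `VoiceCloningPipeline`, the type aliases, and all `toy_*` functions are hypothetical stand-ins introduced here so the example runs, and none of them performs real speech processing.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases, chosen only for illustration.
Waveform = List[float]               # mono audio samples
Embedding = List[float]              # fixed-length speaker vector
MelSpectrogram = List[List[float]]   # frames x mel bins


@dataclass
class VoiceCloningPipeline:
    """Wires the four stages together; each stage is supplied by the caller."""
    encode_speaker: Callable[[Waveform], Embedding]
    synthesize: Callable[[str, Embedding], MelSpectrogram]
    vocode: Callable[[MelSpectrogram], Waveform]
    denoise: Callable[[Waveform], Waveform]

    def clone(self, reference_audio: Waveform, text: str) -> Waveform:
        # 1. Summarize the reference speaker as an embedding.
        embedding = self.encode_speaker(reference_audio)
        # 2. Text + speaker embedding -> Mel spectrogram.
        mel = self.synthesize(text, embedding)
        # 3. Mel spectrogram -> raw audio waveform.
        raw_audio = self.vocode(mel)
        # 4. Post-process the waveform.
        return self.denoise(raw_audio)


# Toy stand-ins so the sketch runs end to end. They are NOT real models.
def toy_encoder(wav: Waveform) -> Embedding:
    mean = sum(wav) / len(wav) if wav else 0.0
    return [mean, float(len(wav))]


def toy_synthesizer(text: str, emb: Embedding) -> MelSpectrogram:
    # One "frame" per character, two "bins" per frame.
    return [[float(ord(ch)), emb[0]] for ch in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    # One sample per frame, scaled into a small range.
    return [frame[0] / 255.0 for frame in mel]


def toy_denoiser(wav: Waveform) -> Waveform:
    # Placeholder: clips samples to [-1, 1]; a real stage would reduce noise.
    return [max(-1.0, min(1.0, s)) for s in wav]


pipeline = VoiceCloningPipeline(toy_encoder, toy_synthesizer, toy_vocoder, toy_denoiser)
audio = pipeline.clone(reference_audio=[0.1, -0.2, 0.05], text="hello")
print(len(audio))
```

In a real system, each callable would presumably wrap a trained model (for example, the LSTM speaker encoder or the WaveRNN vocoder named in the abstract); keeping the stages as injected callables is one possible design that lets each be swapped or tested independently.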

Keywords

Speech Synthesis, Voice Cloning, Speaker Characteristics, Mean Opinion Score (MOS), Speech Disorders
