Volume - 7 | Issue - 3 | September 2025
Published: 16 August 2025
Speech emotion recognition (SER) is an emerging area of emotion detection within the scope of affective computing. In this work, emotional speech recordings of spoken words delivered during verbal communication are of interest: the emotions in speech are investigated through its acoustic properties and modeled with machine learning. We performed a series of experiments on the RAVDESS, TESS, SAVEE, and EMO-DB datasets to evaluate whether a Recurrent Neural Network (RNN) and CLAF-SER, a Cross-Lingual Attention-Based Adversarial Framework for SER, can detect and classify emotions such as sadness, anger, happiness, neutrality, and fear. Features such as MFCC, LPCC, pitch, energy, and chroma were extracted before training the RNN. With this model, TESS achieved the highest accuracy among the individual datasets; however, CLAF-SER gave the best performance when all datasets were combined.
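To make the feature-extraction step concrete, the following is a minimal sketch of two of the acoustic features mentioned above, short-time energy and pitch, computed with NumPy only. This is an illustrative assumption, not the authors' implementation: the paper's full pipeline also extracts MFCC, LPCC, and chroma, for which an audio library such as librosa is typically used; the frame length, hop size, and pitch search range below are hypothetical choices.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def frame_energy(frames):
    """Short-time energy: sum of squared samples per frame."""
    return np.sum(frames.astype(float) ** 2, axis=1)

def pitch_autocorr(frame, sr=16000, fmin=50, fmax=400):
    """Estimate pitch (Hz) of one frame from the autocorrelation peak.

    The search is restricted to lags corresponding to fmin..fmax
    (an assumed range covering typical speaking pitch).
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Hypothetical usage on a synthetic 200 Hz tone instead of a real utterance:
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)
frames = frame_signal(x)
energies = frame_energy(frames)
pitch = pitch_autocorr(frames[0], sr=sr)
```

In a real SER pipeline, per-frame feature vectors like these (alongside MFCC, LPCC, and chroma) form the time series that is fed to the RNN.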
Keywords: Speech Emotion Recognition (SER); RNN (Recurrent Neural Network); CLAF-SER (Cross-Lingual Attention-based Adversarial Framework for SER); SAVEE (Surrey Audio-Visual Expressed Emotion Database); RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song); TESS (Toronto Emotional Speech Set); EMO-DB (Berlin Database of Emotional Speech); MFCC (Mel-Frequency Cepstral Coefficients); LPCC (Linear Prediction-based Cepstral Coefficients); Pitch; Energy; Chroma