Volume - 7 | Issue - 3 | September 2025
Published: 16 August 2025
Speech emotion recognition (SER) is an emerging area of emotion detection within the scope of affective computing. In this work, emotional speech recordings of spoken words delivered during verbal communication are of interest: the emotions in speech are investigated through its acoustic properties and modeled with machine learning. We performed a series of experiments on the RAVDESS, TESS, SAVEE, and EMO-DB datasets to evaluate whether a Recurrent Neural Network (RNN) and CLAF-SER, a Cross-Lingual Attention-Based Adversarial Framework for SER, can detect and classify emotions such as sadness, anger, happiness, neutrality, and fear. Features such as MFCC, LPCC, pitch, energy, and chroma were extracted before training the RNN. With this model, TESS achieved the highest accuracy among the individual datasets; however, CLAF-SER gave the best performance when all datasets were combined.
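To make the feature-extraction step concrete, the following is a minimal sketch of two of the acoustic features mentioned above, short-time energy and pitch, computed with NumPy only. This is an illustrative assumption, not the authors' implementation: the paper's full pipeline also extracts MFCC, LPCC, and chroma, for which an audio library such as librosa is typically used; the frame length, hop size, and pitch search range below are hypothetical choices.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def frame_energy(frames):
    """Short-time energy: sum of squared samples per frame."""
    return np.sum(frames.astype(float) ** 2, axis=1)

def pitch_autocorr(frame, sr=16000, fmin=50, fmax=400):
    """Estimate pitch (Hz) of one frame from the autocorrelation peak.

    The search is restricted to lags corresponding to fmin..fmax
    (an assumed range covering typical speaking pitch).
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Hypothetical usage on a synthetic 200 Hz tone instead of a real utterance:
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)
frames = frame_signal(x)
energies = frame_energy(frames)
pitch = pitch_autocorr(frames[0], sr=sr)
```

In a real SER pipeline, per-frame feature vectors like these (alongside MFCC, LPCC, and chroma) form the time series that is fed to the RNN.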
Keywords: Speech Emotion Recognition (SER); RNN (Recurrent Neural Network); CLAF-SER (Cross-Lingual Attention-based Adversarial Framework for SER); SAVEE (Surrey Audio-Visual Expressed Emotion Database); RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song); TESS (Toronto Emotional Speech Set); EMO-DB (Berlin Database of Emotional Speech); MFCC (Mel-Frequency Cepstral Coefficients); LPCC (Linear Prediction-based Cepstral Coefficients); Pitch; Energy; Chroma