A Root-Guided Random Forest Framework with Discriminative Voice Biomarker Selection for Parkinson’s Disease Detection

Geetha Ramani R.; Nandhitha K.

doi:10.36548/jscp.2026.3.002


Open Access

https://doi.org/10.36548/jscp.2026.3.002

Vol. 8, No. 3 (2026)

Published: 22 June, 2026

Pages: 201-218

Geetha Ramani R.

Department of Information Technology, Anna University, Chennai, India

Nandhitha K.

Department of Information Technology, Anna University, Chennai, India


How to Cite

R., Geetha Ramani, and Nandhitha K. 2026. “A Root-Guided Random Forest Framework With Discriminative Voice Biomarker Selection for Parkinson’s Disease Detection”. Journal of Soft Computing Paradigm 8 (3): 201-18. https://doi.org/10.36548/jscp.2026.3.002.

Keywords

Parkinson’s Disease

Random Forest

Feature Selection

Voice Biomarkers

Root-Guided Learning

Abstract

Parkinson's disease (PD) is a neurodegenerative disorder that severely impairs speech production, causing unstable phonation, articulation disorders, and prosodic impairments. Early identification of these speech-specific symptoms enables timely diagnosis and treatment. This paper introduces a Root-Guided Random Forest Feature Selection (RGRFFS) framework for automatic detection of PD from speech data. Voice samples drawn from publicly available Parkinson's disease speech datasets were preprocessed with noise reduction, voice activity detection, normalization, and signal segmentation. In total, 107 acoustic and spectral features were extracted to capture the speech characteristics of Parkinsonian patients, including Mel-Frequency Cepstral Coefficients (MFCCs), voice perturbation measures, formants, spectral features, and prosodic parameters. To reduce redundancy and increase the discriminative power of the feature set, the RGRFFS method ranked candidate vocal biomarkers by Root Importance Score (RIS). The 75 most informative features were retained and used for classification and evaluation. The resulting classification accuracy of 93.85%, together with precision, recall, and F1-score values of 0.94, indicates that the selected feature subset reliably separates PD speech samples from control samples.
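The selection step described in the abstract (rank 107 features, keep the top 75, then classify) can be illustrated with a minimal sketch. The abstract does not define how the Root Importance Score is computed, so the sketch below substitutes scikit-learn's standard impurity-based `feature_importances_` as a stand-in ranking; it is not the authors' RGRFFS method. The data are synthetic (generated with `make_classification`), and the function names `rank_features` and `run_pipeline`, the forest size, the 80/20 split, and the weighted averaging of precision, recall, and F1 are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

N_FEATURES = 107  # number of extracted features stated in the abstract
N_SELECTED = 75   # number of retained features stated in the abstract


def rank_features(X_train, y_train, seed=0):
    """Return feature indices sorted from most to least important.

    Stand-in ranking: impurity-based random forest importances,
    NOT the paper's Root Importance Score (undefined in the abstract).
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    return np.argsort(forest.feature_importances_)[::-1]


def run_pipeline(seed=0):
    """Rank features on the training split, keep the top 75, and evaluate."""
    # Synthetic placeholder data; real inputs would be acoustic features.
    X, y = make_classification(
        n_samples=400,
        n_features=N_FEATURES,
        n_informative=20,
        n_redundant=30,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Rank on training data only, so the test split does not leak
    # into the choice of features.
    selected = rank_features(X_train, y_train, seed)[:N_SELECTED]

    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train[:, selected], y_train)
    y_pred = clf.predict(X_test[:, selected])

    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    return selected, metrics


if __name__ == "__main__":
    selected, metrics = run_pipeline()
    print(f"kept {len(selected)} of {N_FEATURES} features")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because the data are synthetic, the printed metrics say nothing about the results reported in the paper; the sketch only shows the shape of a rank-then-select workflow and why the ranking is fitted on the training split alone.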

