Journal of Innovative Image Processing is accepted for inclusion in Scopus.

Volume 7 | Issue 4 | December 2025

Detection of Voice and Lung Pathological Signal Using Acoustic Spectrogram Transformers
Open Access
Revathi S., Mohana Sundaram K., Padmini Sharma, Manjusha Silas
Pages: 1304-1319
Cite this article
S., Revathi, Mohana Sundaram K., Padmini Sharma, and Manjusha Silas. "Detection of Voice and Lung Pathological Signal Using Acoustic Spectrogram Transformers." Journal of Innovative Image Processing 7, no. 4 (2025): 1304-1319.
Published
07 November, 2025
Abstract

In medicine, identifying pathological conditions is a crucial challenge because it typically requires invasive, contact-based data acquisition. Non-invasive, non-contact vital data such as speech signals can therefore be used instead. Speech signals have distinguishing phonetic characteristics that change when a pathological condition arises in the human body. These changes allow pathological signals to be classified by training machine learning and deep learning models on the acoustic features of speech. This work proposes an acoustic spectrogram transformer in which all transformer layers are trained on acoustic characteristics extracted from the speech signals of patients with voice and lung diseases. Features are extracted from the voice and lung dataset as Mel-frequency cepstral coefficients (MFCCs), Mel spectrograms, and spectral descriptors such as centroid, bandwidth, and roll-off, together with the zero-crossing rate. These acoustic features train the transformer blocks and depth-adaptive parameters, enabling the model to capture complex patterns for effective signal classification. The model also includes frequency-focused attention mechanisms that extract the spectral characteristics most indicative of pathological conditions, and it employs multiple pooling strategies to aggregate temporal information effectively. With this targeted design, the system is presented as an effective clinical classification tool that minimizes computational complexity while achieving an accuracy of about 83% in voice pathology classification and 99% in lung pathology classification.
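The abstract names spectral centroid, bandwidth, roll-off, and zero-crossing rate as features but gives no formulas. The sketch below illustrates the common textbook definitions for a single audio frame; it is not the authors' implementation. The function names, the 0.85 roll-off fraction, the magnitude (rather than power) weighting, and the naive DFT used to avoid external dependencies are all assumptions made for illustration. In practice an audio library would normally compute these per frame over the whole recording.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, via a naive O(N^2) DFT (fine for short frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_features(frame, sample_rate, rolloff_fraction=0.85):
    """Return centroid, bandwidth, roll-off (all in Hz) and zero-crossing rate.

    rolloff_fraction=0.85 is an assumed, commonly used value; the source
    does not state which fraction was used.
    """
    n = len(frame)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)

    if total == 0.0:
        # Silent frame: spectral descriptors are undefined, report zeros.
        centroid = bandwidth = rolloff = 0.0
    else:
        # Centroid: magnitude-weighted mean frequency.
        centroid = sum(f * m for f, m in zip(freqs, mags)) / total
        # Bandwidth: magnitude-weighted standard deviation around the centroid.
        bandwidth = math.sqrt(
            sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
        )
        # Roll-off: lowest frequency below which the chosen fraction
        # of the total spectral magnitude is contained.
        rolloff = freqs[-1]
        cumulative = 0.0
        for f, m in zip(freqs, mags):
            cumulative += m
            if cumulative >= rolloff_fraction * total:
                rolloff = f
                break

    # Zero-crossing rate: fraction of adjacent sample pairs with differing sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0

    return {
        "centroid": centroid,
        "bandwidth": bandwidth,
        "rolloff": rolloff,
        "zcr": zcr,
    }
```

For a pure tone whose frequency falls exactly on a DFT bin, the centroid and roll-off should both sit at the tone frequency and the bandwidth should be close to zero, which makes a convenient sanity check before applying the functions to real recordings.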

Keywords

Voice Pathology, Lung Pathology, Acoustic Spectrogram Transformer, Mel Spectrogram
