Respiratory Disease Classification from Cough Sounds Using Pre-trained Audio Embeddings and Embedded Feature Fusion: A Comparative Study of YAMNet and VGGish


Keywords: Lung Diseases, Cough Sound Dataset, YAMNet, VGGish, Feature Fusion, Embedding

Abstract
Cough is a primary symptom associated with a variety of respiratory conditions, including
asthma, chronic obstructive pulmonary disease (COPD), and pneumonia. This study conducts a
comparative analysis of pre-trained audio embedding models for classifying these conditions from cough
sounds, with a focus on robust evaluation for a small and imbalanced clinical dataset. We systematically
evaluated three feature sets: YAMNet embeddings, VGGish embeddings, and a fusion of both. These
features were used to train five different classifiers, including classical machine learning models and a
Convolutional Neural Network (CNN). Given the class imbalance in our dataset, we prioritized the
patient-level 2-fold cross-validation Macro F1-Score as the primary metric for assessing generalization
performance. Our findings demonstrate that the fusion of YAMNet and VGGish embeddings, when
processed by a custom CNN architecture, yields the highest performance, achieving a mean cross-validation
Macro F1-Score of 0.612. This result surpassed the performance of models using single embedding types
and other classical classifiers. These findings underscore that combining complementary audio
representations through feature fusion creates a highly discriminative feature space, and a CNN is
particularly effective at leveraging this space for robust classification. This approach presents a promising,
non-invasive screening tool for respiratory diseases, suitable for telemedicine and mobile health
applications.
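The feature-fusion step described above can be sketched as follows. This is a minimal illustration rather than the study's exact pipeline: it assumes frame-level embeddings have already been extracted (YAMNet yields 1024-dimensional frame embeddings, VGGish 128-dimensional), and it uses simple mean-pooling over time before concatenation — the pooling choice is an assumption, as the paper's CNN may instead consume frame-level features directly.

```python
import numpy as np

def fuse_embeddings(yamnet_frames, vggish_frames):
    """Mean-pool each embedding sequence over time, then concatenate.

    yamnet_frames: (T1, 1024) array of YAMNet frame embeddings
    vggish_frames: (T2, 128) array of VGGish frame embeddings
    Returns a single (1152,) fused feature vector per recording.
    """
    y = yamnet_frames.mean(axis=0)   # (1024,) clip-level YAMNet vector
    v = vggish_frames.mean(axis=0)   # (128,) clip-level VGGish vector
    return np.concatenate([y, v])    # (1152,) fused representation

# Example with random stand-ins for real embeddings (frame counts are arbitrary)
rng = np.random.default_rng(0)
fused = fuse_embeddings(rng.normal(size=(10, 1024)), rng.normal(size=(5, 128)))
print(fused.shape)  # (1152,)
```

The fused vectors can then be fed to any downstream classifier; concatenation preserves both representations so the model can learn which dimensions of each embedding space are discriminative.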