Respiratory Disease Classification from Cough Sounds Using Pre-trained Audio Embeddings and Embedded Feature Fusion: A Comparative Study of YAMNet and VGGish


Keywords: Lung Diseases, Cough Sound Dataset, YAMNet, VGGish, Feature Fusion, Embedding

Abstract
Cough is a primary symptom associated with a variety of respiratory conditions, including
asthma, chronic obstructive pulmonary disease (COPD), and pneumonia. This study conducts a
comparative analysis of pre-trained audio embedding models for classifying these conditions from cough
sounds, with a focus on robust evaluation for a small and imbalanced clinical dataset. We systematically
evaluated three feature sets: YAMNet embeddings, VGGish embeddings, and a fusion of both. These
features were used to train five different classifiers, including classical machine learning models and a
Convolutional Neural Network (CNN). Given the class imbalance in our dataset, we prioritized the
patient-level 2-fold cross-validation Macro F1-Score as the primary metric for assessing generalization
performance. Our findings demonstrate that the fusion of YAMNet and VGGish embeddings, when
processed by a custom CNN architecture, yields the highest performance, achieving a mean cross-validation
Macro F1-Score of 0.612. This result surpassed the performance of models using single embedding types
and other classical classifiers. These findings underscore that combining complementary audio
representations through feature fusion creates a highly discriminative feature space, and a CNN is
particularly effective at leveraging this space for robust classification. This approach presents a promising,
non-invasive screening tool for respiratory diseases, suitable for telemedicine and mobile health
applications.
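The feature-fusion step described above can be sketched as follows. This is a minimal illustration rather than the study's exact pipeline: it assumes frame-level embeddings have already been extracted (YAMNet yields 1024-dimensional frame embeddings, VGGish 128-dimensional), and it uses simple mean-pooling over time before concatenation — the pooling choice is an assumption, as the paper's CNN may instead consume frame-level features directly.

```python
import numpy as np

def fuse_embeddings(yamnet_frames, vggish_frames):
    """Mean-pool each embedding sequence over time, then concatenate.

    yamnet_frames: (T1, 1024) array of YAMNet frame embeddings
    vggish_frames: (T2, 128) array of VGGish frame embeddings
    Returns a single (1152,) fused feature vector per recording.
    """
    y = yamnet_frames.mean(axis=0)   # (1024,) clip-level YAMNet vector
    v = vggish_frames.mean(axis=0)   # (128,) clip-level VGGish vector
    return np.concatenate([y, v])    # (1152,) fused representation

# Example with random stand-ins for real embeddings (frame counts are arbitrary)
rng = np.random.default_rng(0)
fused = fuse_embeddings(rng.normal(size=(10, 1024)), rng.normal(size=(5, 128)))
print(fused.shape)  # (1152,)
```

The fused vectors can then be fed to any downstream classifier; concatenation preserves both representations so the model can learn which dimensions of each embedding space are discriminative.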