Integrating Machine Learning with Public Health Screening: High-Performance CatBoost Models for Diabetes and Hypertension Risk Assessment



Authors

  • Hakan Yılmaz, Konya Technical University
  • Mehmet Özdem, Türk Telekom

Keywords:

Diabetes prediction, Hypertension risk, Machine learning, Classification, Web-based decision support

Abstract

Diabetes mellitus and hypertension remain two of the most prevalent and burdensome chronic diseases worldwide, underscoring the need for accurate, scalable, and sensitivity-oriented risk prediction tools. This study develops and evaluates two CatBoost-based machine learning models aimed at estimating diabetes and hypertension risk using data from the nationally representative NHANES 2021–2023 survey. After extensive preprocessing—including harmonizing demographic, anthropometric, laboratory, blood pressure, lifestyle, and smoking-related variables—the final analytic cohort consisted of 6,320 adults with 15 core predictors. Two independent CatBoost classifiers were trained using stratified 80/20 splits, balanced class weighting, and optimized hyperparameters. To reflect screening-oriented priorities, a reduced probability threshold (0.40) was selected to maximize sensitivity. The diabetes model achieved an accuracy of 0.874, recall of 0.852, and ROC-AUC of 0.947, while the hypertension model reached an accuracy of 0.792, recall of 0.850, and ROC-AUC of 0.888. These results demonstrate strong discriminative performance using a minimal feature set and confirm the suitability of CatBoost for handling heterogeneous clinical and behavioral data. To illustrate real-world applicability, both models were deployed in a lightweight Flask-based web interface capable of generating real-time probability estimates and categorical risk labels. Overall, this study highlights the potential of modern machine learning methods to transform large-scale health survey data into practical chronic disease risk prediction tools. The findings support future work integrating expanded biomarker sets, longitudinal modeling, multi-disease prediction, and explainability frameworks, as well as external validation across diverse populations. 
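The effect of the reduced probability threshold described in the abstract can be illustrated with a minimal sketch. It does not use the study's data or models: the labels and probabilities below are invented toy values, the helper names (`predict_labels`, `recall`, `accuracy`) are hypothetical, and treating a probability equal to the threshold as positive is an assumption, since the abstract does not state the comparison rule.

```python
from typing import List, Sequence

# Reduced cut-off reported in the abstract; 0.50 is the conventional default.
SCREENING_THRESHOLD = 0.40


def predict_labels(probabilities: Sequence[float],
                   threshold: float = SCREENING_THRESHOLD) -> List[int]:
    """Map predicted probabilities to 0/1 labels.

    Assumption: a probability equal to the threshold counts as positive.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity: share of true positives among all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all cases whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if len(y_true) else 0.0


# Toy values for illustration only (not NHANES data, not model output).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.90, 0.55, 0.45, 0.30, 0.42, 0.20, 0.10, 0.05]

for cutoff in (0.50, SCREENING_THRESHOLD):
    labels = predict_labels(toy_probs, cutoff)
    print(f"threshold={cutoff:.2f} "
          f"recall={recall(toy_true, labels):.2f} "
          f"accuracy={accuracy(toy_true, labels):.2f}")
```

On these toy values, lowering the cut-off from 0.50 to 0.40 catches one more true case at the cost of one extra false positive, which is the trade-off a screening-oriented threshold accepts.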


Author Biographies

Hakan Yılmaz, Konya Technical University

Department of Artificial Intelligence and Machine Learning, Konya

Mehmet Özdem, Türk Telekom

Ankara

Published

2026-01-25

How to Cite

Yılmaz, H., & Özdem, M. (2026). Integrating Machine Learning with Public Health Screening: High-Performance CatBoost Models for Diabetes and Hypertension Risk Assessment. International Journal of Advanced Natural Sciences and Engineering Researches, 10(1), 27–42. Retrieved from https://as-proceeding.com/index.php/ijanser/article/view/3030

Section

Articles