Integrating Complex Network Analysis and Machine Learning for Biomarker Discovery in the Human Gut Microbiome
Keywords:
Gut microbiome, Complex network analysis, Machine learning, Biomarker discovery, Multi-omics integration
Abstract
The human gut microbiome is a complex ecosystem whose structural and functional
equilibrium is essential for host health. Disruptions of this equilibrium have been linked to numerous
chronic diseases, underscoring the need for analytical techniques that can characterise microbiome
composition and predict disease-associated phenotypes. This study proposes an integrated methodology
that combines spectral analysis of complex networks with machine learning to extract biologically
meaningful patterns from multi-omics microbiome data. Using metagenomic datasets, we calculated
three key measures, LATENT, EXPLAINED, and MU, which capture the variance structure
and network influence of microbial taxa. LATENT values followed a long-tailed distribution
consistent with scale-free network characteristics, suggesting the existence of highly connected taxa that
may function as keystone species or biomarkers. Positive correlations between EXPLAINED and MU
indicate that taxa contributing more to variance also exert greater influence within the functional
microbiome network. Statistical distribution analysis, empirical cumulative distribution function (ECDF)
plots, and comparative boxplots confirmed substantial variability within the dataset, a characteristic
commonly observed in microbial communities. This integrated framework balances predictive performance
with biological interpretability, offering a scalable approach for biomarker discovery and the
construction of personalised diagnostic models.
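
The distributional checks named in the abstract (ECDF, long-tail behaviour, and the EXPLAINED and MU correlation) can be illustrated with a minimal Python sketch. It runs on synthetic stand-in data and is not the authors' pipeline: the abstract does not define LATENT, EXPLAINED, or MU, so the way the three arrays are generated here, the helper names `ecdf` and `spearman`, and the choice of Spearman rank correlation are all assumptions made for illustration.

```python
import numpy as np


def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def spearman(a, b):
    """Spearman rank correlation without tie correction (continuous data only)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Synthetic stand-ins (assumption): none of these values come from the study.
rng = np.random.default_rng(0)
n_taxa = 500
latent = rng.pareto(2.0, size=n_taxa) + 1.0            # heavy-tailed by construction
explained = latent / latent.sum()                      # hypothetical variance share
mu = explained * rng.lognormal(0.0, 0.3, size=n_taxa)  # hypothetical noisy influence score

x, y = ecdf(latent)
rho = spearman(explained, mu)
tail_ratio = latent.mean() / np.median(latent)  # > 1 indicates right skew

print(f"ECDF points: {x.size}, final cumulative probability: {y[-1]:.1f}")
print(f"Spearman rho (EXPLAINED vs MU): {rho:.2f}")
print(f"Mean/median ratio of LATENT: {tail_ratio:.2f}")
```

A mean well above the median is only a coarse sign of right skew; it does not by itself establish a power law or a scale-free network, which would require a formal tail fit.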