Modern Approaches in Turkish Demographic Data Analytics: Transforming into an End-To-End Panel with Streamlit and Altair
Keywords:
TÜİK, ADNKS, ETL pipeline, data harmonization, tidy data, caching, Streamlit dashboard
Abstract
This paper proposes an end-to-end system that automatically extracts, stores, processes, and reshapes the demographic information in the ADNKS bulletins issued by the Turkish Statistical Institute (TÜİK) and presents it in a visual analytics dashboard. The data pipeline downloads the bulletin files, caches them so that later runs do not need to download them again, extracts the tables accurately, and converts the data into a tidy format that supports cross-year and cross-province comparison. The dashboard is built with Streamlit and offers interactive filtering, time-series views, and side-by-side comparison. In addition, the system runs data quality checks covering data format, missing values, and the unification of province names. The system will be assessed through (i) a comparison of its performance in a cold-start setting versus a cached setting, (ii) verification of its data quality across a variety of bulletins, and (iii) a demonstration of scenario-based analytics tasks that reflect real-world usage, including comparison of provinces over time, top-k comparison, and demographic analysis.
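The pipeline stages named above (cached download, province-name unification, tidy reshaping) can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names `cached_fetch`, `normalize_province`, and `to_tidy`, the cache layout, the column name `province`, and the sample values are all hypothetical, and the downloader is passed in as a parameter so the caching logic can be shown without assuming any particular TÜİK URL or file format.

```python
import hashlib
import unicodedata
from pathlib import Path
from typing import Callable, Dict, List


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: one cache file per URL, named by the SHA-256 of the URL.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()  # cached run: no download
    data = fetch(url)             # cold start: download once
    path.write_bytes(data)
    return data


def normalize_province(name: str) -> str:
    """Unify spelling variants: collapse whitespace and uppercase with Turkish i rules."""
    name = unicodedata.normalize("NFC", " ".join(name.split()))
    # Turkish casing: dotted i -> İ and dotless ı -> I before the generic upper().
    return name.replace("i", "İ").replace("ı", "I").upper()


def to_tidy(wide_rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Reshape wide rows (one column per year) into one row per province and year."""
    tidy = []
    for row in wide_rows:
        province = normalize_province(str(row["province"]))
        for key, value in row.items():
            # Assumption: missing cells are skipped here; a real quality check
            # would more likely record them for reporting.
            if key == "province" or value is None:
                continue
            tidy.append({"province": province, "year": int(key), "population": int(value)})
    return tidy
```

With this shape, each output row holds one province, one year, and one value, so filtering by year or province in a dashboard reduces to simple row selection. The populations passed to `to_tidy` in any trial run should be treated as placeholders, not TÜİK figures.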