Dataset for Software Engineering Learning Resources



  • Muddassira Arshad, Department of Software Engineering, University of the Punjab, Pakistan
  • Muhammad Murtaza Yousaf, Department of Software Engineering, University of the Punjab, Pakistan
  • Syed Mansoor Sarwar, Punjab University College of Information Technology, University of the Punjab, Pakistan



Keywords: Software Engineering Learning Resource Corpus, Reading Recommender dataset for Learning Software Engineering, Readability Assessment of Software Engineering Learning Resources, Software Engineering self-learning repository with readability guidance, Software Engineering Concept Map


In the current digital age, an abundance of digital resources is readily available to learners. With the ongoing COVID-19 pandemic and prevailing economic crises, a significant number of learners prefer self-learning. Developing customized self-learning applications that guide learners to resources matching their learning preferences requires a dataset containing learning resources and their prerequisite relationships. Several such datasets exist for Machine Learning (ML), Information Retrieval (IR), and Natural Language Processing (NLP). To contribute to this area, we present the Software Engineering Learning Resource Dataset (SELRD), a publicly available dataset designed specifically for learning Software Engineering (SE). We extracted the data for SELRD from multiple sources, including edX, my-mooc, and textbooks. The SE learning resources (SELRs) are organized by topic, and the dataset includes 602 SELRs covering 302 topics. We extracted the content of lectures and books, available as presentation files (pptx) and Portable Document Format (PDF) files, using Python libraries. We also computed the expected reading time for each SELR, which tells learners how much time each resource is likely to require. The SELRD comprises 692 prerequisite pairs: 592 positive pairs and 100 negative pairs. This data can be used with machine learning algorithms to generate learning paths for self-learners. The SELRD can also serve as a repository of SE learning resources. In the future, we plan to add best practices and examples for each SELR to make the dataset more useful for learners.
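
As an illustration of the reading-time idea described in the abstract, the following is a minimal sketch rather than the authors' actual method. It assumes a fixed reading speed of 200 words per minute (the abstract does not state the value used for SELRD), and the function name, the topic names, and the pair layout are hypothetical.

```python
import math

# Assumed average reading speed; the value used for SELRD is not given in the abstract.
WORDS_PER_MINUTE = 200


def expected_reading_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Estimate reading time in whole minutes from a whitespace-based word count."""
    word_count = len(text.split())
    return math.ceil(word_count / wpm)


# Hypothetical prerequisite pairs: (prerequisite topic, target topic, label),
# where label 1 marks a positive pair and 0 a negative pair.
pairs = [
    ("Requirements Engineering", "Software Design", 1),
    ("Software Testing", "Requirements Engineering", 0),
]

# Keep only the positive pairs, e.g. as edges for building a learning path.
positive_pairs = [(pre, target) for pre, target, label in pairs if label == 1]

if __name__ == "__main__":
    sample_text = "word " * 450  # stand-in for text extracted from a pptx or PDF file
    print(expected_reading_minutes(sample_text))
    print(positive_pairs)
```

A word-count estimate of this kind is only a rough guide; a readability-aware estimate would also need to account for text difficulty, which this sketch does not attempt.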




How to Cite

Arshad, M., Yousaf, M. M., & Sarwar, S. M. (2023). Dataset for Software Engineering Learning Resources. International Conference on Scientific and Innovative Studies, 1(1), 118–123.