A Sequence Data Model for Analyzing Temporal Patterns of Student Data

Mary Lou Maher
Omar Eltayeby
Wenwen Dou
Kazjon Grace


Data models built for analyzing student data often obfuscate temporal relationships for reasons of simplicity, or to aid in generalization. We present a model based on temporal relationships of heterogeneous data as the basis for building predictive models. We show how within- and between-semester temporal patterns can provide insight into the student experience. For example, in a within-semester model, the prediction of the final course grade can be based on weekly activities and submissions recorded in the learning management system (LMS). In the between-semester model, the prediction of success or failure in a degree program can be based on sequence patterns of grades and activities across multiple semesters. The benefits of our sequence data model include temporal structure, segmentation, contextualization, and storytelling. To demonstrate these benefits, we have collected and analyzed 10 years of student data from the College of Computing at UNC Charlotte in a between-semester sequence model, and used data from an introductory computer science course to build a within-semester sequence model. Our results for the two sequence models show that analytics based on the sequence data model can achieve higher predictive accuracy than non-temporal models built on the same data.
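The within-semester idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout `(student_id, week, activity)`, the activity names, and the function `build_weekly_sequences` are all assumptions made for illustration. The sketch only shows how heterogeneous LMS events might be segmented by week and kept in temporal order, so that a downstream model sees a sequence rather than a flat aggregate.

```python
from collections import defaultdict


def build_weekly_sequences(events):
    """Group (student_id, week, activity) records into one time-ordered
    sequence of weekly segments per student.

    The record layout is a hypothetical one chosen for this sketch.
    """
    # student_id -> week -> list of activities observed in that week
    by_student = defaultdict(lambda: defaultdict(list))
    for student_id, week, activity in events:
        by_student[student_id][week].append(activity)

    sequences = {}
    for student_id, weeks in by_student.items():
        # Order segments by week so the temporal structure is preserved;
        # activities inside a week are sorted only to make output stable.
        sequences[student_id] = [
            (week, sorted(weeks[week])) for week in sorted(weeks)
        ]
    return sequences


# Hypothetical event records, deliberately out of order.
events = [
    ("s1", 2, "quiz_submitted"),
    ("s1", 1, "login"),
    ("s1", 1, "assignment_submitted"),
    ("s2", 1, "login"),
]

sequences = build_weekly_sequences(events)
```

Each student's sequence could then be turned into features for a classifier, or the same segmentation could be applied per semester instead of per week for a between-semester view; both are left open here because the abstract does not specify them.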


Sequence data model; educational data mining; learning analytics; predictive modelling; knowledge discovery.





DOI: http://dx.doi.org/10.18608/jla.2018.51.5

