Utilizing Student Time Series Behaviour in Learning Management Systems for Early Prediction of Course Performance
Keywords: predictive analytics, learning management system, long short-term memory network, machine learning, LSTM, learning analytics
Predictive analytics in higher education has become increasingly popular in recent years with the growing availability of educational big data. In particular, a wealth of student activity data is available from learning management systems (LMSs) in most academic institutions. However, previous investigations into predictive analytics in higher education using LMS activity data did not adequately accommodate student behaviours in the form of time series. In this study, we applied a deep learning approach, long short-term memory (LSTM) networks, to analyze students' online temporal behaviours using their LMS data for the early prediction of course performance. To reveal the potential of the deep learning approach in predictive analytics, we compared LSTM networks with eight conventional machine learning classifiers in terms of prediction performance, as measured by the area under the receiver operating characteristic (ROC) curve (AUC). Results indicate that, using the deep learning approach, time series information about click frequencies successfully provided early detection of at-risk students with moderate prediction accuracy. In addition, the deep learning approach showed higher prediction performance and stronger generalizability than the machine learning classifiers.
Copyright (c) 2020 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).