Learning Factor Models of Students at Risk of Failing in the Early Stage of Tertiary Education

Geraldine Gray
Colm McGuinness
Philip Owende
Markus Hofmann


This paper reports on a study to predict students at risk of failing, based on data available prior to commencement of the first year of study. The study was conducted over three years, 2010 to 2012, on a student population drawn from a range of academic disciplines (n=1,207). Data was gathered from two sources: student enrolment data maintained by college administration, and an online, self-reporting learner profiling tool administered during first-year student induction. Factors considered included prior academic performance, personality, motivation, self-regulation, learning approaches, age and gender. Models were trained on data from the 2010 and 2011 student cohorts and tested on data from the 2012 student cohort. A comparison of eight classification algorithms found that k-NN achieved the best model accuracy (72%), but results from other models were similar, including ensembles (71%), a support vector machine (70%) and a decision tree (70%). Models of subgroups by age and discipline achieved higher accuracies but were affected by sample size; samples with n<900 underrepresented patterns in the dataset. Results showed that the factors most predictive of academic performance in the first year of tertiary education included age, prior academic performance and self-efficacy. This study indicated that early modelling of first-year students yielded informative, generalisable models that identified students at risk of failing.
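
The evaluation design summarised above, in which models are trained on earlier cohorts and tested on a later one, can be illustrated with a minimal k-NN sketch. It is not the authors' implementation: the toy records, the three pre-scaled features, the choice of k=3, and the helper names `cohort_split`, `knn_predict` and `accuracy` are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy records, invented for illustration (not data from the study).
# Each record: (cohort_year, [age, prior_performance, self_efficacy], outcome),
# with every feature assumed to be already scaled to the range [0, 1].
RECORDS = [
    (2010, [0.10, 0.40, 0.30], "fail"),
    (2010, [0.20, 0.80, 0.70], "pass"),
    (2010, [0.70, 0.75, 0.80], "pass"),
    (2011, [0.10, 0.35, 0.40], "fail"),
    (2011, [0.30, 0.70, 0.65], "pass"),
    (2011, [0.10, 0.45, 0.25], "fail"),
    (2012, [0.20, 0.38, 0.35], "fail"),
    (2012, [0.60, 0.78, 0.75], "pass"),
]


def cohort_split(records, test_year):
    """Train on cohorts before test_year, test on the test_year cohort."""
    train = [(x, y) for year, x, y in records if year < test_year]
    test = [(x, y) for year, x, y in records if year == test_year]
    return train, test


def knn_predict(train, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test records whose predicted outcome matches the label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


train_set, test_set = cohort_split(RECORDS, test_year=2012)
print(accuracy(train_set, test_set, k=3))
```

Splitting by cohort year rather than at random means the test students are never seen during training, which is closer to how such a model would be used on an incoming class. The accuracy on this toy data says nothing about the figures reported in the study.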

DOI: http://dx.doi.org/10.18608/jla.2016.32.20

