The Promise of MOOCs Revisited? Demographics of Learners Preparing for University




distance education, massive open online courses, learning analytics, learning design, MOOCs, research paper


This paper uses cluster analysis to examine how traditionally underrepresented learners engage with entry-level massive open online courses (MOOCs) that were produced by a major research university in the United States and are intended to lower the barrier to university enrolment. From an initial sample of 260,239 learners, we cluster a subset of 29,083 participants who submitted an assignment in one of nine entry-level MOOCs. Manhattan and Gower distance measures are computed from engagement, achievement, and demographic data. To our knowledge, this is one of the first uses of Gower distance to cluster mixed-variable data in order to explore fairness and equity in the MOOC literature. The clusters are derived with the CLARA and PAM algorithms and enriched with demographic data, with a particular focus on education level and, for a smaller subset of learners, approximated socioeconomic status (SES). Results indicate that learners without a college degree are more likely to be high-performing than college-educated learners, and that learners from lower SES backgrounds are just as likely to be successful as learners from middle and higher SES backgrounds. While MOOCs have struggled to improve access to learning, fairer and more equitable outcomes for traditionally underrepresented learners are possible.
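As a rough illustration of the general technique named in the abstract, the following Python sketch computes Gower distances over mixed numeric and categorical fields and then runs a naive PAM-style medoid search. It is not the authors' pipeline: the field names, the four example learners, and the helper functions `gower_distance`, `total_cost`, and `pam_swap` are all hypothetical, and the swap search is a simplified stand-in for the full PAM and CLARA algorithms.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records (dicts) with mixed variable types.

    numeric_ranges maps each numeric field to its observed range (max - min);
    every other field is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            # Range-normalised absolute difference (a scaled Manhattan term).
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam_swap(dist, k):
    """Naive PAM-style search: start from the first k points, then keep
    applying any medoid/non-medoid swap that lowers the total cost."""
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: row[m]) for row in dist]
    return medoids, labels


# Hypothetical learners: field names and values are invented for illustration.
learners = [
    {"grade": 0.95, "videos_watched": 40, "education": "no_degree"},
    {"grade": 0.90, "videos_watched": 38, "education": "no_degree"},
    {"grade": 0.20, "videos_watched": 5, "education": "bachelor"},
    {"grade": 0.15, "videos_watched": 3, "education": "bachelor"},
]
numeric = ["grade", "videos_watched"]
ranges = {f: max(r[f] for r in learners) - min(r[f] for r in learners) for f in numeric}

# Full pairwise distance matrix, then a two-cluster medoid search.
dist = [[gower_distance(a, b, ranges) for b in learners] for a in learners]
medoids, labels = pam_swap(dist, k=2)
```

Because Gower distance averages a range-normalised difference for numeric fields and a match/mismatch indicator for categorical fields, engagement, achievement, and demographic variables can enter the same distance matrix. A full pairwise matrix grows quadratically with sample size, which is one reason sampling-based variants such as CLARA are commonly used for large samples.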






How to Cite

Meaney, M., & Fikes, T. (2023). The Promise of MOOCs Revisited? Demographics of Learners Preparing for University. Journal of Learning Analytics, 10(1), 113–132.



Special Section on Fairness, Equity, and Responsibility in Learning Analytics