Large-Scale Assessments for Learning: A Human-Centred AI Approach to Contextualizing Test Performance
DOI: https://doi.org/10.18608/jla.2024.8007
Keywords: large-scale assessment, test-taking process profiles, human-in-the-loop, machine learning, deep learning, research paper
Abstract
Large-scale assessments play a key role in education: educators and stakeholders need to know what students know and can do so that they can shape education policies and interventions in teaching and learning. However, a score from the assessment may not be enough: educators also need to know why students got low scores, how students engaged with the tasks and the assessment, and how students with different skill levels worked through the assessment. Process data, combined with response data, reflect students' test-taking processes and can provide educators with such rich information, but manually labelling these complex data is hard to scale for large-scale assessments. Starting from scratch, we leveraged machine learning techniques (including supervised, unsupervised, and active learning) and experimented with a general human-centred AI approach to help subject matter experts efficiently and effectively make sense of big data (including students' interaction sequences with the digital assessment platform, such as response, timing, and tool-use sequences) and produce process profiles, that is, holistic views of students' entire test-taking processes on the assessment, so that performance can be viewed in context. Process profiles may help identify different sources of low performance and help generate rich feedback for educators and policy makers. The released National Assessment of Educational Progress (NAEP) Grade 8 mathematics data were used to illustrate the proposed approach.
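The human-in-the-loop idea described above, in which subject matter experts label only the interaction sequences a model is least certain about, can be illustrated with a minimal, generic sketch of uncertainty-based active learning. This is not the paper's implementation: the two-dimensional "process features," the nearest-centroid classifier, and the simulated expert labels are all hypothetical stand-ins used purely to show the query-label-retrain loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-student process features (e.g., total time, tool-use count).
# Two synthetic "profiles": fast/low-tool-use vs. slow/high-tool-use students.
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
               rng.normal([3.0, 3.0], 0.5, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)  # hidden "expert" labels

labeled = [0, 50]  # start with one expert-labelled example per profile
unlabeled = [i for i in range(100) if i not in labeled]

def predict_proba(X, labeled):
    """Distance-to-centroid pseudo-probability of belonging to profile 1."""
    c0 = X[[i for i in labeled if y_true[i] == 0]].mean(axis=0)
    c1 = X[[i for i in labeled if y_true[i] == 1]].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return d0 / (d0 + d1 + 1e-12)  # closer to c1 -> higher probability of profile 1

for _ in range(10):  # ten rounds of querying the expert
    p = predict_proba(X, labeled)
    # Uncertainty sampling: ask the expert about the point closest to p = 0.5.
    query = min(unlabeled, key=lambda i: abs(p[i] - 0.5))
    labeled.append(query)  # simulate the expert supplying the label
    unlabeled.remove(query)

p = predict_proba(X, labeled)
accuracy = np.mean((p > 0.5).astype(int) == y_true)
print(f"accuracy after {len(labeled)} expert labels: {accuracy:.2f}")
```

The loop concentrates the expert's limited labelling effort on the most ambiguous cases, which is the efficiency argument the abstract makes for combining active learning with subject matter expertise.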
License
Copyright (c) 2024 Journal of Learning Analytics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.