The Effects of Explanations in Automated Essay Scoring Systems on Student Trust and Motivation




explainable artificial intelligence, automated essay scoring systems, trust, motivation, academic writing, research paper


Ethical considerations, including transparency, play an important role when using artificial intelligence (AI) in education. Explainable AI has been proposed as a way to provide more insight into the inner workings of AI algorithms. However, carefully designed user studies on how to design explanations for AI in education remain limited. The current study aimed to identify the effect of explanations of an automated essay scoring system on students’ trust and motivation. The explanations were designed using a needs-elicitation study with students, combined with guidelines and frameworks from explainable AI. Two types of explanations were tested: full-text global explanations and an accuracy statement. The results showed that neither explanation had an effect on student trust or motivation compared to no explanation. Interestingly, the grade provided by the system, and especially the difference between the student’s self-estimated grade and the system grade, had a large influence. Hence, it is important to consider the effects of the system’s outcome (here: the grade) when evaluating the effect of explanations of AI in education.






How to Cite

Conijn, R., Kahr, P., & Snijders, C. (2023). The effects of explanations in automated essay scoring systems on student trust and motivation. Journal of Learning Analytics, 10(1), 37–53.



Special Section on Fairness, Equity, and Responsibility in Learning Analytics
