OpenOPAF: An Open Source Multimodal System for Automated Feedback for Oral Presentations
DOI: https://doi.org/10.18608/jla.2024.8411

Keywords: open-source tool, communication skills, multimodal learning analytics, data and tools report

Abstract
Providing automated feedback that facilitates the practice and acquisition of oral presentation skills has been one of the notable applications of multimodal learning analytics (MmLA). However, the closedness and general unavailability of existing systems have reduced their potential impact and benefits. This work introduces OpenOPAF, an open-source system designed to provide automated multimodal feedback for oral presentations. By leveraging analytics to assess body language, gaze direction, voice volume, articulation speed, filled pauses, and the use of text in visual aids, it provides real-time, actionable information to presenters. Evaluations conducted on OpenOPAF show that it performs similarly, both technically and pedagogically, to existing closed solutions. This system targets practitioners who wish to use it as-is to provide feedback to novice presenters, developers seeking to adapt it for other learning contexts, and researchers interested in experimenting with new feature extraction algorithms and report mechanisms and studying the acquisition of oral presentation skills. This initiative aims to foster a community-driven approach to democratize access to sophisticated analytics tools for oral presentation skill development.
Copyright (c) 2024 Journal of Learning Analytics

This work is licensed under a Creative Commons Attribution 4.0 International License.