OpenOPAF: An Open Source Multimodal System for Automated Feedback for Oral Presentations
DOI: https://doi.org/10.18608/jla.2024.8411

Keywords: open-source tool, communication skills, multimodal learning analytics, data and tools report

Abstract
Providing automated feedback that facilitates the practice and acquisition of oral presentation skills has been one of the notable applications of multimodal learning analytics (MmLA). However, the closedness and general unavailability of existing systems have reduced their potential impact and benefits. This work introduces OpenOPAF, an open-source system designed to provide automated multimodal feedback for oral presentations. By leveraging analytics to assess body language, gaze direction, voice volume, articulation speed, filled pauses, and the use of text in visual aids, it provides real-time, actionable information to presenters. Evaluations conducted on OpenOPAF show that it performs similarly, both technically and pedagogically, to existing closed solutions. This system targets practitioners who wish to use it as-is to provide feedback to novice presenters, developers seeking to adapt it for other learning contexts, and researchers interested in experimenting with new feature extraction algorithms and report mechanisms and studying the acquisition of oral presentation skills. This initiative aims to foster a community-driven approach to democratize access to sophisticated analytics tools for oral presentation skill development.
Copyright (c) 2024 Journal of Learning Analytics

This work is licensed under a Creative Commons Attribution 4.0 International License.