Predicting Math Identity Through Language and Click-Stream Patterns in a Blended Learning Mathematics Program for Elementary Students


  • Scott A. Crossley
  • Shamya Karumbaiah
  • Jaclyn Ocumpaugh
  • Matthew J. Labrum
  • Ryan S. Baker



Natural language processing, click-stream data, math success, math identity, longitudinal math development, corpus linguistics


This study builds on prior research by leveraging natural language processing (NLP), click-stream analyses, and survey data to predict students’ mathematics success and math identity (namely, self-concept, interest, and value of mathematics). Specifically, we pair NLP tools designed to measure lexical sophistication, text cohesion, and sentiment with analyses of student click-stream data from an online mathematics tutoring system. We use these combined data sources to predict elementary students’ success within the system as well as components of their math identity as measured through a standardized survey. Data from 147 students were examined longitudinally over one year of study. The results indicated links between math success and non-cognitive measures of math identity. The results also indicated that math identity was strongly predicted by click-stream variables and by the production of more lexically sophisticated and cohesive language. In addition, affective and cognitive variables explained significant variance in math identity. These findings suggest that NLP and click-stream data can be combined to provide insights into non-cognitive constructs such as math identity.


How to Cite

Crossley, S. A., Karumbaiah, S., Ocumpaugh, J., Labrum, M. J., & Baker, R. S. (2020). Predicting Math Identity Through Language and Click-Stream Patterns in a Blended Learning Mathematics Program for Elementary Students. Journal of Learning Analytics, 7(1), 19–37.



Special Section: Beyond Cognitive Ability: Enabling Assessment of 21C Skills