Individual Differences Related to College Students’ Course Performance in Calculus II
Sara A. Hart
Department of Psychology
Florida Center for Reading Research
Florida State University
Department of Psychology, Florida State University
Colleen M. Ganley
Florida Center for Research in Science, Technology, Engineering and Math, Learning Systems Institute
Department of Psychology, Florida State University
ABSTRACT. In this study, we explore student achievement in a semester-long flipped Calculus II course, combining predictor measures related to student attitudes (math anxiety, math confidence, math interest, math importance) and cognitive skills (spatial skills, approximate number system), as well as student engagement with the online system (discussion forum interaction, time to submission of workshop assignments, quiz attempts), to predict final grades. Data from 85 students enrolled in a flipped Calculus II course were used in a dominance analysis to determine which predictors were the most important for predicting final grades. Results indicated that feelings of math importance, approximate number system (ANS) ability, total amount of discussion forum posting, and time grading peer workshop submissions were the best combination of predictors of final grade, accounting for 17% of the variance in students’ final grades. The aim of this work was to determine which predictors are the most important in predicting student grades, with the end goal of building a recommendation system that could be implemented to help students in this traditionally difficult class. The methods used here could be applied to any class.
Keywords: Math performance, calculus, flipped classroom, math attitudes, cognitive performance, student engagement
ISSN 1929-7750 (online). The Journal of Learning Analytics works under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
(2017). Individual differences related to college students’ course performance in calculus II. Journal of Learning Analytics, 4(2), 129–153. http://dx.doi.org/10.18608/jla.2017.42.11
Over the past few years, the attrition of STEM-focused undergraduates in the United States has become a critical national concern, with “a substantial number of undergraduate students initially enrolled in STEM degree programs [dropping] out in the first two years” (PCAST, 2012). To maintain high-quality instructional practices in the face of STEM undergraduate attrition, growing demand for large numbers of graduates, and diminishing financial and human resources, many colleges and universities are turning to technologically aided teaching practices. This technological shift aims to integrate the traditional classroom environment with online course resources to enhance, replace, and supplement face-to-face instruction to reach more students in a cost-effective way (Garrison & Kanuka, 2004).
To accommodate this shift toward technologically aided teaching methods, schools have implemented campus-wide Learning Management Systems (LMS), such as Moodle and Blackboard. These web-based management systems provide data related to the “user,” with every action of the student tracked and recorded online. The underlying advantage of these LMS data is that they unobtrusively record individual student activity and interaction with course materials in real time, providing a lens into traditionally unobservable learning-related behaviours and silently tracking individual students’ learning progression (Macfadyen & Dawson, 2010; Gašević, Dawson, & Siemens, 2015). In traditional college courses, many performance measures, for example midterm or final exams (Lee, Speglia, Ha, Finch, & Nehm, 2015; Macfadyen & Dawson, 2010), are taken too late in the semester to identify struggling students in time to prevent course failure. In contrast, the real-time nature of LMS data may provide early warning signs of potentially at-risk students, enabling the implementation of intervention efforts before failure becomes inevitable (Milne, Jeffrey, Suddaby, & Higgins, 2012; Gašević et al., 2015). By moving away from summative assessments such as exams, and by providing visualizable, real-time information about individual aspects of student engagement and learning (Cocea & Weibelzahl, 2009), the use of LMS data, also called “academic analytics” (Goldstein & Katz, 2005; Macfadyen & Dawson, 2010), allows educators to monitor individual academic progress step-by-step (Bienkowski, Feng, & Means, 2012).
With this study, we sought to determine the best individual differences predictors of student performance in a flipped Calculus II course. This course required students to interact with an LMS outside of class by watching videos of course content lectures and doing workshops and quizzes, as well as to come to class for guided problem solving. The course performance predictors we used included attitudinal and cognitive factors commonly found to be associated with math achievement in the education and psychology literatures (Reyes & Stanic, 1988; Randhawa, Beamer, & Lundberg, 1993; House, 1993; House, 1995; Wigfield & Eccles, 2000; Maloney & Beilock, 2012). Beyond these factors, we also incorporated LMS online-usage data as course performance predictors, to capture student engagement and disengagement with the online portion of the course (Cocea & Weibelzahl, 2009; Tempelaar, Rienties, & Giesbers, 2015; Gašević et al., 2015). This study represents a step toward using learning analytics in combination with cognitive and attitudinal variables to inform potential recommendation systems based on predictive models to facilitate individualized learning and bolster student success (Tempelaar et al., 2015).
1.1 Attitudinal and Cognitive Factors Related to Math Achievement
There is a rich history of examining individual differences predictors of math achievement. For example, a recent study found that learning-related emotions were strong predictors of undergraduate student exam scores in a math and statistics course (Niculescu et al., 2015). In general, research in this area has found that attitudinal variables are typically weakly to moderately correlated with math achievement (e.g., meta-analyses of math anxiety and math achievement found r = –.34 for grades 5–12, Hembree, 1990; r = –.27 for grades 4–12, Ma, 1999). Critical attitudinal variables related to math that have been examined in past research include math anxiety, math confidence, math interest, and math importance (e.g., Wigfield & Eccles, 2000). Math anxiety is defined as fear or an adverse emotional response to the idea of doing math. Math anxiety has been negatively linked to math achievement for several potential reasons: it may act as a proxy for poor math ability, it may be a source of worry that consumes cognitive resources in a math testing situation, or it may lead to math avoidance, which in turn results in poor math achievement (Ashcraft, 2002; Ashcraft & Krause, 2007; Ganley & Vasilyeva, 2014; Hembree, 1990; Ho et al., 2000; Ma & Xu, 2004; Maloney & Beilock, 2012). On the other hand, math confidence, which we conceptualize as including what others term expectations of success, academic self-concept, and perceived competence, has been related to better math outcomes (House, 1993; House, 1995; Randhawa et al., 1993; Reyes & Stanic, 1988). Math interest (Köller, Baumert, & Schnabel, 2001; Marsh, Trautwein, Lüdtke, Köller, & Baumert, 2005) also relates positively to math achievement, possibly because it increases engagement: students are motivated to engage with material in which they are interested (Cocea & Weibelzahl, 2009).
This relation is found across educational levels and across multiple math content areas (Richardson, Abraham, & Bond, 2012). Math importance, or math utility value, is another critical variable to consider, as research suggests that when students do not see the value in learning math, they have lower math achievement and are less likely to sign up for math courses (Meece, Wigfield, & Eccles, 1990; Simpkins, Davis-Kean, & Eccles, 2006).
Differences in math achievement have also been positively correlated with cognitive skills, including the Approximate Number System (ANS) and spatial abilities. The ANS is part of the non-symbolic number system; it is less precise than counting, improves with age, and allows us to represent numbers nonverbally in order to quickly distinguish between two quantities (Halberda & Feigenson, 2008). Although not without controversy, a review of the literature links the ANS to math achievement (r = 0.20, 95% CI [0.14, 0.26]; Chen & Li, 2014). Another cognitive skill believed to be associated with math achievement is spatial ability, or the ability to mentally represent and manipulate objects in space. In particular, mental rotation skills, or the ability to mentally rotate objects in space, are consistently found to be related to math achievement (r = .35 to .38 for female students, r = –.03 to .54 for male students; Casey, Nuttall, Pezaris, & Benbow, 1995). Some experimental work has also shown that training in spatial skills can enhance math test performance (Cheng & Mix, 2014) and improve grades in a math course (Sorby, Casey, Veurink, & Dulaney, 2013).
1.2 Factors Related to Achievement in Online Courses
Looking specifically at the online learning environment, research has revealed that positive learning dispositions bolster student engagement (Tempelaar, Niculescu, Rienties, Giesbers, & Gijselaers, 2012; Tempelaar et al., 2015), and that student engagement and frequency of participation strongly predict student performance (Chen & Jang, 2010; Davies & Graff, 2005; de Barba, Kennedy, & Ainley, 2016; Tempelaar et al., 2015; Kizilcec, Piech, & Schneider, 2013; Morris, Finnegan, & Wu, 2005) and course completion (Milne et al., 2012; Macfadyen & Dawson, 2010; Morris et al., 2005) in online courses. Although these results are not surprising, they are important because they suggest that LMS data may be used to identify students who are not engaged with course material. This in turn affords the opportunity to use the online platform itself to help engage those students in the material. An early study exploring the value of LMS student engagement data was the Course Signals program implemented at Purdue University. The study revealed that students who received regular feedback about their likelihood of successful course completion via an online colour-coded risk assessment system had higher retention rates than controls (Yukselturk & Bulut, 2007).
Student engagement with an LMS class can take different forms, including learner–content interaction or social interaction (i.e., learner–learner or learner–instructor interaction; Fulford & Zhang, 1993), and studies analyzing LMS data have found inconsistent results about which form of engagement is most related to achievement. Some studies find that learner–content interaction is most important for student success. For example, Morris and colleagues (2005) linked learner–content interaction, measured by both frequency and duration of content viewing, to student achievement, concluding that higher performing students are more likely to dedicate their time to viewing relevant content rather than creating their own original content. Similarly, Ramos and Yudko (2008) found that the total number of user hits on an LMS positively predicted exam outcomes, but discussion board participation, including number of posts contributed and number of posts read, did not. On the other hand, Gašević, Dawson, Rogers, and Gašević (2016) found that time spent on assignments was actually negatively associated with achievement, suggesting that students who spend more time on assignments may be using their resources inefficiently or trying to make up for a lack of comprehension (Gašević et al., 2016).
Other work suggests that learner–learner interaction is the more important form of student engagement for student achievement. For example, multiple studies in different STEM content areas have found a positive relation between the number of discussion posts contributed by a student (Huon, Spehar, Adam, & Refkin, 2007; Macfadyen & Dawson, 2010), as well as the quality of these posts (Romero, Lopez, Luna, & Ventura, 2013) and their course performance, suggesting that successful students are more likely to use online resources to facilitate learning through social engagement with peers. Conversely, students who infrequently participate in discussion forums and show patterns of disengagement are more likely to fail (Milne et al., 2012; Romero et al., 2013). The strength of the association ranges from weak (Lauría, Baron, Devireddy, Sundararaju, & Jayaprakash, 2012; Chanlin, 2012) to strong (Macfadyen & Dawson, 2010), and mere participation in discussion does not significantly distinguish students with the highest grades from average performers (Davies & Graff, 2005). Furthermore, discussion participation matters most for students in danger of failing (Davies & Graff, 2005), most likely because the feeling of being supported makes struggling students more likely to persist (Romero et al., 2013).
The rise in computer-assisted classes represents a push toward learner-controlled and learner-centred learning environments, in which students are required to make decisions about how and to what extent they engage with class materials (Lust, Elen, & Clarebout, 2013), but a majority of students fail to use resources effectively (Lust, Juarez-Collazo, Elen, & Clarebout, 2012; Lust, Vandewaetere, Ceulemans, Elen, & Clarebout, 2011). Evidence shows that the degree to which students engage in the learning process through effective online tool use is associated with their learning outcomes, even when accounting for differences in cognitive ability (Shute & Gluck, 1996). Thus, pinpointing the factors that underlie student engagement with learning materials is essential for understanding why students succeed or fail in an online course (Tempelaar et al., 2012).
Time management is defined as the ability to prioritize time in order to fulfill learning demands, complete tasks, and modify plans to accommodate any changes in time or demands (Jo, Park, Yoon, & Sung, 2016). Time management has been shown to be a significant predictor of online achievement (Jo et al., 2016; Kwon, 2009; Choi & Choi, 2012), although the association may be weak (r = .14, p < .01; Broadbent & Poon, 2015). Other studies have found direct effects of time management on achievement (Jo & Kim, 2013; Loomis, 2000). Conversely, poor time management, or procrastination, negatively influences online participation (Michinov, Brunot, Le Bohec, Juhel, & Delaval, 2011; Balduf, 2009), with the students least likely to participate also demonstrating the poorest performance (Michinov et al., 2011).
Past educational research has consistently shown that the best predictor of performance is performance itself, and the learning analytics field is no different (Tempelaar et al., 2015). In studies of undergraduates taking LMS-supplemented courses, the strongest predictor of final exam grades, when accounting for a variety of student dispositional and engagement variables, was grades on formative quizzes (Huon et al., 2007; Tempelaar et al., 2015). In addition, early quiz grades have the same predictive power as later quiz grades, so these formative assessments can be used to create recommendations as well as to predict final exam grades (Wolff, Zdrahal, Nikolov, & Pantucek, 2013). Practice quizzes are often specifically designed as exam review tools, so the association between quiz grades and final exam grades is not surprising. However, the frequency of use of practice quizzes, and not just the grades, also has predictive value (Huon et al., 2007).
Thus, several potential factors related to LMS engagement have predicted subsequent achievement, but there are also some inconsistencies. These divergent results may be attributable to differences in contexts (e.g., school size, class type), outcome variables (e.g., continuous grades or pass/fail), choice of covariates, and prediction techniques, which limits the generalizability of study results (Gašević et al., 2016). This study uses dominance analysis to identify which facets of online LMS activity, in conjunction with individual student cognitive and affective factors, are important for student success in a flipped Calculus II course. By focusing on predictors of performance in this specific course, we hope to create recommendations tailored to future Calculus II classes, which will help overcome the challenges of generalizability.
1.3 The Present Study
Although substantial literature supports the role of the reviewed attitudinal and cognitive correlates of math achievement, as well as LMS activity data as a proxy for student engagement, in predicting student course achievement, our study is unique in its varied, multi-component approach. As the present data come from a flipped course, we have both the attitudinal and cognitive factors and the student engagement factors, and we use these sources of information to predict student performance in the flipped Calculus II course. We build upon the “dispositional learning analytics infrastructure” that proposes that learning attitudes can be gathered through self-report and combined with LMS activity data (Buckingham Shum & Deakin Crick, 2012). To this, we add cognitive factors to predict student course performance.
We build on previous studies by exploring student achievement in a flipped, semester-long Calculus II course, combining predictor variables related to student attitudes (math anxiety, math confidence, math interest, math importance) and cognitive skills (mental rotation, approximate number system), as well as student engagement with the online system (discussion forum interaction, time of submission of workshop assignments, time of submission of peer reviews, quiz attempts), to predict final grades in the class. We began this project intending to use these data to build a recommendation system into the course platform that would give students feedback about actions they can take to improve their likelihood of success in the class. As such, we were most interested in which factors best predicted final grades, so that we could target those factors. We therefore use a methodological approach called dominance analysis to rank order the relative importance of the attitudinal, cognitive, and LMS activity predictors of students’ final grades in Calculus II, which allows us to determine which predictors emerge as the most important.
Participants were 85 students who completed a flipped Calculus II course in Spring 2014. About 43% of the students were female. Approximately 87% of students were white, 7% were Black or African American, 2% were Asian, 4% were other, and 17% were Hispanic/Latino. About 51% of students were first-year students, 22% were sophomores, 21% were juniors, and 7% were seniors. The students were on average 19.75 years old (SD = 1.90, range = 17.75–30.00). Most students were majors in STEM fields that required Calculus II to complete the major.
In the course, students used the online course platform (WEPS)1 to watch lecture videos of course content outside of class time and then solved problems with the professor and other students during class time. Each course topic had three lecture videos filmed by the course instructor, corresponding to three difficulty levels (gentle, normal, rigorous), and each video was approximately 5 to 25 minutes in length. All teaching content was available to students at all times, although graded items had specific time frames of availability based on due dates. Students were not required to watch the videos before class, but were highly encouraged to do so.
Measures for this study were obtained from multiple sources, including the following: in-person data collection (spatial skills measure of mental rotation); a Qualtrics survey (math anxiety, math confidence, math interest, math importance, approximate number system); log data from the online course system (time to deadline for online workshops, time to deadline for grading other students’ workshops, online quiz attempts, active forum interaction, passive forum interaction); and class records (course grade). All attitudinal and cognitive measures used are open source and freely available, other than the mental rotation measure.2 All the attitudinal and cognitive measures were included in the course specifically for research purposes. Participants completed informed consent procedures and a pencil-and-paper mental rotation task in person at the start of one of their class periods at the beginning of the Spring 2014 term. Our final sample (n = 85) comprised the students who completed the course and received a final grade, and who also consented to be part of the study and to our use of their online course data and class records. Of these 85 students, one did not complete the online Qualtrics survey.
2.2.1 Course grade
A student’s final course grade (0–100) in Calculus II was used as the outcome variable. This grade was a weighted average of the scores on the workshops (30% of grade), scores on the two course exams (30%), and score on the final exam (40%). Quiz grades were also used as bonus points (up to 3% on the final grade).
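As a worked illustration of the weighting above, the sketch below computes a course grade from its components. It assumes that the two course exams count equally toward their combined 30%, and it maps quiz performance onto the bonus through a hypothetical `quiz_bonus_fraction` (0 to 1) scaled to at most 3 points; neither detail is specified in the course description, and the cap at 100 is likewise an assumption.

```python
def final_grade(workshop_avg, exam1, exam2, final_exam, quiz_bonus_fraction=0.0):
    """Weighted Calculus II course grade on a 0-100 scale (illustrative sketch).

    Weights follow the text: workshops 30%, the two course exams 30% combined,
    final exam 40%. Assumptions: the two exams count equally, and quiz
    performance maps linearly onto a bonus of at most 3 points.
    """
    for score in (workshop_avg, exam1, exam2, final_exam):
        if not 0 <= score <= 100:
            raise ValueError("component scores must be on a 0-100 scale")
    if not 0 <= quiz_bonus_fraction <= 1:
        raise ValueError("quiz_bonus_fraction must lie between 0 and 1")

    exam_avg = (exam1 + exam2) / 2  # assumed equal weighting of the two exams
    weighted = 0.30 * workshop_avg + 0.30 * exam_avg + 0.40 * final_exam
    bonus = 3.0 * quiz_bonus_fraction  # hypothetical mapping of quizzes to bonus points
    return min(weighted + bonus, 100.0)  # assumed cap at the top of the 0-100 scale


# Example: workshops 80, exams 70 and 90, final exam 75, half of the quiz bonus.
print(round(final_grade(80, 70, 90, 75, quiz_bonus_fraction=0.5), 2))  # 79.5
```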
2.2.2 Math anxiety
Students’ math anxiety was measured with the Math Anxiety Rating Scale–Revised (MARS–R; Plake & Parker, 1982). This scale has 24 items rated on a five-point scale in which students are asked to indicate the amount of anxiety they feel in different situations (e.g., looking through the pages on a math text) from “not at all” to “very much.” The internal consistency (α) for the scale was .95.
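Each attitudinal and cognitive measure in this section is summarized by its internal consistency (Cronbach’s α). The sketch below shows the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), computed from item-level scores. The data layout (one list of respondent scores per item) is an assumption for illustration, not the format of the study data.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha from item-level scores.

    items: list of k lists, one per item, each holding every respondent's score
    on that item (respondents in the same order across items).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")

    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(pvariance(item) for item in items)
    # Population or sample variances give the same alpha: the n/(n-1) factor cancels.
    return (k / (k - 1)) * (1 - item_var / total_var)


# Two hypothetical items answered by three respondents.
print(round(cronbach_alpha([[1, 2, 3], [2, 2, 3]]), 3))  # 0.857
```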
2.2.3 Math confidence
Students’ math confidence was measured with the 12-item confidence subscale of the Fennema-Sherman Math Attitudes Scales (Fennema & Sherman, 1976). Items were rated on a seven-point scale from Strongly disagree to Strongly agree (e.g., I am sure I could do advanced work in mathematics). The internal consistency (α) for the scale was .92.
2.2.4 Math interest
We measured math interest with four items adapted from Wigfield and Eccles (2000), from the Educational Longitudinal Study (Ingels et al., 2007), and from the Early Childhood Longitudinal Study (Tourangeau, Nord, Le, Pollack, & Atkins-Burnett, 2006). Items were rated on a seven-point scale from Strongly disagree to Strongly agree. Sample items are “I like math” and “I find working on math assignments to be very interesting.” The internal consistency (α) for the scale was .91.
2.2.5 Math importance
We measured math importance with six items, two of which are adapted from Wigfield and Eccles (2000) and four of which are researcher-developed (e.g., What I learn in math is useful). Items were rated on a seven-point scale from Strongly disagree to Strongly agree. The internal consistency (α) for the scale was .91.
2.2.6 Mental rotation
The Mental Rotation Test (Vandenberg & Kuse, 1978) was administered to students. The test consists of 24 items divided into two blocks. Each item includes a picture of a three-dimensional object presented on the left (i.e., the target object). On the right, there are four other pictures of three-dimensional objects. Two of them depict objects identical to the target, only presented from a different perspective. The other two depict either a mirror image of the target or an object with slightly different features. Students are asked to identify which two of the four objects are the same as the target. Using the standard scoring for this measure, students received 1 point only in those cases when both of their choices were correct. They received 0 points for any other type of response (e.g., if they selected one correct and one incorrect choice) to help account for guessing. Internal consistency (Cronbach’s alpha) for the measure was .89.
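The all-or-nothing scoring rule above can be expressed compactly. This is a minimal sketch; the option labels ("A" to "D") are hypothetical, and treating a selection of three or more options as incorrect is an assumption consistent with, but not stated in, the description.

```python
def score_mrt(responses, answer_key):
    """All-or-nothing scoring for the Mental Rotation Test (Vandenberg & Kuse, 1978).

    responses, answer_key: one collection of option labels per item (e.g. {"A", "C"}).
    An item earns 1 point only when the selected options are exactly the two
    correct ones; one correct plus one incorrect choice (or any other pattern) earns 0.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must cover the same items")
    return sum(
        1 for chosen, correct in zip(responses, answer_key) if set(chosen) == set(correct)
    )


key = [{"A", "C"}, {"B", "D"}, {"A", "B"}, {"C", "D"}]
answers = [["C", "A"], ["B", "C"], ["A", "B", "C"], ["C", "D"]]
print(score_mrt(answers, key))  # 2: items 1 and 4 are fully correct
```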
2.2.7 Approximate number system (ANS)
The Approximate Number System (ANS), or intuitive recognition of number, was measured using an online test on the Panamath website3 (Halberda, Mazzocco, & Feigenson, 2008). This test measures an individual’s ability to represent numbers non-verbally, that is, to understand and manipulate numerical quantities non-symbolically (Halberda et al., 2008; Halberda & Feigenson, 2008; Libertus & Brannon, 2010). In the task, participants were shown brief displays (600 milliseconds) of intermixed blue and yellow dots, with five to 20 dots per colour, and asked to determine whether there were more blue dots (by pressing the “b” key) or more yellow dots (by pressing the “y” key). In total, participants completed 120 trials with various ratios of dot quantities (~5–7 minutes of testing time), and accuracy and response time were recorded for each trial. Panamath then calculates the participant’s Weber fraction (w-score), which represents the smallest ratio that the individual can accurately discriminate, and reports it in a hyperlinked .pdf. Participants were asked to provide this hyperlink, from which the w-score was obtained. Because of the additional step of navigating to a different website and then copying the hyperlink into Qualtrics, 13 participants did not report w-score data, leaving n = 71 participants with ANS data. Additionally, upon inspection of the initial data, 4 data points were deemed to be outliers (w-scores greater than .85, where the next highest score was .44), so these scores were set to missing, as scores this far out of range likely reflect the participant not completing the task accurately.
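Panamath’s own fitting routine is not described here. As an illustration of what a w-score summarizes, the sketch below assumes the Gaussian-noise model commonly used to define the Weber fraction in the ANS literature, in which predicted accuracy for n1 versus n2 dots is 1 − ½·erfc(|n1 − n2| / (√2 · w · √(n1² + n2²))), and estimates w by a simple maximum-likelihood grid search. The trial format and grid are assumptions; the final helper mirrors the .85 outlier rule described above.

```python
import math


def p_correct(n1, n2, w):
    """Predicted probability of choosing the larger set when comparing n1 vs n2 dots.

    Assumed model: each numerosity is represented with Gaussian noise whose SD is
    w times the numerosity, so accuracy depends on the ratio of n1 to n2.
    """
    spread = math.sqrt(2) * w * math.sqrt(n1 ** 2 + n2 ** 2)
    return 1 - 0.5 * math.erfc(abs(n1 - n2) / spread)


def estimate_w(trials, grid=None):
    """Maximum-likelihood Weber fraction via a simple grid search.

    trials: iterable of (n_blue, n_yellow, correct) tuples, correct being a bool.
    """
    trials = list(trials)
    if grid is None:
        grid = [i / 200 for i in range(2, 301)]  # candidate w from 0.01 to 1.50
    best_w, best_ll = None, -math.inf
    for w in grid:
        ll = 0.0
        for n1, n2, correct in trials:
            p = min(max(p_correct(n1, n2, w), 1e-9), 1 - 1e-9)  # guard against log(0)
            ll += math.log(p if correct else 1 - p)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w


def clean_w(w, cutoff=0.85):
    """Treat implausibly large w-scores as missing, mirroring the .85 rule in the text."""
    return None if w is None or w > cutoff else w
```

Under this model a smaller w means sharper discrimination, which is why unusually large values suggest the task was not completed as intended.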
2.2.8 Online forum interaction
The online learning platform (WEPS) included forums in which students could ask, answer, and read questions about the course material. We created two different scores from forum information. For “discussion forum posting,” we scored the number of times students actively interacted with the forum by counting each time they wrote a post, including when they wrote the original post or responded to another student’s post. Upon looking at the initial data, 1 data point was deemed to be an extreme outlier (corresponding to 101 posts, versus the next highest of 36 posts), so we set this score to missing. For “discussion forum viewing,” we also scored the number of times students passively interacted with the forum by counting the number of times they viewed (but did not contribute) content on the forum.
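The two forum scores amount to counting logged actions per student. The sketch below assumes a hypothetical log of `(student_id, action)` pairs with action labels `"post"`, `"reply"`, and `"view"`; the actual WEPS log format is not described here. Students with no forum activity receive zero counts rather than missing values, which is also an assumption.

```python
from collections import Counter


def forum_scores(log, student_ids):
    """Count active (posting) and passive (viewing) forum interactions per student.

    log: iterable of (student_id, action) pairs, where action is a hypothetical
    label: "post" (original post), "reply" (response to another post), or "view".
    """
    posting, viewing = Counter(), Counter()
    for student, action in log:
        if action in ("post", "reply"):
            posting[student] += 1  # writing an original post or a reply both count
        elif action == "view":
            viewing[student] += 1
    # Students absent from the log get zero counts (assumption).
    return {s: {"posting": posting[s], "viewing": viewing[s]} for s in student_ids}


log = [("s1", "post"), ("s1", "reply"), ("s2", "view"), ("s1", "view"), ("s2", "view")]
print(forum_scores(log, ["s1", "s2", "s3"]))
```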
2.2.9 Time to deadline for online workshops
As part of the course, students had to complete and submit 13 workshops, which were essentially homework problem sets, over the course of the semester. We created a score that represented the average number of hours remaining before the deadline at the time students submitted each assignment.
2.2.10 Time to deadline for grading other students’ workshops
Students also had to grade the workshop assignments of five other students in their class for each workshop. They were given a specific amount of time to do this, and again we coded the average number of hours remaining before the deadline at the time they submitted the graded workshop assignments. Their accuracy in grading made up 20% of their grade for each workshop.
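Both timing measures (workshop submission and peer grading) are averages of hours remaining before a deadline. A minimal sketch, assuming each submission is available as a pair of `datetime` values (submission time, deadline); the dates in the example are hypothetical, and late submissions would yield negative hours, a case the text does not address.

```python
from datetime import datetime


def mean_hours_before_deadline(submissions):
    """Average hours remaining before the deadline across one student's submissions.

    submissions: list of (submitted_at, deadline) datetime pairs. Late submissions
    produce negative values.
    """
    if not submissions:
        return None
    hours = [(deadline - submitted).total_seconds() / 3600 for submitted, deadline in submissions]
    return sum(hours) / len(hours)


deadline = datetime(2014, 2, 7, 23, 59)  # hypothetical deadline
subs = [(datetime(2014, 2, 6, 23, 59), deadline), (datetime(2014, 2, 5, 23, 59), deadline)]
print(mean_hours_before_deadline(subs))  # 36.0 (24 and 48 hours early)
```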
2.2.11 Online quiz attempts
Students took seven online quizzes over the course of the semester, and they were allowed to take the quizzes an unlimited number of times until they were satisfied with their grade. Because of this, quiz grades were less meaningful (and were also included in the calculation of the final grade), so instead we examined the mean number of attempts per quiz across the seven quizzes. Upon inspecting the initial data, 1 data point was deemed to be an extreme outlier (a mean of 8.34 submissions per quiz, versus the next highest of 3.15 submissions per quiz), so we set this score to missing.
Descriptive statistics are presented in Table 1. One variable, discussion forum viewing, was positively skewed (skew > 2.00, with no obvious outliers; see Tabachnick & Fidell, 2013), and it was therefore log-transformed for use in all remaining analyses (skew was 0 after log-transformation). Following calculation of descriptive statistics, SAS Proc MI (i.e., multiple imputation; Rubin, 1987) was used to fill in the missing data for each predictor. We took this step because dominance analysis, the main analysis, drops cases listwise when data are missing. Using Proc MI, 1,000 imputed datasets were generated with plausible values for each missing data point, and the mean of the 1,000 imputed values for each missing data point was then used as that predictor value. As a result, every individual had complete data (i.e., no missing values) in all subsequent analyses.
Table 1. Descriptive statistics
|Variable|nᵃ|M|SD|Min|Max|Possible range|Skewᵇ|
|---|---|---|---|---|---|---|---|
|Discussion forum posting|77|8.75|8.86|0.00|36.00|0–∞|1.47|
|Discussion forum viewing|78|193.60|196.53|19.00|978.00|0–∞|2.21/0.00|
|Workshop submissions (hrs)|78|164.00|10.84|138.12|197.83|0–varying|0.64|
|Grading peer workshop submissions (hrs)|78|69.44|21.69|23.27|128.04|0–varying|0.84|
Note. ᵃ n is reported as the total cases available for each variable, out of a total possible n = 85. ᵇ Skew is reported as before log-transformation/after log-transformation.
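The skew screen and log transformation applied to discussion forum viewing can be sketched as follows. This is an illustrative Python version, not the authors’ code; it uses the adjusted Fisher-Pearson skewness coefficient (G1), and the exponentially growing counts in the example are hypothetical.

```python
import math


def adjusted_skewness(x):
    """Adjusted Fisher-Pearson sample skewness, G1 = sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def log_transform(x):
    """Natural-log transform; assumes strictly positive values (viewing counts here start at 19)."""
    if min(x) <= 0:
        raise ValueError("log transform needs positive values; consider log(x + 1)")
    return [math.log(v) for v in x]


counts = [math.exp(k) for k in range(10)]  # hypothetical, strongly right-skewed counts
print(adjusted_skewness(counts) > 1, abs(adjusted_skewness(log_transform(counts))) < 1e-9)  # True True
```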
Pearson correlations among the measures are presented in Table 2. For the most part, correlations with final grade were low to moderate in magnitude; only the correlations of math confidence and discussion forum posting with final grade were statistically significant. A sensitivity analysis conducted in G*Power suggested that, with our sample size of 85, an alpha-error probability of .05, and power of .80, we were powered to detect only correlation coefficients of r = .26 and greater. We therefore advise that the magnitude of the correlations, rather than their statistical significance, be considered. For example, four correlations with final grade have magnitudes of .19 or higher but are not statistically significant (math importance, ANS, discussion forum viewing, grading peer workshop submissions).
Table 2. Pearson correlations among the measures
|Measure|1|2|3|4|5|6|7|8|9|10|11|
|---|---|---|---|---|---|---|---|---|---|---|---|
|1. Math anxiety|1|
|2. Math confidence|–.43*|1|
|3. Math interest|–.24*|.66*|1|
|4. Math importance|.17|.58*|.65*|1|
|5. Mental rotation|–.05|–.05|.10|.07|1|
|7. Discussion forum posting|–.02|.15|.28*|.08|–.06|–.21*|1|
|8. Discussion forum viewing|.16|.03|.15|.04|–.10|–.28*|.54*|1|
|9. Workshop submissions|–.18|.03|.06|–.09|–.10|.09|.10|.11|1|
|10. Grading peer workshops|.02|.14|–.13|–.24*|–.36*|.01|.14|.12|.26|1|
|11. Quiz attempts|.14|.10|.01|–.11|–.25*|–.19|–.05|.06|.17|.33*|1|
|12. Final grade|.02|.24*|.15|.19|–.03|–.21|.24*|.19|.01|.19|.09|
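The sensitivity analysis described with the correlations asks for the smallest correlation detectable at α = .05 and power = .80. A minimal sketch using the Fisher z approximation, r = tanh((z_α + z_power) / √(n − 3)), is shown below. The result depends on whether a one- or two-tailed test is assumed (with n = 85, roughly .27 vs. .30 under this approximation), and G*Power’s exact routine can differ slightly, so this illustrates the calculation rather than reproducing the reported value.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80, two_tailed=True):
    """Smallest population correlation detectable with the given n, alpha, and power.

    Fisher z approximation: r = tanh((z_alpha + z_power) / sqrt(n - 3)).
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


for tails in (True, False):
    label = "two-tailed" if tails else "one-tailed"
    print(label, round(min_detectable_r(85, two_tailed=tails), 3))
```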
As a first step, we used all-subsets regression to reduce the number of predictor variables entered into the dominance analysis, because dominance analysis is computationally intensive and limited to a maximum of 10 predictor variables. All-subsets regression computes the R² values for all possible predictor sets of each size (i.e., 1 predictor, 2 predictors, etc.) regressed onto final grade, and then rank-orders the obtained R² values from highest to lowest within each set size (Miller, 2002). Once the highest R² value for each set size was obtained, we used a “diminishing returns” technique to avoid overfitting the dominance analysis, mirroring a method laid out in Speece et al. (2010). Using this approach, we determined the point at which going from a set of n predictors to a set of n + 1 predictors no longer yielded an important increase in R². The models with 1 to all 11 predictors yielded maximum R² values of .06, .10, .15, .17, .17, .17, .18, .18, .18, .18, and .18, respectively. We determined that after a set size of four there were diminishing returns, in that each further increase in set size explained at most 1% more variance than the previous set. Table 3 displays the top 10 models, based on total R² values, among the possible four-predictor models. As Table 3 shows, only one model accounted for the highest proportion of variance in final grade (17%): the model with math importance, ANS, discussion forum posting, and grading peer workshop submissions. These four predictors were moved forward into the dominance analysis.
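The all-subsets step and the diminishing-returns rule can be sketched in pure Python. This is an illustration, not the authors’ code: `r_squared` solves the OLS normal equations on centered data, and `diminishing_returns_size` uses a threshold of .01 as one reading of “at most 1% more variance”, which is an assumption. The example applies the rule to the maximum R² values reported in the text.

```python
from itertools import combinations


def r_squared(columns, y):
    """OLS R^2 of y on the given predictor columns (intercept handled by centering)."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [a - m for a in v]

    xs = [center(c) for c in columns]
    yc = center(y)
    k = len(xs)
    # Augmented normal equations [X'X | X'y] on centered data.
    aug = [[sum(a * b for a, b in zip(xs[i], xs[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(xs[i], yc))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        for row in range(k):
            if row != col:
                f = aug[row][col] / aug[col][col]
                aug[row] = [a - f * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * x[t] for b, x in zip(beta, xs)) for t in range(n)]
    sse = sum((yt - ft) ** 2 for yt, ft in zip(yc, fitted))
    return 1 - sse / sum(v * v for v in yc)


def best_subsets(predictors, y):
    """All-subsets regression: best (R^2, names) for every set size 1..p."""
    names = list(predictors)
    best = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            r2 = r_squared([predictors[c] for c in combo], y)
            if size not in best or r2 > best[size][0]:
                best[size] = (r2, combo)
    return best


def diminishing_returns_size(best_r2, threshold=0.01):
    """Smallest set size after which no larger size adds more than `threshold` to R^2.

    best_r2[i] is the maximum R^2 for set size i + 1.
    """
    for size in range(1, len(best_r2) + 1):
        gains = [best_r2[j] - best_r2[j - 1] for j in range(size, len(best_r2))]
        if all(g <= threshold + 1e-12 for g in gains):
            return size
    return len(best_r2)


# Maximum R^2 by set size as reported in the text (1 to 11 predictors).
reported = [.06, .10, .15, .17, .17, .17, .18, .18, .18, .18, .18]
print(diminishing_returns_size(reported))  # 4
```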
Dominance analysis was used to rank-order the predictors by their contributive importance to final grade. Dominance analysis uses bootstrapping to compute total and unique R² for all possible combinations of the entered predictors of the outcome variable, here final grade in Calculus II. In dominance analysis, a series of regression models, called “subset models,” are run and used as a whole to determine the dominance order of the predictor variables. The total number of subset models follows a combinatorial rule (Hays, 1994): with p predictors, there are 2^p − 1 non-empty subsets. As we had four predictors, we ran 15 models: four single-predictor models, six two-predictor models, four three-predictor models, and one four-predictor model. This was done using a macro in SAS 9.4 (Azen & Budescu, 2003).
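The count of 15 subset models is the sum of binomial coefficients C(4,1) + C(4,2) + C(4,3) + C(4,4) = 2^4 − 1. The short sketch below enumerates them; the predictor labels are shorthand introduced here for illustration.

```python
from itertools import combinations
from math import comb

# Shorthand labels for the four predictors (illustrative only).
predictors = ["math importance", "ANS", "discussion posting", "peer grading"]


def subset_models(names):
    """All non-empty subsets of the predictors, grouped by model size."""
    return {k: list(combinations(names, k)) for k in range(1, len(names) + 1)}


models = subset_models(predictors)
for k, subsets in models.items():
    print(k, len(subsets))  # prints 1 4, 2 6, 3 4, 4 1 (one pair per line)

total = sum(len(s) for s in models.values())
print(total, "subset models")
```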
There are three types, or levels, of dominance: complete, conditional, and general (Azen & Budescu, 2003; Budescu, 1993). Under complete dominance, the strictest level, one predictor is considered completely dominant over another if its additional contribution to R² in final grade is greater than the other predictor’s in every subset model that contains neither of them, both when the two are compared head-to-head with no other predictors in the model and when each possible combination of the remaining predictors is added. That is, a predictor completely dominates another when it accounts for more unique variance in final grade in every possible head-to-head comparison. Conditional dominance is weaker than complete dominance: a predictor is conditionally dominant over another when its average additional contribution is greater within each model size (e.g., averaged across all subset models with two predictors). General dominance is weaker still: it is achieved when a predictor’s additional contribution, averaged across model sizes, is greater than another predictor’s. Complete dominance is ideal but often cannot be established (i.e., greater unique prediction in every comparison is difficult to achieve), so the weaker forms are then used to rank-order predictors (Azen & Budescu, 2003). Because the levels are nested, complete dominance implies conditional dominance, which in turn implies general dominance.
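To make the three levels concrete, here is a sketch that classifies a pair of predictors given the R² of every subset model, stored in a dictionary keyed by frozensets of predictor names (a hypothetical data layout, with the empty set mapped to 0). It follows the definitions in the paragraph above; it is not the SAS macro used in this study and omits the bootstrapping step.

```python
from itertools import combinations


def additional_contribution(r2, x, subset):
    """Increase in R² when predictor x is added to the model `subset`."""
    base = frozenset(subset)
    return r2[base | {x}] - r2[base]


def conditional_contributions(r2, names, x):
    """Average additional contribution of x to models of each size k = 0..p-1."""
    rest = [n for n in names if n != x]
    averages = []
    for k in range(len(rest) + 1):
        subsets = list(combinations(rest, k))
        averages.append(sum(additional_contribution(r2, x, s) for s in subsets) / len(subsets))
    return averages


def general_weight(r2, names, x):
    """Mean of the size-specific averages (the general dominance weight)."""
    c = conditional_contributions(r2, names, x)
    return sum(c) / len(c)


def dominance(r2, names, i, j):
    """Strictest level at which predictor i dominates j, or None."""
    others = [n for n in names if n not in (i, j)]
    # Every model containing neither i nor j, including the empty model.
    contexts = [s for k in range(len(others) + 1) for s in combinations(others, k)]
    if all(additional_contribution(r2, i, s) > additional_contribution(r2, j, s)
           for s in contexts):
        return "complete"
    ci = conditional_contributions(r2, names, i)
    cj = conditional_contributions(r2, names, j)
    if all(a > b for a, b in zip(ci, cj)):
        return "conditional"
    if general_weight(r2, names, i) > general_weight(r2, names, j):
        return "general"
    return None
```

A convenient sanity check is that the general dominance weights of all predictors sum to the R² of the full model.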
Table 4 presents the total and unique R² values for each variable, or variable combination, in each subset model. The subset models with one predictor accounted for 4–6% of the variance in final grade, those with two predictors for 8–9%, those with three predictors for 11–15%, and the model with all four predictors for 17% (see the R² column in Table 4). The rightmost columns of Table 4 give the additional unique variance each variable would account for if it were added to the predictors already in the model. For example, the subset model with only math importance accounted for 4% of the variance in Calculus II final grade. After controlling for the variance attributable to math importance, adding ANS as a second predictor would have accounted for an additional 5% of the variance in final grade, adding discussion forum posting an additional 5%, and adding time of submission of grading peer workshop assignments an additional 6%.
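The additional-contribution columns of Table 4 can be written as a difference of R² values for a predictor X and a subset model S that excludes it. The worked line below uses the rounded values quoted in the text, so it is only approximate.

```latex
\Delta R^2_{X \mid S} = R^2_{S \cup \{X\}} - R^2_{S}

% Worked example with S = {math importance}, X = ANS (rounded values from Table 4)
R^2_{\{\text{importance},\,\text{ANS}\}}
  = R^2_{\{\text{importance}\}} + \Delta R^2_{\text{ANS} \mid \{\text{importance}\}}
  \approx .04 + .05 = .09
```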
In each bootstrapped sample, a dominance value Dij is obtained for a given pair of predictors, Xi and Xj, taking one of three values: 1 if Xi dominates Xj, 0 if Xj dominates Xi, and 0.5 if dominance cannot be established between the two predictors. As we used 1,000 bootstrapped samples, Dij_mean (the average of Dij across samples) represents the expected level of dominance of Xi over Xj in the population. Because dominance analysis relies on bootstrapping, it does not follow the traditional multiple regression approach of producing p-values to determine “significance.” Instead, the closer Dij_mean is to 1 or 0, the stronger the case for clear directional dominance, and the closer it is to 0.5, the stronger the case for indeterminate dominance. These analyses also produce a “reproducibility” value, the proportion of bootstrapped samples in which the dominance pattern found in the original sample is reproduced. The closer the reproducibility value is to 1.00, the more robust the dominance results.
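The summary statistics reported in Table 5 can be computed from the per-sample Dij values for a pair. In this sketch the standard error is taken to be the standard deviation of the bootstrap Dij values, and reproducibility is the share of samples matching the Dij observed in the original sample; both are our assumptions about the macro’s definitions, and the function name is hypothetical.

```python
import numpy as np


def summarize_dominance(d_boot, d_sample):
    """Summarize bootstrap dominance values for one pair (Xi, Xj).

    d_boot   : Dij from each bootstrap sample (each 1.0, 0.0, or 0.5)
    d_sample : Dij observed in the original sample
    """
    d = np.asarray(d_boot, dtype=float)
    return {
        "Dij_mean": d.mean(),        # expected dominance of Xi over Xj
        "SE": d.std(),               # spread of Dij across bootstrap samples
        "Pij": np.mean(d == 1.0),    # share of samples where Xi dominates Xj
        "Pji": np.mean(d == 0.0),    # share where Xj dominates Xi
        "Pijno": np.mean(d == 0.5),  # share with no dominance established
        "reproducibility": np.mean(d == d_sample),  # share matching the original
    }
```

Feeding it 1,000 values with the proportions from the first Table 5 row (410 ones, 290 zeros, 300 halves, original Dij = 0.5) gives Dij_mean = .56, SE ≈ .41, and reproducibility = .30, matching that row. Note that Dij_mean always equals Pij + 0.5 × Pijno.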
Table 5 presents the Dij results, as well as Dij_mean (and its standard error), Pij (proportion of samples where Dij = 1.0), Pji (proportion of samples where Dij = 0.0), Pijno (proportion of samples where Dij = 0.5), and the reproducibility value. As described in Azen and Budescu (2003), complete dominance is established if the additional contribution of a given predictor is higher than that of another predictor in all subset models in which the two predictor variables compete head-to-head. Complete dominance was established for math importance over grading peer workshop submissions (Dij = 1.0, Dij_mean = .55), but there was not enough evidence to establish complete dominance for any other pair (Dij = 0.5). No additional pairs met the criterion for conditional dominance, so the remaining pairs were ranked by the general dominance criterion (which, by definition, will essentially always yield a dominance pattern, as it simply rank-orders the average additional variance accounted for across all subset models, from highest to lowest). The results indicate the following general dominance pattern: math importance > grading peer workshop submissions > ANS > discussion forum posting. Reproducibility values topped out at .66, suggesting that some caution is warranted in accepting the general dominance pattern, particularly given Azen and Budescu’s (2003) recommendation to give more weight to the much stricter complete dominance.
|i||j||Dij||Dij_mean||SE||Pij||Pji||Pijno||Reproducibility|
|Complete dominance|
|Math importance||Discussion forum posting||0.5||.56||.41||.41||.29||.30||.30|
|Math importance||Grading peer workshop submissions||1.0*||.55||.47||.50||.40||.10||.50|
|ANS||Discussion forum posting||0.5||.43||.44||.32||.46||.23||.23|
|ANS||Grading peer workshop submissions||0.5||.42||.43||.30||.47||.23||.23|
|Discussion forum posting||Grading peer workshop submissions||0.5||.49||.39||.30||.32||.39||.39|
|Conditional dominance|
|Math importance||Discussion forum posting||0.5||.56||.42||.41||.29||.30||.30|
|Math importance||Grading peer workshop submissions||1.0||.55||.48||.51||.41||.08||.05|
|ANS||Discussion forum posting||0.5||.43||.44||.33||.46||.21||.21|
|ANS||Grading peer workshop submissions||0.5||.41||.44||.31||.50||.19||.19|
|Discussion forum posting||Grading peer workshop submissions||0.5||.49||.40||.30||.33||.38||.38|
|General dominance|
|Math importance||Discussion forum posting||1.0*||.55||.50||.55||.45||.00||.55|
|Math importance||Grading peer workshop submissions||1.0||.55||.50||.55||.46||.00||.55|
|ANS||Discussion forum posting||1.0*||.42||.49||.42||.58||.00||.42|
|ANS||Grading peer workshop submissions||0.0*||.40||.49||.40||.60||.00||.60|
|Discussion forum posting||Grading peer workshop submissions||0.0*||.49||.50||.49||.51||.00||.51|
|Note: i and j = variables that are competing; Dij = dominance of i over j (1 = i dominates j, 0 = j dominates i, 0.5 = no dominance established); Dij_mean = average Dij across all bootstrap samples; SE = standard error of Dij_mean; Pij = proportion of bootstrap samples in which i dominated j; Pji = proportion of bootstrap samples in which j dominated i; Pijno = proportion of bootstrap samples in which no dominance was established. Reproducibility is the proportion of bootstrap samples that replicated the reported effect. * indicates the highest level of dominance achieved and implies that all weaker levels of dominance are also achieved.|
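The general dominance ordering reported above can be recovered from the pairwise general-dominance rows of Table 5 with a topological sort, since each row states which predictor must precede the other. A sketch using Python’s standard `graphlib` module (Python 3.9+), with the five pairs as listed; the function name is hypothetical, and pairs with Dij = 0.5 would simply be skipped.

```python
from graphlib import TopologicalSorter

# General-dominance rows of Table 5 as (i, j, Dij):
# 1.0 means i dominates j, 0.0 means j dominates i.
general = [
    ("Math importance", "Discussion forum posting", 1.0),
    ("Math importance", "Grading peer workshop submissions", 1.0),
    ("ANS", "Discussion forum posting", 1.0),
    ("ANS", "Grading peer workshop submissions", 0.0),
    ("Discussion forum posting", "Grading peer workshop submissions", 0.0),
]


def dominance_order(pairs):
    """Order predictors so each dominant predictor precedes those it dominates."""
    graph = {}
    for i, j, d in pairs:
        if d == 0.5:
            continue  # no dominance established for this pair
        winner, loser = (i, j) if d == 1.0 else (j, i)
        graph.setdefault(loser, set()).add(winner)  # winner must come first
        graph.setdefault(winner, set())
    return list(TopologicalSorter(graph).static_order())


print(" > ".join(dominance_order(general)))
# Math importance > Grading peer workshop submissions > ANS > Discussion forum posting
```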
A substantial body of work has identified particular student attitudinal and cognitive factors related to math performance. Moreover, research examining student engagement variables using LMS activity data from online courses has grown. We sought to combine these research areas by determining which predictors, among student attitudinal and cognitive factors and indicators of student engagement in the online course, emerge as the most important in predicting final grades in a flipped Calculus II course. The goal of this work was to build a list of factors that predict performance in Calculus II, so that a (future) recommendation system can be built to provide feedback to students about actions they can take to improve their likelihood of success in the class.
Some might find it surprising that only two of our predictors were statistically significantly correlated with final grade. This reaction is warranted, but our small sample size of 85 limited our ability to obtain statistically significant results. Seven of the 11 predictors had correlation coefficients over .15 in magnitude, and many of these had p-values between .05 and .08. This, coupled with the possibility of multicollinearity, is why we elected to use all-subsets regression, which relies on variance explained rather than statistical significance. Our goal was to explain as much variance as possible in final grade, so we focused more on effect sizes than on statistical significance.
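The sample-size point can be made concrete. The sketch below uses the Fisher z approximation to test whether a correlation differs from zero (the exact test uses a t statistic with n − 2 degrees of freedom and gives very similar values here). With n = 85, a correlation must reach roughly |r| ≈ .21 to be significant at α = .05, two-tailed, so coefficients of .19 to .21 fall just short, roughly consistent with the starred entries in the final-grade row of the correlation table. The function names are illustrative.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def approx_p(r, n):
    """Two-tailed p-value for H0: rho = 0 via the Fisher z approximation."""
    z = atanh(abs(r)) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(z))


def critical_r(n, alpha=0.05):
    """Smallest |r| reaching two-tailed significance under the same approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


n = 85
print(round(critical_r(n), 3))  # 0.213
for r in (0.15, 0.19, 0.21, 0.24):
    print(r, round(approx_p(r, n), 3))
```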
The all-subsets regression indicated that believing math is important, having a stronger approximate number system (ANS), contributing more discussion forum posts, and submitting peer grading of workshops earlier together represent the best combination of predictors of final grade, while also balancing model complexity against the gains from adding predictors. These variables were then entered into the dominance analysis. The results of the dominance analysis did not strongly support a specific ordering of importance among the four final predictors. Math importance was established as completely dominant over time of submission of grading peer workshop submissions. Otherwise, dominance was established only at the least strict level, general dominance. These results suggested that math importance was more important than time of submission of grading peer workshop submissions, which was more important than ANS ability, which was more important than the amount of discussion forum posting. Overall, we take these results to mean that these variables are broadly similar in their predictive value.
The math importance predictor comes from a scale measuring beliefs about the usefulness of studying math. It is interesting to consider that students in Calculus II have already chosen a STEM-related college course path, as this course is required only of certain majors. Thus, these students must already believe at some level that math is important. Yet individual differences in the extent to which students believe math is important are associated with course performance. Perhaps math majors and minors believe that math is more useful than biology majors do, and math majors and minors may themselves be better at calculus than biology majors simply because they have chosen to pursue a degree in it. Alternatively, students who can more clearly see the connections between what they are learning in Calculus II and their chosen field may do better (i.e., utility value; Wigfield & Eccles, 2000). We were not able to disentangle these possibilities here, although this result might lend support to reminding students in Calculus II about the utility of the class for achieving their career goals.
ANS ability is a measure of non-symbolic numerosity, or an intuitive recognition of number. Researchers suggest that as early as infancy, individuals develop the ability to approximate and discriminate between large non-symbolic quantities (e.g., ten versus eight dots, objects, syllables, and/or shapes; e.g., Halberda et al., 2008). We found that students who could quickly differentiate smaller set sizes did better in the Calculus II class. The ANS is thought to be the foundation on which math ability builds (Verguts & Fias, 2004), although some work has found no significant relation between the ANS and future math performance (e.g., de Smedt, Noël, Gilmore, & Ansari, 2013). A meta-analysis found a small but stable relation between the ANS and math performance (Chen & Li, 2014), but this relation appears to be non-linear (Bonny & Lourenco, 2013; Purpura & Logan, 2015), with the ANS more related to earlier skills than to later skills (Chu, vanMarle, & Geary, 2015; Libertus, Feigenson, & Halberda, 2013). Given this mixed literature on the role of the ANS in complex math performance, we found it surprising that the ANS was one of the most important predictors of performance in Calculus II.
We also found it surprising that mental rotation was not an important predictor (or even a strong correlate) of final grade in Calculus II. Spatial skills are undeniably important in math performance. Our best explanation is that by the time students enroll in Calculus II, spatial skills themselves are no longer an important predictor of performance, as these students likely have at least good enough spatial skills to have progressed that far in math. The average score of this sample on the mental rotation task appears to be higher (by approximately two items correct) than that of a general undergraduate sample from the same school with whom we have collected the same measure, but at this point this idea is conjecture. We also remind the reader that spatial skills are not necessarily needed for answering the problems in Calculus II (compared to, say, Calculus III or other math courses). Some of the units in this Calculus II class did require more overt spatial skills than others, and we anticipate that performance on those units might be more strongly related to spatial skills than overall grade is.
Beyond the attitudinal and cognitive predictors, we found that two of the online engagement variables were important predictors of Calculus II performance. The total number of posts that a student contributed to the online course discussion board was positively related to their final course grade. This variable is coarse, in that we cannot tell whether the student was generating the root discussion post (e.g., asking a question) or answering other students’ questions or posts. Therefore, we are not sure whether this variable represented engagement in the course, whether students who posted questions received more help that led to a higher grade, or any number of other possible explanations. Interestingly, this finding replicates previous work indicating that the total number of posts to a discussion board was associated with success in an online biology course (Macfadyen & Dawson, 2010). Some have suggested that interacting with a course in this engaged way may deepen comprehension of the course content (Evans & Sabry, 2003), a conclusion consistent with the finding here that more discussion board posts were associated with higher grades.
We also found that students who submitted their grading of peer workshop assignments earlier did better in the course. This variable could be thought of as a “procrastination” variable. Procrastination in general is thought to be negatively associated with course performance, which might explain our finding (e.g., Tice & Baumeister, 1997). Other studies have found that time management in online courses predicts achievement (Choi & Choi, 2012; Jo et al., 2016; Kwon, 2009). Alternatively, this variable could simply reflect that students who were struggling in the course found it difficult to grade these assignments (i.e., it is hard to determine whether something is right or wrong if you yourself are unsure of the right answer), and therefore took longer to submit their grading. We are unable to disentangle these possible explanations with the currently available data.
For both of the online engagement variables that were found to be important, we are conjecturing at best as to what these variables represent. But we can say that simply tracking a user through an online course, here with two variables, predicted 8% of the total variance in final grades. To put this in perspective with broader educational research, the variance in individual reading performance directly associated with a child’s teacher has been suggested to be similar in magnitude (Byrne et al., 2010). Although small, this 8% is meaningful in an educational context.
There are three points we would like to caution readers about. First, any of the predictors listed in Table 3 are likely interchangeable with the four we selected from the most predictive model, as there is no statistical test of the difference between the correlations of the predictor variables included in the dominance analysis and those that were not (Schatschneider, Fletcher, Francis, Carlson, & Foorman, 2004). In particular, math importance is likely interchangeable with some of the other attitudinal predictors, including math interest and math confidence, variables that were fairly highly correlated with math importance. We take from this that including an attitudinal variable is important in predicting end-of-course grades; readers interested in using student characteristics to incorporate predictive systems into their online courses (e.g., Yukselturk & Bulut, 2007) would likely benefit from including at least one such variable rather than none. These attitudinal surveys are easy to administer online, take 5–10 minutes at most, and contribute a non-trivial amount of variance in predicting final grade. Interestingly, previous work on predictors of success in math classes has found attitudinal predictors to be important (e.g., math confidence; House, 1995), supporting their role, especially in predicting student performance in online math classes.
Second, the dominance analysis did not convincingly show clear differences in the strength of the predictors, and consequently we believe that all four predictors are important for the model predicting final grade. Readers should focus on how a combination of four predictors, spanning attitudinal, cognitive, and online student engagement variables, accounted for 17% of the variance in final grade in Calculus II. Although far from 100%, we find this amount impressive given that none of the predictors are obvious indicators of performance in Calculus II. Prior math performance, including previous grades, would certainly be a dominant predictor. Instead, we sought to test measurable student characteristics that could feasibly be added to prediction software attached to online courses. With that in mind, testing a student’s ANS ability is not as feasible as a reader might want, given that the task would either need to be programmed into the course platform or users would need to navigate to a third-party website from which their responses are harvested (i.e., our method). Our data show that we predicted 13% of the variance in student final grade without including ANS, an option that might be considered. Equally important, not all instructors are able to harvest LMS activity data because of privacy concerns; therefore, the full range of student engagement variables might not be available. Our data show that we predicted 12% of the variance in student final grade without including any of the LMS activity data. Although these variables alone do not give the full picture, something is to be gained even with this incomplete information.
Finally, it is also important to point out that the four predictors we determined to be most important in our data might not be equally important in other similar data or for other courses. These analyses are fundamentally sample specific. Therefore, we reiterate that student attitudes, cognitive performance, and online engagement variables are all important to consider when predicting grade performance, and they should be considered together when feasible and ethically possible.
In conclusion, we sought to predict student final grades in a Calculus II course from a battery of attitudinal, cognitive, and online student engagement variables. We found that a mix of variables across all three categories predicted a non-trivial proportion of variance in final grade. The aim of this work was to determine which are the most important predictors of student grade, with the end goal of building a recommendation system that could be implemented to help students in this traditionally difficult class. The methods used here could be applied to any class, with the intention of predicting student performance early and potentially allowing an instructor to identify students who may need more intensive help earlier in the semester, when intervention can be more effective.
This project was made possible by the tireless efforts of Dr. Mika Seppälä, who died before he could see the successful outcome of his work. Mika was a pioneer in online teaching, and was passionate in his efforts to make the undergraduate math curriculum more accessible to all students. We thank Dr. Olga Caprotti and Yahya Almalki for their efforts in continuing Mika’s work, including their important work in harvesting the student engagement variables from the WEPS system for us after Mika died so that this project could continue.
This material is based upon work supported by the National Science Foundation under Grant Nos. 1338509 and 1450501.
2 The Mental Rotation Test is free but must be requested from Dr. Michael Peters.