an investigation of psychometric measures for modelling academic performance in tertiary...

4
An Investigation of Psychometric Measures for Modelling Academic Performance in Tertiary Education Geraldine Gray, Colm McGuinness, Philip Owende Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15, Ireland [email protected] ABSTRACT Increasing college participation rates, and a more diverse student population, is posing a challenge for colleges in fa- cilitating all learners achieve their potential. This paper re- ports on a study to investigate the usefulness of data mining techniques in the analysis of factors deemed to be significant to academic performance in first year of college. Measures used include data typically available to colleges at the start of first year such as age, gender and prior academic per- formance. The study also explores the usefulness of addi- tional psychometric measures that can be assessed early in semester one, specifically, measures of personality, motiva- tion and learning strategies. A variety of data mining models are compared to assess the relative accuracy of each. Keywords Educational data mining, academic performance, ability, per- sonality, motivation, learning style, self-regulated learning. 1. INTRODUCTION Factors impacting on academic performance have been the focus of research for many years and still remain an active research topic [5], indicating the inherent difficulty in defin- ing robust deterministic models to predict academic per- formance, particularly in tertiary education [14]. Non pro- gression of students from first year to second year of study continues to be a problem across most academic disciplines. This is particularly true of the Institute of Technology 1 sec- tor in Ireland, where students have a weaker academic his- tory than university students, and there is increasing num- bers of non-standard students in the classroom [12]. This increase in participation poses a challenge to colleges to re- spond in a pro-active way to enable all leaners achieve their potential. 1 The Institute of Technology sector is a major provider of third and fourth level education in Ireland, focusing on the skill needs of the community they serve (www.ioti.ie). Educational Data Mining (EDM) is emerging as an evolv- ing and growing research discipline in recent years, covering the application of data mining techniques to the analysis of data in educational settings [4, 17]. EDM has given much attention to date to datasets generated from students’ be- haviour on Virtual Learning Environments (VLE) and In- telligent Tutoring Systems (ITS), many of which come from school education [1]. Less focus has been given to college ed- ucation, and in particular, to modelling datasets from out- side virtual or online learning environments. This paper reports on the preliminary results of a study to analyse the significance of a range of measures in building deterministic models of student performance in college. The dataset in- cludes data systematically gathered by colleges for student registration. The usefulness of additional psychometric mea- sures gathered early in semester one is also assessed. 2. STUDY CRITERIA In deciding on measures to include in the study, four key areas were reviewed: aptitude, personality, motivation and learning strategies. These were chosen firstly because re- search highlights these factors as being directly or indirectly related to academic performance [18], and secondly because these factors can be measured early in semester one. The following sections will report on correlations between indi- vidual factors and academic achievement, and also look at regression models of combinations of measures. All studies cited below are based on college education. 2.1 Influence of learner ability There is broad agreement that ability is correlated to aca- demic performance, although opinions differ on the range of sub factors that constitute ability [8]. For example, some studies have used specific cognitive ability tests to measure ability, for which there is extensive validity evidence. How- ever such tests have been criticised with regarding to the objects of measurment. For example Sternberg 1999 [19] asserts that high correlation between cognitive intelligence scores and academic performance is because they measure the same skill set rather than it being a causal relation- ship. Therefore many studies use data already available to colleges to measure ability, i.e. grades from 2nd level edu- cation or SAT/ACT (Scholastic Aptitude Test / American College Testing) scores [18]. In a meta analysis of 109 stud- ies by Robbins et al 2004 [16] prior academic achievement based on high school GPA or grades was found to have mod- erate correlation with academic performance (90% CI: 0.448 ± 0.4,). Average correlation between SAT scores and aca-

Upload: aldo-ramirez

Post on 30-Dec-2015

14 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: An Investigation of Psychometric Measures for Modelling Academic Performance in Tertiary Education-33

An Investigation of Psychometric Measures for ModellingAcademic Performance in Tertiary Education

Geraldine Gray, Colm McGuinness, Philip OwendeInstitute of Technology Blanchardstown

Blanchardstown Road NorthDublin 15, Ireland

[email protected]

ABSTRACTIncreasing college participation rates, and a more diversestudent population, is posing a challenge for colleges in fa-cilitating all learners achieve their potential. This paper re-ports on a study to investigate the usefulness of data miningtechniques in the analysis of factors deemed to be significantto academic performance in first year of college. Measuresused include data typically available to colleges at the startof first year such as age, gender and prior academic per-formance. The study also explores the usefulness of addi-tional psychometric measures that can be assessed early insemester one, specifically, measures of personality, motiva-tion and learning strategies. A variety of data mining modelsare compared to assess the relative accuracy of each.

KeywordsEducational data mining, academic performance, ability, per-sonality, motivation, learning style, self-regulated learning.

1. INTRODUCTIONFactors impacting on academic performance have been thefocus of research for many years and still remain an activeresearch topic [5], indicating the inherent difficulty in defin-ing robust deterministic models to predict academic per-formance, particularly in tertiary education [14]. Non pro-gression of students from first year to second year of studycontinues to be a problem across most academic disciplines.This is particularly true of the Institute of Technology1 sec-tor in Ireland, where students have a weaker academic his-tory than university students, and there is increasing num-bers of non-standard students in the classroom [12]. Thisincrease in participation poses a challenge to colleges to re-spond in a pro-active way to enable all leaners achieve theirpotential.

1The Institute of Technology sector is a major provider ofthird and fourth level education in Ireland, focusing on theskill needs of the community they serve (www.ioti.ie).

Educational Data Mining (EDM) is emerging as an evolv-ing and growing research discipline in recent years, coveringthe application of data mining techniques to the analysis ofdata in educational settings [4, 17]. EDM has given muchattention to date to datasets generated from students’ be-haviour on Virtual Learning Environments (VLE) and In-telligent Tutoring Systems (ITS), many of which come fromschool education [1]. Less focus has been given to college ed-ucation, and in particular, to modelling datasets from out-side virtual or online learning environments. This paperreports on the preliminary results of a study to analyse thesignificance of a range of measures in building deterministicmodels of student performance in college. The dataset in-cludes data systematically gathered by colleges for studentregistration. The usefulness of additional psychometric mea-sures gathered early in semester one is also assessed.

2. STUDY CRITERIAIn deciding on measures to include in the study, four keyareas were reviewed: aptitude, personality, motivation andlearning strategies. These were chosen firstly because re-search highlights these factors as being directly or indirectlyrelated to academic performance [18], and secondly becausethese factors can be measured early in semester one. Thefollowing sections will report on correlations between indi-vidual factors and academic achievement, and also look atregression models of combinations of measures. All studiescited below are based on college education.

2.1 Influence of learner abilityThere is broad agreement that ability is correlated to aca-demic performance, although opinions differ on the range ofsub factors that constitute ability [8]. For example, somestudies have used specific cognitive ability tests to measureability, for which there is extensive validity evidence. How-ever such tests have been criticised with regarding to theobjects of measurment. For example Sternberg 1999 [19]asserts that high correlation between cognitive intelligencescores and academic performance is because they measurethe same skill set rather than it being a causal relation-ship. Therefore many studies use data already available tocolleges to measure ability, i.e. grades from 2nd level edu-cation or SAT/ACT (Scholastic Aptitude Test / AmericanCollege Testing) scores [18]. In a meta analysis of 109 stud-ies by Robbins et al 2004 [16] prior academic achievementbased on high school GPA or grades was found to have mod-erate correlation with academic performance (90% CI: 0.448± 0.4,). Average correlation between SAT scores and aca-

Aldo Ramírez
Resaltado
Aldo Ramírez
Resaltado
Aldo Ramírez
Resaltado
Page 2: An Investigation of Psychometric Measures for Modelling Academic Performance in Tertiary Education-33

demic performance was lower (90% CI: 0.368± 0.035).

2.2 Influence of personalityFactor analysis by a number of researchers, working indepen-dently and using different approaches, has resulted in broadagreement of five main personality dimensions, namely open-ness, agreeableness, extraversion, conscientiousness and neu-roticism, commonly referred to as the Big Five [9]. Of thefive dimensions, conscientiousness is the best predictor ofacademic performance [20]. For example, Chamorro et al2008 [6] reported a correlation of 0.37 with academic per-formance (p<0.01, n=158). Openness is the second mostsignificant personality factor, but results are not as consis-tent. Chamorro et al 2008 [6] reported a correlation of 0.21(p<0.01, n=158) between openness and academic perfor-mance. However the strength of the correlation is influencedby assessment type, with open personalities doing betterwhere the assessment method is not restricted by rules anddeadlines [10]. Studies on the predictive validity of othermeasures of personality are inconclusive [20].

2.3 Motivational factorsMotivation is explained by a range of complementary the-ories, which in turn encompass a number of factors, someof which have been shown to be relevant, directly or indi-rectly, to academic performance [16]. Factors relevant toacademic performance in college include achievement moti-vation (drive to achieve goals), self-efficacy and self-determinedmotivation (intrinsic and extrinsic motivation). In Robbinset al 2004 meta analysis of 109 studies, self-efficacy andachievement motivation were found to be the best predic-tors of academic performance [16]. Correlations with self-efficacy averaged at 0.49 ±0.05 (CI: 90%), correlations withachievement motivation averaged at 0.303 ± 0.04 (CI: 90%).Self-determined motivation is not as strong a predictor ofacademic performance [11].

2.4 Influence of learning strategiesThe relationship between academic performance and per-sonality or motivation is mediated by a students approachto the learning task. Such learning strategies include bothlearning style (such as deep, strategic or shallow learningapproach) [6] and learning effort or self-regulation [13].

Analysing the influence of learning style directly on aca-demic performance, some studies show higher correlationswith a deep learning approach [6], while others cite marginallyhigher correlations with a strategic learning approach [5].The difference in these results can be explained, in part, bythe type of knowledge being tested for in the assessmentitself [21]. Many studies argue that there is a negative cor-relation between a shallow learning approach and academicperformance [5].

Self-regulated learning is recognised as a complex concept todefine, as it overlaps with a number of other concepts includ-ing personality, learning style and motivation, particularlyself-efficacy and goal setting [2]. For example, while manystudents may set goals, being able to self-regulate learningcan be the difference between achieving, or not achieving,goals set. Violet 1996 [21] argued that self-regulated learn-ing is more significant in tertiary level than earlier levels

of education because of the shift from a teacher-controlledenvironment to one where a student is expected to managetheir own study. A longitudinal tertiary level study by Ningand Dowling 2010 [13] investigating the interrelationshipsbetween motivation and self-regulation found while both hada significant influence on academic performance, motivationwas a stronger predictor of academic performance.

2.5 Combining psychometric measuresIndividually the factors discussed above are correlated withacademic performance. Also of relevance is how much of thevariance in academic performance they account for. Cassidy2011 [5] accounted for 53% of the variance in a regressionmodel including prior academic performance, self-efficacyand age (n=97, mean age=23.5). Chamorro-Premuzic etal 2008 [6] accounted for 40% of the variance in a regres-sion model including ability, personality factors and a deeplearning strategy (n=158, mean age=19.2). A similar vari-ance was reported by Dollinger et al 2008 [7] (44%) in aregression model including prior academic ability, personal-ity factors, academic goals and study time (n=338, meanage=21.9). However not all studies concur with these re-sults. In a study of non-traditional students, Kaufmann etal 2008 [11] accounted for 14% of the variance in a modelwith prior academic performance, personality factors andself-determined motivation (n=315, mean age=25.9). Thissuggests that models based on standard students may notbe applicable to a more diverse student population.

There is clearly overlap, either correlation or causal, betweenthe factors discussed above. A dataset that includes thesemeasures will have a complex pattern of interdependencies.This raises questions on how best to model this type of data,what dimensions are useful to include, and how consistentare results across student groups.

3. THE STUDY DATASETThis study was based on first year students at an Instituteof Technology in Ireland. 30% of first year students tookpart (n=713) based on their participation in online profil-ing during induction. Data was gathered over two years,September 2010 and September 2011. Students were froma variety of academic disciplines including computing, en-gineering, humanities, business and horticulture. The agerange was [18,60] with an average age of 23.75. AverageCAO Points2 was 257.9 ± 75. 59% of the students weremale.

The data was compiled from three sources:

1. Prior knowledge of the student: Table 1 lists attributesused from data available following student registration.This includes age, gender and ability measures basedon prior academic performance. Access to full timecollege courses in Ireland is based on academic achieve-ment in a set of state exams at the end of secondaryschool. Students have a grade for Maths, English, aforeign language, and four additional subjects chosenby the student. These subjects were categorised asscience, humanities and creative / practical.

2CAO Points are a measure of prior academic performancein Ireland, range [0,600].

Aldo Ramírez
Resaltado
Aldo Ramírez
Resaltado
Aldo Ramírez
Resaltado
Page 3: An Investigation of Psychometric Measures for Modelling Academic Performance in Tertiary Education-33

2. Psychometric data: Table 2 lists the additional mea-sures used which were assessed using an online ques-tionnaire developed for the study (www.howilearn.ie).The questionnaire was completed during first year in-duction. It covered measures of personality, motiva-tion, learning style and self-regulated learning. Ques-tions are taken from openly available, validated instru-ments.

3. Academic performance in year 1: A binary class labelwas used based on end of year GPA score, range [0-4].GPA is an aggregated score based on results from be-tween 10 and 12 modules delivered in first year. Thetwo classes were poor academic achievers who failedoverall (GPA<2, n=296), and strong academic achiev-ers who achieved honours overall (GPA≥2.5, n=340).To focus on patterns that distinguish poor academicachievers from strong academic achievers, students witha GPA of between 2.0 and 2.5 were excluded, giving afinal dataset of (n=636).

4. RESULTSTo date, six algorithms have been used to train models onthe dataset, Support Vector Machine(SVM), Neural Net-work(NN), k-Nearest Neighbour (k-NN), Naıve Bayes, De-cision tree and Logistic Regression, using RapidMiner V5.2(rapid-i.com). Models were run on the full dataset, and alsoon two subgroups of the dataset split by age. An age bound-ary of 21 was chosen because the majority of under 21s hada poor academic performance (61%), whereas the majorityof over 21s had a strong academic performance (70%). Alldatasets were balanced by over sampling the minority class,the attributes were scaled to the range [0,1], and model ac-curacy was assessed using cross validation (k=10). Modelaccuracy for data available at registration only (prior) wascompared to model accuracy when psychometric data wasincluded (all). Results and model parameters are detailedin Table 3.

When modelling all students using prior attributes only,pairwise comparison of the mean accuracies using least sig-nificant difference (dlsd = 5.07, p = 0.05) indicates modelperformance was comparable across learners, with the onlysignificant pairwise difference being between Naıve Bayes(75.74%) and SVM (69.41%). Model accuracies changedmarginally when psychometric variables were included. Themost notable change was the SVM model, where model ac-curacy increased from 69.41% to 75% (t(18) = 2.05, p =0.055). The accuracy of most models improved when trainedon younger students only. Again the most notable changewas the SVM model. It increased from 69.41% to 82.62%,which was significant (t(18) = 3.53, p = 0.0024). Modelaccuracies changed marginally when psychometric variableswere included for this group.

Models trained on older students were the least accuratewith the exception of an SVM model using all attributes,which achieved the highest accuracy of all models at 93.45%.Accuracies for Decision Tree and Logistic Regression wereparticularly poor, with models performing no better thanrandom guessing. In this student group, prior academic per-formance was not available for 38% of the students (n=108),explaining the improvement in accuracies when psychomet-

Table 1: Data available from the college

Learner Ability measures, mean and standard deviation:Aggregate Mark (CAO points) (258±76)English Result (45.5±18) Maths Result (23.8±14)Highest Result (64.7±14) Humanities Average(39.7±13)Science Average (31.8±16) Creative Average (47.9±19)Other factors:Age (23.75±7.6) Gender (m=440, f=304)Note: Range for age is [18,59], valid range for CAO pointsis [0,600], valid range for other values is [0,100]

Table 2: Additional measures*

Personality, Goldbergs IPIP scales (http://ipip.ori.org):Conscientousness (5.9±1.5 ) Openness (6.35±1.3)Motivation, MSLQ [15]:Intrinsic Goal Orientation(7.1±1.4) Self Efficacy (6.8±1.5)Extrinsic Goal Orientation (7.8±1.4)Learning style, based on R-SPQ-2F [3]:Shallow Learner (1.4±2) Deep Learner (5.2±2.9)Strategic Learner (3.5±2.5)Self-regulated Learning, MSLQ [15]:Self Regulation (5.8±1.3) Study Effort (5.9±1.7)Study Time (6.2±2.3)*The mean, standard deviation, and instrument fromwhich questions were sourced are given for each measure.Valid range for all measures is [0,10]

ric attributes were included in the models. When studentswith missing data were removed from this group, includ-ing psychometric measures did not improve model accura-cies significantly for any of the models. SVM again had thehighest accuracy at 91%, while Neural Networks achievedsimilar accuracies to the under 21 student group.

5. CONCLUSIONResults from this study show that models of academic per-formance in tertiary education can achieve good predictiveaccuracy, particularly if younger students and mature stu-dents are modelled separately. This suggests that patternsare different for standard versus non-standard students. Thepreliminary analysis has demonstrated that good accuracycan be achieved based on data already available to colleges.Including additional psychometric measures improves pre-dictive accuracy for mature students, but the evidence sofar suggests this is due to missing data regarding prior aca-demic performance rather than the additional added valueof the psychometric measures themselves.

The most accurate models were SVMs trained on under 21sand over 21s separately. In general, models that can learnmore complex patterns, and handle high dimensionality, aregetting higher accuracies for both student groups. The dif-ference in accuracy across models is most pronounced formature students, suggesting the patterns in that subgroupare more complex. When training a single model for all stu-dents, including students for whom prior academic resultsare not available (i.e. the quality of the dataset is reduced),models give comparable accuracy (73.82%±1.8).

The results published here represent early results from thestudy. Further analysis of the psychometric measures is re-quired to determine their predictive value, and also their use-fulness in understanding how the profile of students who fail

Aldo Ramírez
Resaltado
Page 4: An Investigation of Psychometric Measures for Modelling Academic Performance in Tertiary Education-33

Table 3: Model accuracies (% mean and st. dev)

Dataset Attri-butes

SVM* NN** k-NN(k=16)

NaıveBayes

DecisionTree

LogReg

Full(n=636)

prior 69.41±7.11

74.32±4.72

73.97±4.46

75.74±5

72.94±5.78

72.01±6.41

all 75±4.88

75.33±7.99

74.85±3.63

72.35±6

74.56±5.54

70.84±3.60

Under21(n=350)

prior 82.62±9.47

78.1±8.7

76.9±4.77

77.14±5.55

70±4.42

74.94±6.22

all 80.71±10.4

78.1±5.3

79.05±5.41

79.29±7.76

69.76±4.89

76.45±6.47

Over 21(n=286)

prior 77.03±9.56

72.29±11.33

64.99±7.33

57.16±16.79

50.62±1.41

48.96±22.12

all 93.45±4.41

77.31±9.3

71.7±4.86

57.16±16.79

50.62±1.41

52.47±18.08

Over 21no miss-

prior 91.03±6.46

78.3±11

71.54±11.47

62±11

51.5±1.36

64.31±10.71

ing data(n=178)

all 91.03±6.46

79.6±12.12

70.69±6.1

69.26±5.67

51.5±1.36

66.26±10.07

prior: attributes available from the college, all: all attributes*Anova kernel, epsilon=0.7 and C=1.*Learning rate=0.7, and momentum=0.4, 2 hidden layers (10,5).

differs from those who do well. To date, two subgroups havebeen examined, split by age. Other subgroups will be re-viewed, including an analysis of differences across academicdisciplines, and an analysis of students with GPA between2.0 and 2.5 (not included in this study). Finally, model ac-curacies will be verified by testing the models against a thirdyear of data, students registered in September 2012.

6. ACKNOWLEDGEMENTSThe authors would like to thank the Institute of TechnologyBlanchardstown (www.itb.ie) for their support in facilitat-ing this research, and the staff at the National Learning Net-work Assessment Services (www.nln.ie) for their assistancein administering questionnaires during student induction.

7. REFERENCES[1] R. S. J. D. Baker and K. Yacef. The state of

educational data mining in 2009” a review and futurevisions. Journal of Educational Data Mining, pages3–17, 2010.

[2] T. Bidjerano and D. Y. Dai. The relationship betweenthe big-five model of personality and self-regulatedlearning strategies. Learning and IndividualDifferences, 17:69 – 81, 2007.

[3] J. Biggs, D. Kember, and D. Leung. The revisedtwo-factor study process questionnaire: R-spq-2f.British Journal of Education Psyhcology, 71:133–149,2001.

[4] T. Calders and M. Pechenizkiy. Introduction to thespecial section on educational data mining. SIGKDD,13(2):3–3, 2011.

[5] S. Cassidy. Exploring individual differences asdetermining factors in student academic achievementin higher education. Studies in Higher Education,pages 1–18, 2011.

[6] T. Chamorro-Premuzic and A. Furnham. Personality,intelligence and approaches to learning as predictors ofacademic performance. Personality and IndividualDifferences, 44:1596–1603, 2008.

[7] S. J. Dollinger, A. M. Matyja, and J. L. Huber. Whichfactors best account for academic success: Those

which college students can control or those theycannot? Journal of Research in Personality,42:872–885, 2008.

[8] D. P. Flanagan and K. S. McGrew. Interpretingintelligence tests from contemporary gf-gc theory:Joint confirmatory factor analysis of the wj - r andkait in a non - white sample. Journal of SchoolPsychology, Vol. 36, No. 2:151 – 182, 1998.

[9] L. R. Goldberg. The development of markers for thebig-five factor structure. Psychological Assessment, 4(1):26–42, 1992.

[10] R. Kappe and H. van der Flier. Using multiple andspecific criteria to assess the predictive validity of thebig five personality factors on academic performance.Journal of Research in Personality, 44:142–145, 2010.

[11] J. C. Kaufman, M. D. Agars, and M. C.Lopez-Wagner. The role of personality and motivationin predicting early college academic success innon-traditional students at a hispanic-servinginstitution. Learning and Individual Differences,18:492 – 496, 2008.

[12] O. Mooney, V. Patterson, M. O’Connor, andA. Chantler. A study of progression in highereducation: A report by the higher education authority.Technical report, Higher Education Authority, Ireland,October 2010.

[13] H. K. Ning and K. Downing. The reciprocalrelationship between motivation and self-regulation: Alongitudinal study on academic performance. Learningand Individual Differences, 20:682–686, 2010.

[14] Z. A. Pardos, R. S. J. D. Baker, S. M. Gowda, andN. T. Heffernan. The sum is greater than the parts:Ensembling models of student knowledge ineducational software. SIGKDD Explorations, 13(2),2011.

[15] P. Pintrich, D. Smith, T. Garcia, and W. McKeachie.A manual for the use of the motivated strategies forlearning questionnaire. Technical Report 91-B-004,The Regents of the University of Michigan, 1991.

[16] S. B. Robbins, K. Lauver, H. Le, D. Davis, andR. Langley. Do psychosocial and study skill factorspredict college outcomes? a meta analysis.Psychological Bulletin, 130 (2):261–288, 2004.

[17] C. Romero and S. Ventura. Educational data mining:A survey from 1995 to 2005. Expert Systems withApplications, 33:135–146, 2007.

[18] N. Schmitt, F. L. Oswald, T. Pleskac, R. Sinha, andM. Zorzie. Prediction of four-year college studentperformance using cognitive and noncognitivepredictors and the impact on demographic status ofadmitted students. Journal of Applied Psychology,2009.

[19] R. Sternberg. Intelligence as developing expertise.Contemporary Educational Psychology, 24:359 – 375,1999.

[20] A. B. Swanberg and Ø. L. Martinsen. Personality,approaches to learning and achievement. EducationalPsychology, 30(1):75–88, 2010.

[21] S. E. Volet. Cognitive and affective variables inacademic learning: the significance or direction andeffort in students’ goals. Learning and Instruction,7(3):235–254, 1996.