This article was downloaded by: [University of Cambridge] on 09 October 2014, at 15:20. Publisher: Routledge. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Higher Education Research & Development. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cher20
The Course Experience Questionnaire: a Rasch Measurement Model Analysis. Russell F. Waugh, Edith Cowan University. Published online: 01 Nov 2006.
To cite this article: Russell F. Waugh (1998) The Course Experience Questionnaire: a Rasch Measurement Model Analysis, Higher Education Research & Development, 17:1, 45-64, DOI: 10.1080/0729436980170103
To link to this article: http://dx.doi.org/10.1080/0729436980170103
Higher Education Research & Development, Vol. 17, No. 1, 1998 45
The Course Experience Questionnaire: a Rasch Measurement Model Analysis

RUSSELL F. WAUGH
Edith Cowan University
ABSTRACT The Course Experience Questionnaire (CEQ) is applied to graduating students of Australian universities. Data from a selected university for graduates from 1994 to 1996 were analysed using a Rasch measurement model. The whole scale and each of the five sub-scales were analysed for each year separately, to investigate the questionnaire's conceptual design and validity. The results show that, taken together, at least 17 of the 25 items form a valid scale measuring graduate perceptions of their courses for each of the three data groups. Of the five sub-scales, Good Teaching and Generic Skills are only moderately valid and reliable for use and interpretation separately from the main scale.
Introduction
The Course Experience Questionnaire (CEQ) consists of 25 items in a Likert format with five response categories. The questionnaire is used by most of the 37 universities in Australia to gather data about teaching and course quality, as perceived by graduates about 4 months after graduation. The CEQ is given out annually to all graduates, by individual universities, along with the Graduate Destination Survey, and the results are sent to the Graduate Careers Council of Australia, which produces reports covering all the universities (Johnson, 1997; Johnson, Ainley & Long, 1996). It is used to measure graduates' perceptions of the quality of their completed courses (see the CEQ and Johnson, 1997, p. 3). The items are conceptualised from five aspects relating to course experiences and the learning environment. These are Good Teaching (6 items, 1994; 7 items, 1995-1996); Clear Goals and Standards (5 items, 1994; 4 items, 1995-1996); Appropriate Assessment (4 items); Appropriate Workload (4 items); and Generic Skills (6 items); plus a single item on overall satisfaction.
The development of the CEQ is described in Ainley and Long (1994, 1995); Johnson (1997); Johnson, Ainley and Long (1996); and Ramsden (1991a, 1991b). It evolved from work carried out at the University of Lancaster in the 1970s. The original questionnaire was based on a model of university teaching which involves curriculum, instruction, assessment and learning outcomes, and contained more items than are currently used (see Ramsden, 1991a, 1991b). It was intended that students would evaluate descriptions of distinct aspects of their learning environment. It has since
0729-4360/98/010045-20 © 1998 HERDSA
been revised for use in Australia, and the Graduate Careers Council of Australia added items relating to generic skills to the questionnaire. The CEQ now focuses on student perceptions of five aspects relating to courses and the learning environment. For recent commentary on the CEQ, see Johnson (1997); Johnson et al. (1996); and Wilson, Lizzio and Ramsden (1996), and, for earlier development work, see Ramsden (1991a, 1991b); Linke (1991); Entwistle and Ramsden (1983); and Marton and Saljo (1976).
The CEQ has been analysed by traditional measurement techniques. Using large multi-disciplinary samples of students and graduates from 1992, 1993 and 1994 (N = 2,130 in 1992; N = 1,362 in 1993; and N = 7,370 in 1994), the CEQ was found to have reasonable internal reliability (Cronbach alphas between 0.67 and 0.88) and good construct validity, as judged by appropriate factor loadings on each of the sub-scales (Wilson et al., 1996).
In another study providing different results and using different techniques, Sheridan (1995) used graduate samples of about 400 from three universities and analysed the CEQ using the Extended Logistic Model of Rasch (Andrich, 1988a, 1988b; Rasch, 1980/1960). Sheridan suggested that the CEQ should continue to be treated as five separate measures of an overarching construct called course experience, but indicated that "doubt exists regarding the measurement quality of the CEQ overall and of the continued use of this instrument in its present form" (p. 21). He was critical of the sub-scales: the lack of labels on the five sub-scales, the mixing of positively and negatively worded items (which can cause a respondent interaction effect), the use of the neutral response category (which attracts many different types of response) and the use of the Likert format (disagree to neutral to agree order). A format more favoured from a measurement perspective is a clearly ordered one such as never to all-the-time (Sheridan, 1995; Treolar, 1994).
The analysis by Sheridan (1995) showed that the CEQ sub-scales are suspect, except for the Good Teaching scale (reliabilities greater than 0.8). The sub-scales lack reliability, and the thresholds, which check on the consistency of the response categories, show that graduates from the three universities reacted differently to the items. This means that a degree of interaction between the items and the three university groups existed, making comparisons between university groups invalid or suspect (p. 15). Furthermore, items 25, 16, 7 and 20 exhibited misfit to the model and should be discarded or reworded.
Aims of the Study
The present study aims to investigate the psychometric properties and the conceptual design of the CEQ as an instrument to measure the perceptions of course experiences of university graduates after they have graduated. It aims to do this by analysing the psychometric properties of the whole scale and the five sub-scales separately, using the Extended Logistic Model of Rasch (Andrich, 1979, 1988a, 1988b; Rasch, 1980/1960). This model creates an interval-level scale from the data, where equal differences between numbers on the scale represent equal differences in graduate perception measures and item difficulties, as appropriate, and it
does this by calibrating both item difficulties and graduate perceptions on the same scale.
The conceptual and theoretical design of the CEQ is based on five aspects of the graduates' course experiences. In order to investigate the meaning of the questionnaire, it is necessary to investigate the sub-scales separately, as well as the whole questionnaire. Theoretically, in an ideal conception of the questionnaire, the items of each sub-scale should be related well enough to fit the Rasch model both as separate sub-scales and together as a whole scale. This is a major benefit of using Rasch model analysis, because it helps in the theoretical development of the variable and its meaning.
Method
Data
Data for the present study were analysed in three groups. The first group comprised 1,635 graduates of 1994 from University X in Australia (a 44% response rate). The second group comprised 2,430 graduates of 1995 from University X who responded to the same survey (except that question 16 was changed; see the Appendix), representing a 66% response rate. The third group comprised 2,696 graduates of 1996 from University X (a 67% response rate). Graduates covered all six Faculties of the university, namely: the Academy of Performing Arts; Arts; Business; Education; Health and Human Sciences; and Science, Engineering and Technology.
Measurement
Taken individually, the 25 items of the CEQ can be used to interpret the responses of graduates about their perceptions of the university courses that they undertook. This could provide a view of their perceptions from a qualitative point of view on each item. However, if data on the 25 items are aggregated in some way, or used to create a scale and then interpreted, then seven criteria have to be met before it can be said that the items form a valid and reliable scale.
The seven measurement criteria have been set out by Wright and Masters (1981). They involve: (1) an evaluation of whether each item functions as intended; (2) an estimation of the relative position (difficulty) of each valid item along the scale; (3) an evaluation of whether each person's responses form a valid response pattern; (4) an estimation of each person's relative score (perception) on the scale; (5) calibrating the person scores and the item scores together on a common scale defined by the items, with a constant interval from one end of the scale to the other, so that their numerical values mark off the scale in a linear way; (6) calculating the numerical values with standard errors which indicate the precision of the measurements on the scale; and (7) checking that the items remain similar in their function and meaning from person to person and group to group, so that they are seen as stable and useful measures. The present study used these seven criteria to analyse the 25 items of the CEQ and its five sub-scales.
Measurement Model
The Extended Logistic Model of Rasch (Andrich, 1978, 1988a, 1988b; Rasch, 1980/1960; Wright, 1985) was used with the computer program QUEST (Adams & Khoo, 1994) to create a scale satisfying the seven measurement criteria of Wright and Masters (1981). The scale is based on the log odds (called logits) of graduates agreeing with the items. The items are ordered along the scale, at interval measurement level, from easiest to agree with to hardest to agree with. Items at the easiest end of the scale (those with negative logit values) are answered in agreement by most students, and items at the hardest end of the scale (those with positive logit values) are most likely to be answered in agreement only by students whose perceptions are strongly positive. The Rasch method produces scale-free graduate perception measures and sample-free item difficulties (Andrich, 1988b; Wright & Masters, 1982). That is, the differences between pairs of graduate perception measures and item difficulties are expected to be sample independent.
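The way a graduate measure and an item difficulty combine in this model can be made concrete. The following sketch is not the QUEST program itself; it implements the Andrich rating-scale formulation under the assumption of known thresholds, and the threshold values used below are invented for illustration (only item 7's 1996 difficulty, +0.52, comes from the paper's Table 1):

```python
import math

def category_probs(theta, delta, taus):
    """Probability of each of the five Likert categories
    (0 = strongly disagree ... 4 = strongly agree) for a person
    with measure `theta` (logits) on an item of difficulty `delta`
    with ordered thresholds `taus` (Andrich rating-scale model)."""
    terms = [0.0]                    # category 0: no thresholds passed
    total = 0.0
    for tau in taus:                 # accumulate (theta - delta - tau_k)
        total += theta - delta - tau
        terms.append(total)
    numerators = [math.exp(t) for t in terms]
    z = sum(numerators)
    return [n / z for n in numerators]

# A graduate at +1.0 logits meets item 7 (difficulty +0.52 for 1996);
# the four thresholds here are hypothetical, for illustration only.
probs = category_probs(1.0, 0.52, [-1.5, -0.5, 0.5, 1.5])
```

The five probabilities sum to 1, and raising `theta` moves the probability mass toward the agree end of the response format, which is the "log odds of agreeing" interpretation used throughout the paper.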
The program checks on the consistency of the graduate responses and calculates the scale score needed for a 50% chance of passing from one response category to the next; for example, from strongly disagree to disagree, from disagree to neutral, from neutral to agree, and from agree to strongly agree, for each item. These scale scores are called threshold values; they are calculated in logits and they must be ordered to represent the increasing perception needed to answer from strongly disagree, to disagree, to neutral, to agree, to strongly agree. Items whose thresholds are not ordered, that is, items for which the students do not use the categories consistently, are not considered to fit the model and would be discarded.
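The discard rule in this paragraph amounts to testing that each item's estimated thresholds increase strictly. A minimal sketch (the threshold values shown are invented for illustration):

```python
def thresholds_ordered(taus):
    """True when an item's thresholds increase strictly, i.e. the
    response categories are being used consistently; an item failing
    this check would be discarded, as described above."""
    return all(a < b for a, b in zip(taus, taus[1:]))

thresholds_ordered([-1.2, -0.3, 0.4, 1.1])   # consistent category use
thresholds_ordered([-1.2, 0.4, -0.3, 1.1])   # disordered: discard item
```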
The program checks that the graduate responses fit the measurement model according to strict criteria. The criteria are described by Adams and Khoo (1994), Wright and Masters (1982) and Wright (1985). The fit statistics are weighted and unweighted mean squares that can be approximately normalised using the Wilson-Hilferty transformation. The normalised statistics are called infit t and outfit t, and they have a mean near 0 and a standard deviation near 1 when the data conform to the measurement model. A fit mean square of 1 plus x indicates 100x% more variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. Similarly, a fit mean square of 1 minus x indicates 100x% less variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. In this study, each item had to fit the model within a 30% variation between the observed and expected response patterns or it was discarded. With such items, the graduate responses are not consistent with the responses on the other items in the scale and there is not sufficient agreement amongst graduates as to the position (difficulty) of the items on the scale.
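The 30% criterion can be expressed as a simple retention rule on the mean squares. The sketch below assumes model-expected responses and response variances are already available for an item; the statistic shown is the unweighted (outfit-style) mean square, and QUEST's exact computation is not reproduced here:

```python
def unweighted_mean_square(observed, expected, variances):
    """Average squared standardised residual between observed and
    model-expected responses: near 1 when the data fit the model."""
    z2 = [(o - e) ** 2 / v for o, e, v in zip(observed, expected, variances)]
    return sum(z2) / len(z2)

def fits_within_30_percent(mnsq):
    """The paper's rule: a mean square of 1 plus-or-minus x means
    100x% more (or less) variation than the model predicts, so an
    item is kept only when its mean square lies between 0.7 and 1.3."""
    return 0.7 <= mnsq <= 1.3
```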
Reliability is calculated by the Item Separation Index and the Graduate Separation Index. Separation Indices represent the proportion of observed variance considered to be true. A value of 1 represents high reliability and a value of 0 is low (Wright & Masters, 1982). A combination of data is required as evidence for the construct validity of the CEQ. The Item and Graduate Separation Indices need to
be high; the observed and expected item response patterns need to fit the measurement model according to strict criteria; the thresholds relating to passing from one category response to the next need to be ordered; and there needs to be a conceptual framework (theoretical or practical) linking the items of the scale together.
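The definition of a Separation Index as the proportion of observed variance considered true can be written down directly. A minimal sketch (the measures and standard errors below are invented, not the paper's estimates):

```python
def separation_index(measures, standard_errors):
    """Proportion of observed variance in a set of Rasch measures
    that is 'true' variance: (observed variance minus mean error
    variance) divided by observed variance. Values near 1 indicate
    high reliability; values near 0 indicate mostly noise."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    error_var = sum(se ** 2 for se in standard_errors) / n
    return max(0.0, (observed_var - error_var) / observed_var)

# Well-spread measures with small standard errors give an index
# close to 1, in line with the 0.83-0.90 values reported below.
index = separation_index([0.0, 1.0, 2.0, 3.0], [0.1, 0.1, 0.1, 0.1])
```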
Data Analysis
The data were analysed with all 25 items together and with each of the sub-scales of the CEQ separately. Six interval-level scales were created for the 1996 data: one for all 25 items and one for each of the five sub-scales. These analyses were repeated for the 1995 and 1994 data.
Results
In the interest of brevity, not all of the results are presented here, only those considered most important. For example, the values of the standard errors of measurement for each item difficulty and graduate perception measure are not presented; the threshold values for each response category of each item, for each of the scales created, are not included; and the derived scales showing the positions of the items and the graduate perceptions for each of the five sub-scales, with each of the three data groups, are not presented.
Items 21 and 25 did not fit the model for any of the three data groups and so these items were discarded. Of the remainder, item 9 did not fit the model for the 1994 data; items 3 and 8 did not fit the model for the 1995 data; and items 4, 8, 9, 16, 17 and 23 did not fit the model for the 1996 data. A good scale was created with 17 items for the 1996 data; 21 items for the 1995 data; and 22 items for the 1994 data.
The main results are set out in 12 tables and 2 figures. Table 2 shows the summary statistics relating to the items fitting the model for each of the three data groups. Table 3 shows the summary statistics relating to graduate perceptions for each of the three data groups. Tables 4 and 5 show similar data for the Good Teaching sub-scale; Tables 6 and 7 for the Clear Goals and Standards sub-scale; Tables 8 and 9 for the Appropriate Assessment sub-scale; Tables 10 and 11 for the Appropriate Workload sub-scale; and Tables 12 and 13 for the Generic Skills sub-scale. Table 1 shows the item difficulties for the CEQ items fitting the model for each of the three data groups. Figure 1 shows the CEQ scale with graduate perception measures and item difficulties calibrated on the same scale for the 1996 data. Figure 2 shows the items fitting the model for the 1996 data.
Discussion
Psychometric Characteristics: Course Experience Questionnaire
The values of the infit mean squares and outfit mean squares are approximately 1, and the values of the infit t-scores and outfit t-scores are approximately 0 (see Tables 2 and 3). For each item, the mean squares are within 30% of the
TABLE 1. Difficulties of the items in logits for the three CEQ scales created from 1994, 1995 and 1996 data

Item No.    1994      1995      1996
1          +0.11     +0.10     +0.14
2          -0.20     -0.31     -0.22
3          +0.07     No fit    +0.05
4          +0.15     +0.18     No fit
5          -0.37     -0.44     -0.32
6          -0.10     -0.11     -0.00
7          +0.48     +0.43     +0.52
8          -0.40     No fit    No fit
9          No fit    +0.24     No fit
10         -0.00     -0.04     -0.01
11         -0.31     -0.44     -0.31
12         -0.15     -0.22     -0.07
13         -0.03     +0.02     +0.06
14         +0.04     -0.03     +0.04
15         +0.17     +0.20     +0.31
16         +0.31     +0.35     No fit
17         +0.23     +0.14     No fit
18         +0.28     +0.22     +0.29
19         -0.62     -0.53     -0.53
20         +0.16     +0.10     +0.23
21         Did not fit the measurement model
22         -0.33     -0.30     -0.36
23         +0.33     +0.33     No fit
24         +0.18     +0.17     +0.17
25         Did not fit the measurement model

Note: The items have similar difficulties across all three data groups. This supports one of the aspects of a good scale, invariance of difficulties.
expected values, calculated according to the model (see Figure 2). These indicate that the final sets of items of the CEQ for each data group have a strong fit to the measurement model. This means that there is strong agreement among all graduates as to the difficulties of the items located at different positions on the scale. However, the items are not as well targeted as they could be (see Figure 1), and some more easy and more difficult items are needed. The threshold values are ordered from low to high, indicating that the graduates have answered consistently with an ordered response format from strongly disagree, to disagree, to neutral, to agree, to strongly agree. The Indices of Graduate Perception and Item Separation range from 0.87 to 0.90 (see Tables 2 and 3), indicating that the errors are low and that the power of the tests of fit to the measurement model is good. The item difficulties and the graduate perception measures are calibrated on the same scale, and each item has a similar difficulty value on the scale for each of the three data groups (see Table 1).
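The targeting problem noted here can be quantified as the gap between the mean graduate measure and the mean item difficulty, both in logits. A minimal sketch; the single-value lists below stand in for the reported 1996 means (+0.51 for graduates, 0.00 for items, from Tables 2 and 3), not for the raw data:

```python
def targeting_gap(person_measures, item_difficulties):
    """Mean person measure minus mean item difficulty, in logits.
    A large positive gap means the items are too easy for the
    sample and harder items should be added."""
    mean_person = sum(person_measures) / len(person_measures)
    mean_item = sum(item_difficulties) / len(item_difficulties)
    return mean_person - mean_item

# Stand-in means for 1996: a gap of about half a logit, matching
# the paper's conclusion that the items are too easy.
gap = targeting_gap([0.51], [0.00])
```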
FIG. 1. Scale for the Course Experience Questionnaire using 1996 data.

[Figure: a vertical logit scale from +4.0 logits (positive graduate perceptions) down to -2.0 logits (negative graduate perceptions), with the distribution of graduate perception measures plotted on the left and the item difficulties on the right. The items, from most difficult to easiest, are: item 7; items 15, 18; items 1, 20, 24; items 3, 6, 10, 12, 13, 14; items 2, 5, 11, 22; and item 19.]

Notes:
Each X in the figure represents 12 graduates.
The item difficulties and the graduate perceptions are calibrated on the same scale. The scale is measured in logits, the log odds of graduates agreeing with the items.
N = 2,702 (1996 graduates).
L = 17, as 8 items (4, 8, 9, 16, 17, 21, 23 and 25) did not fit the model and were discarded.
The graduate perception scores range from -1.5 logits to +3.7 logits and the item difficulties range from -0.5 logits to +0.5 logits. This means that the difficulties of the 17 items are not targeted appropriately for the graduates. They are too easy and, therefore, more difficult items need to be added.
The difficult items are at the top of the right-hand side of the scale; only graduates with strong positive perceptions can agree with these items. The easy items are at the bottom of the right-hand side of the scale; most graduates agree with these.
This supports the view that sample-free item measures have been created. It can, therefore, be claimed that the items of the CEQ which fit the model have sound psychometric properties and that a good scale has been created.
FIG. 2. Fit mean square data of the Course Experience Questionnaire items (17) that fit the model for the 1996 data.

[Figure: infit mean squares for items 1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14, 15, 18, 19, 20, 22 and 24, plotted against a scale marked 0.63, 0.71, 0.83, 1.00, 1.20, 1.40 and 1.60.]
TABLE 2. Item statistics for the Course Experience Questionnaire^a

                   1994         1995           1996
Mean^b            +0.00        +0.00          +0.00
SD^b              +0.20        +0.29          +0.28
Separability^c    +0.83        +0.90          +0.88
Infit mean^d      +1.00        +1.00          +1.00
Outfit mean^e     +1.02        +1.02          +1.02
Infit t^f         -0.21        -0.33          -0.31
Outfit t^f        +0.31        +0.27          +0.34
No. of items         22           21             17
Non-fit items     9, 21, 25    3, 8, 21, 25   4, 8, 9, 16, 17, 21, 23, 25

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
Meaning of the Scale
The items fitting the model make up the variable, graduate perceptions about their courses of study at universities, and relate to graduates' experiences of teaching quality; goals and standards; assessment; workload; and Generic Skills learned. These items define the variable. They have good content validity and they are derived from a conceptual framework based on previous research. This, together with the previous
TABLE 3. Graduate statistics for the Course Experience Questionnaire^a

                          1994      1995      1996
Mean^b                   +0.29     +0.29     +0.51
SD^b                     +0.70     +0.71     +0.92
Separability^c           +0.87     +0.87     +0.89
Infit mean^d             +1.02     +1.02     +1.01
Outfit mean^e            +1.02     +1.02     +1.02
Infit t^f                -0.15     -0.17     -0.18
Outfit t^f               -0.09     -0.11     -0.09
No. of graduates         1,635     2,428     2,696
Graduate responses (%)      44        66        67

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
data relating to reliability and fit to the measurement model, is strong evidence for the construct validity of the variable. This means that the graduate responses to the items "hang together" sufficiently well to represent the unobservable trait: graduate perceptions about their courses of study at universities. This trait involves, and is related to, the five aspects of the learning environment.
Items 21 and 25 do not fit the model because graduates cannot agree as to their positions (difficulties) on the scale. Item 25, on course satisfaction, does not relate to any of the five aspects of the learning environment and, on this point, is measuring something different about courses. Item 21, on pressure in the course, does not relate to appropriate workload and is measuring something different. It is suggested that this item be reworded in a positive sense to focus on the time pressure to learn what is required owing to the workload in the course.
The Sub-scales of the Course Experience Questionnaire
The Good Teaching Sub-scale
The five items (3, 7, 15, 18 and 20; see the Appendix) that make up the sub-variable, "Good Teaching", relate to graduate perceptions of how staff
TABLE 4. Item statistics for the Good Teaching sub-scale^a

                   1994      1995      1996
Mean^b            +0.00     +0.00     +0.00
SD^b              +0.21     +0.23     +0.24
Separability^c    +0.58     +0.76     +0.80
Infit mean^d      +0.99     +0.98     +0.98
Outfit mean^e     +0.99     +0.98     +0.99
Infit t^f         -0.37     -0.54     -0.60
Outfit t^f        -0.20     -0.44     -0.44
No. of items          6         6         5
Non-fit items      None        16     16, 17

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
motivate; comment; help; explain; and make the subjects interesting (items 16 and 17 do not fit the model for the 1995 and 1996 data). The five items have a good fit to the measurement model; they have ordered thresholds, indicating that the responses are answered consistently; good graduate separability indices, ranging from 0.81 to 0.83; and reasonable item separability indices, from +0.76 to +0.80, except for an item separability index of +0.58 with the 1994 data (see Tables 4 and 5). The difficulties of the items are not as well targeted against the graduate perceptions as they could be (scale not included here). It would seem that this sub-scale could be used separately from the full scale, if needed. Overall, it can be said that, while the Good Teaching sub-scale has some satisfactory psychometric properties, there is room for improvement. The scale could be improved by adding more easy and hard items to better target the graduates, and by rewording items 16 and 17. Items 16 (1995 data) and 16 and 17 (1996 data) do not fit the model because graduates cannot agree on their position (difficulty) on the scale. It is suggested that item 17 be reworded along the lines of, "The teaching staff gave me helpful feedback on my set work/assignments". Item 16 probably has to be discarded.
TABLE 5. Graduate statistics for the Good Teaching sub-scale^a

                    1994      1995      1996
Mean^b             +0.13     +0.16     +0.34
SD^b               +1.43     +1.49     +1.50
Separability^c     +0.83     +0.84     +0.81
Infit mean^d       +0.99     +0.98     +0.99
Outfit mean^e      +0.99     +0.96     +0.99
Infit t^f          -0.21     -0.21     -0.19
Outfit t^f         -0.10     -0.09     -0.07
No. of graduates   1,635     2,430     2,637

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
The Clear Goals and Standards Sub-scale
The four items (1, 6, 13 and 24; see the Appendix) that make up the sub-variable, "Clear Goals and Standards", relate to graduate perceptions of the standards and goals expected in the course. (Item 16 did not fit the measurement model for the 1994 data, and was included with the Good Teaching sub-scale for the 1995 and 1996 data groups, where it also did not fit the model.) The four items have a good fit to the measurement model; they have ordered thresholds, indicating that the responses are answered consistently; and moderate graduate and item separability indices, from +0.62 to +0.74, for the three data groups, except for the 1996 data group, where the item separability index is 0.28 (see Tables 6 and 7). The difficulties of the items are not as well targeted against the graduate perceptions as they could be, because there are too few items (scale not included here). While the four items of the Clear Goals and Standards sub-scale have moderately good psychometric properties, they should not be used as a separate scale without major modification. It is suggested that some more easy and hard items be added.
The Appropriate Assessment Sub-scale
The three items (8, 12 and 19; see the Appendix) that make up the sub-variable, "Appropriate Assessment", relate to graduate perceptions of memorisation and the
TABLE 6. Item statistics for the Clear Goals and Standards sub-scale^a

                   1994      1995      1996
Mean^b            +0.00     +0.00     +0.00
SD^b              +0.20     +0.19     +0.12
Separability^c    +0.62     +0.64     +0.28
Infit mean^d      +0.99     +0.98     +0.97
Outfit mean^e     +1.03     +0.97     +0.97
Infit t^f         -0.71     -0.86     -1.00
Outfit t^f        -0.13     -0.91     -0.90
No. of items          4         4         4
Non-fit items        16      None      None

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
learning of facts in the course. The three items have a satisfactory fit to the measurement model. However, while they have ordered thresholds, indicating that the responses are answered consistently, they also have low graduate and item separability indices from +0.55 to +0.77 for the three data groups (see Tables 8 and 9). The low reliability is directly attributable to the low number of items and, hence, to the poor targeting of the items. The current three items of the Appropriate Assessment sub-scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. It is suggested that more easy and hard items be added to better target the graduates.
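Poor targeting of this kind can be made concrete with a small sketch; the function and the logit values below are hypothetical, chosen only to echo the pattern in Tables 8 and 9 (graduates centred well above the item thresholds, with the thresholds spanning a narrower range than the graduates):

```python
def targeting_summary(person_measures, item_thresholds):
    """Crude targeting check: compare the mean and range of the
    graduate attitude scores with the mean and range of the item
    thresholds (all in logits). A large gap between the means, or
    thresholds spanning much less than the persons, signals that
    easier and harder items are needed."""
    def mean(xs):
        return sum(xs) / len(xs)
    gap = mean(person_measures) - mean(item_thresholds)
    person_span = max(person_measures) - min(person_measures)
    item_span = max(item_thresholds) - min(item_thresholds)
    return gap, person_span, item_span

gap, p_span, i_span = targeting_summary(
    [-1.5, 0.2, 0.7, 1.1, 2.9], [-0.3, -0.1, 0.0, 0.2, 0.4])
assert i_span < p_span  # too few items to cover the graduate range
assert gap > 0          # items 'easier' than the graduate centre
```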
The Appropriate Workload Sub-scale
The four items (4, 14, 21 and 23; see the Appendix) that make up the sub-variable, "Appropriate Workload", relate to graduate perceptions of the amount of work, the pressure and the lack of time to comprehend everything in the course. While item 21 fits the sub-variable, it does not fit the full CEQ scale, indicating that all graduates can agree on its common difficulty in the sub-scale but not in the full scale. The four
TABLE 7. Graduate statistics for the Clear Goals and Standards sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.26     +1.05   +0.71             +1.04           +1.03            -0.14        -0.04         1,635
1995    +0.36     +1.38   +0.74             +0.97           +0.97            -0.29        -0.14         2,430
1996    +0.51     +1.37   +0.74             +0.97           +0.97            -0.30        -0.15         2,618

Note: See the note to Table 6 for footnotes (a)-(f).
items have a reasonable fit to the measurement model in the sub-scale. However, while they have ordered thresholds, indicating that the responses are answered consistently, they have only moderate graduate and item separability indices from +0.68 to +0.91 for the three data groups (see Tables 10 and 11). The moderate separability is directly attributable to the low number of items and to the poor targeting of the items (not shown here). The current four items of the Appropriate Workload sub-scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. Similar modifications should be made to this sub-scale as to the previous sub-scale.
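The separability indices quoted throughout are, as the table notes say, the proportion of observed variance in the Rasch estimates that is treated as true variance. A minimal sketch, with hypothetical estimates and standard errors:

```python
def separability(estimates, standard_errors):
    """Separation (reliability) index: (observed variance - mean
    error variance) / observed variance, bounded below at 0."""
    n = len(estimates)
    mean = sum(estimates) / n
    observed_var = sum((e - mean) ** 2 for e in estimates) / n
    error_var = sum(se * se for se in standard_errors) / n
    return max(0.0, (observed_var - error_var) / observed_var)

# With no measurement error, all of the observed variance is 'true'.
assert separability([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]) == 1.0

# Large standard errors relative to the spread of estimates give a
# low index, as for the short sub-scales discussed above.
assert separability([-1.0, 0.0, 1.0], [0.7, 0.7, 0.7]) < 0.5
```

Adding well-targeted items shrinks the standard errors, which is why the text repeatedly attributes the low separability to the small number of items.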
The Generic Skills Sub-scale
Six items make up the sub-variable, "Generic Skills" (2, 5, 9, 10, 11 and 22; see the Appendix); item 9 did not fit the model for any group, and item 10 did not fit the model for the 1994 group. The items relate to graduate perceptions of their problem-solving ability; analytical skills; communication skills; ability to plan ahead; and work as a team member, and to their confidence in tackling unfamiliar problems, as developed in the course. The five remaining items (four for the 1994 group) have a good fit to the measurement model. They have ordered thresholds, indicating that the responses are answered consistently, and moderate graduate and item separability indices (see Tables 12 and 13). This sub-scale could be used separately from the full 23-item scale, if needed. Overall, it can be said that, while the Generic Skills sub-scale has some satisfactory
TABLE 8. Item statistics for the Appropriate Assessment sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.20   +0.69             +0.99           +0.98            -0.43        -0.45         3              None
1995    +0.00     +0.21   +0.65             +0.98           +0.98            -0.69        -0.63         3              None
1996    +0.00     +0.25   +0.77             +0.99           +0.98            -0.50        -0.72         3              None

Note: See the note to Table 6 for footnotes (a)-(f).
psychometric properties, the scale could be improved by adding more easy and hard items to better target the graduates. Item 9 seems out of place because many courses do not aim to develop team work. The item could be changed to read, "The course helped to develop my general ability to see other points of view". Item 10 could be modified to read, "As a result of my course, I can try to solve unfamiliar problems".
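The infit and outfit mean squares and normalised t values reported in Tables 6-13 can be sketched as follows. This follows the standard Rasch definitions (outfit as the unweighted mean of squared standardised residuals, infit as the information-weighted mean square); the residuals, variances and `q` value below are hypothetical:

```python
def fit_mean_squares(residuals, variances):
    """Outfit: unweighted mean of squared standardised residuals.
    Infit: information-weighted mean square (squared score residuals
    divided by the summed model variances)."""
    z2 = [r * r / v for r, v in zip(residuals, variances)]
    outfit = sum(z2) / len(z2)
    infit = sum(r * r for r in residuals) / sum(variances)
    return infit, outfit

def wilson_hilferty_t(mean_square, q):
    """Normalise a fit mean square to an approximate t value with the
    Wilson-Hilferty cube-root transformation; q is the model standard
    deviation of the mean square."""
    return (mean_square ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0

# Residuals exactly as large as the model expects give mean squares
# of 1, which is why the table notes say the expected values of the
# mean squares are approximately 1 and the t-scores approximately 0.
infit, outfit = fit_mean_squares([0.5, -0.5], [0.25, 0.25])
assert infit == 1.0 and outfit == 1.0
```

Mean squares well below 1 (overfit) produce the small negative t values seen throughout the tables.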
Problems with the Likert Format
Although the Likert format is commonly used in attitude measures, its use has been called into question in modern attitudinal measurement (Andrich, 1982; Andrich, de Jong & Sheridan, 1994; Dubois & Burns, 1975; Sheridan, 1993, 1995). Three issues have been questioned. The first relates to the middle or neutral category and its interpretation. The second relates to the use of response categories that are not considered to represent a true ordering from low to high, and the third relates to the mixing of negatively and positively worded items to avoid the fixed response category syndrome.
TABLE 9. Graduate statistics for the Appropriate Assessment sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.73     +1.16   +0.56             +0.96           +0.98            -0.21        -0.05         1,635
1995    +0.70     +1.15   +0.55             +0.96           +0.98            -0.26        -0.00         2,430
1996    +0.78     +1.15   +0.54             +0.95           +0.98            -0.25        -0.08         2,466

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 10. Item statistics for the Appropriate Workload sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.30   +0.80             +0.99           +0.99            -0.57        -0.45         4              None
1995    +0.00     +0.29   +0.88             +0.99           +0.99            -0.51        -0.47         4              None
1996    +0.00     +0.34   +0.91             +1.00           +1.00            -0.27        -0.20         4              None

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 11. Graduate statistics for the Appropriate Workload sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.10     +1.10   +0.69             +0.97           +0.99            -0.21        -0.06         1,635
1995    +0.05     +1.12   +0.68             +0.98           +0.99            -0.20        -0.05         2,430
1996    +0.17     +1.15   +0.68             +0.99           +1.00            -0.20        -0.06         2,663

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 12. Item statistics for the Generic Skills sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.11   +0.62             +0.98           +0.98            -0.68        -0.34         4              9, 10
1995    +0.00     +0.24   +0.77             +0.99           +0.99            -0.41        -0.33         5              9
1996    +0.00     +0.18   +0.62             +0.98           +0.99            -0.75        -0.34         5              9

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 13. Graduate statistics for the Generic Skills sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.79     +1.22   +0.67             +0.99           +0.99            -0.24        -0.10         1,635
1995    +0.87     +1.31   +0.75             +0.99           +0.99            -0.21        -0.09         2,430
1996    +0.97     +1.33   +0.75             +1.00           +0.99            -0.21        -0.10         2,628

Note: See the note to Table 6 for footnotes (a)-(f).
The middle category between agree and disagree attracts a variety of responses such as "don't know", "unsure", "neutral", and "don't want to answer", and these are at odds with the implied ordered responses from strongly disagree to strongly agree. With a Rasch measurement analysis, this difficulty of graduate interpretation would be indicated by reversed thresholds and misfit to the measurement model. Strangely, in the present study, all the thresholds were nicely ordered and this problem did not seem to be present, although there is still the problem of interpretation of the results. That does not mean that the problem will not be present in other administrations of the CEQ; it just did not show as a problem in the present study.
From a measurement perspective, and for some graduates, the range from strongly disagree to strongly agree is not ordered from low to high. Again, in a Rasch measurement analysis, this would be indicated by reversed thresholds and misfit to the model. While the present study did not show any reversed thresholds, the problem of interpretation of the results is still present.
It is suggested that the items of the CEQ should be modified to overcome these two problems of interpretation. One way to do this is to change the response format to a clearly increasing order such as never, sometimes, a great deal or all the time. Another is to use numbers or a range of numbers starting from 0.
Although it has been common practice in attitude measures to mix negatively and positively worded items to avoid the fixed category response syndrome, this practice has been called into question during Rasch attitude measurement analysis (Andrich
& van Schoubroeck, 1989; Sheridan, 1995). It is claimed that mixing negatively and positively worded items causes many respondents to link answers between items; this relates to an interaction effect between items and different groups of respondents, resulting in the loss of invariance of the items. In the present study, this could explain why items 16, 21 and 25 did not fit the model. It could also partially explain the variation in the extent of fit of items to the model, and be related to the lower separability (reliability) of the items in the sub-scales.
Conclusions
Taken separately, each of the 25 items of the CEQ can be used to provide qualitative data about graduate perceptions of the courses they have completed at university. Taken together, 17 items for the 1996 data, 21 items for the 1995 data and 22 items for the 1994 data form valid and reliable scales. Some improvements could be made to the scales by adding some more easy and hard items to better target the graduates. The conceptual design of the CEQ from the five aspects (Good Teaching; Clear Goals and Standards; Appropriate Assessment; Appropriate Workload; and Generic Skills) is confirmed, and the Rasch measurement model has been useful in examining its meaning and conceptual design.
Of the five aspects of the CEQ, only Good Teaching and Generic Skills form moderately valid and reliable sub-scales which could be used and interpreted separately. Both the valid and the not-so-valid sub-scales could be improved by increasing the number of easy and hard items to provide better targeting.
Address for correspondence: Dr Russell F. Waugh, Edith Cowan University, Pearson Street, Churchlands, WA 6018, Australia. E-mail: [email protected]
References

ADAMS, R.J. & KHOO, S.T. (1994). QUEST: The interactive test analysis system. Melbourne: Australian Council for Educational Research (ACER).
AINLEY, J. & LONG, M. (1994). The Course Experience Survey 1992 Graduates. Canberra: Australian Government Publishing Service (AGPS).
AINLEY, J. & LONG, M. (1995). The 1995 Course Experience Questionnaire: An interim report. Melbourne: Australian Council for Educational Research (ACER) for the GCCA.
ANDRICH, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
ANDRICH, D. (1982). Using latent trait measurement to analyse attitudinal data: A synthesis of viewpoints. In D. SPEARRITT (Ed.), The improvement of measurement in education and psychology (pp. 89-126). Melbourne, Victoria: Australian Council for Educational Research (ACER).
ANDRICH, D. (1988a). A general form of Rasch's extended logistic model for partial credit scoring. Applied Measurement in Education, 1(4), 363-378.
ANDRICH, D. (1988b). Rasch models for measurement. Sage University Paper series on Quantitative Applications in the Social Sciences (Series No. 07-068). Newbury Park, CA: Sage.
ANDRICH, D., DE JONG, J.H. & SHERIDAN, B. (1994, May 16-19). Diagnostic opportunities with the Rasch model for ordered response categories. Paper presented at the IPN Symposium on Applications of Latent Trait and Latent Class Models in the Social Sciences, Akademie Sankelmark, Germany.
ANDRICH, D. & VAN SCHOUBROECK, L. (1989). The General Health Questionnaire: A psychometric analysis using latent trait theory. Psychological Medicine, 19, 469-485.
DUBOIS, B. & BURNS, J.A. (1975). An analysis of the question mark response category in attitudinal scales. Educational and Psychological Measurement, 35, 869-884.
ENTWISTLE, N.J. & RAMSDEN, P. (1983). Understanding student learning. London: Croom Helm.
JOHNSON, T. (1997). The 1996 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.
JOHNSON, T., AINLEY, J. & LONG, M. (1996). The 1995 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.
LIKERT, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140.
LINKE, R.D. (1991). Report of the research group on performance indicators in higher education. Canberra: Australian Government Publishing Service (AGPS).
MARTON, F. & SÄLJÖ, R. (1976). On qualitative differences in learning. II—Outcome as a function of the learner's conception of the task. British Journal of Educational Psychology, 46, 115-127.
RAMSDEN, P. (1991a). Report on the Course Experience Questionnaire trial. In R. LINKE (Ed.), Performance indicators in higher education (Vol. 2). Canberra: Australian Government Publishing Service (AGPS).
RAMSDEN, P. (1991b). A performance indicator of teaching quality in higher education: The Course Experience Questionnaire. Studies in Higher Education, 16, 129-150.
RAMSDEN, P. (1992). Learning to teach in higher education. London: Routledge.
RAMSDEN, P. (1996, October 3-4). The validity and future of the Course Experience Questionnaire. Paper delivered at the Australian Vice-Chancellors' Committee Course Experience Symposium, Griffith University, Queensland, Australia.
RASCH, G. (1980). Probabilistic models for some intelligence and attainment tests (expanded edition). Chicago: The University of Chicago Press. (Original work published 1960.)
SHERIDAN, B. (1993, April 10-11). Threshold location and Likert-style questionnaires. Paper presented at the Seventh International Objective Measurement Workshop, American Educational Research Association (AERA) Annual Meeting, Atlanta, U.S.A.
SHERIDAN, B. (1995). The Course Experience Questionnaire as a measure for evaluating courses in higher education. Perth: Edith Cowan University, Measurement, Assessment and Evaluation Laboratory.
TRELOAR, D. (1994). Course Experience Questionnaire: Reaction. Paper presented at the Second Graduate Careers Council of Australia Symposium, Sydney, NSW, Australia.
WILSON, K.L., LIZZIO, A. & RAMSDEN, P. (1996). The use and validation of the Course Experience Questionnaire (Occasional Paper No. 6). Brisbane: Griffith University, Griffith Institute for Higher Education.
WRIGHT, B.D. (1985). Additivity in psychological measurement. In E.E. ROSKAM (Ed.), Measurement and personality assessment (pp. 101-112). Amsterdam: Elsevier North-Holland.
WRIGHT, B. & MASTERS, G. (1981). The measurement of knowledge and attitude (Research Memorandum No. 30). Chicago: University of Chicago, Department of Education, Statistical Laboratory.
WRIGHT, B. & MASTERS, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.
Appendix
The Course Experience Questionnaire
[Note: Items marked with an asterisk are reverse scored. Sub-scales are not marked for the graduates, who answer the 25 items as their perceptions of their courses just completed.]
The Good Teaching Sub-scale (6 items)
3. The teaching staff of this course motivated me to do my best work.
7. The staff put a lot of time into commenting on my work.
15. The staff made a real effort to understand difficulties I might be having with my work.
16.* Feedback on my work was usually provided only in the form of marks or grades (1995, 1996 version).
17. The teaching staff normally gave me helpful feedback on how I was going.
18. My lecturers were extremely good at explaining things.
20. The teaching staff worked hard to make their subjects interesting.
The Clear Goals and Standards Sub-scale (5 items)
1. It was always easy to know the standard of work expected.
6. I usually had a clear idea of where I was going and what was expected of me in this course.
13.* It was often hard to discover what was expected of me in this course.
16.* The course was overly theoretical and abstract (1994 version).
24. The staff made it clear right from the start what they expected of students.
The Appropriate Assessment Sub-scale (3 items)
8.* To do well in this course all you really needed was a good memory.
12.* The staff seemed more interested in testing what I had memorised than what I understood.
19.* Too many staff asked me questions just about facts.
The Appropriate Workload Sub-scale (4 items)
4.* The work load was too heavy.
14. I was generally given enough time to understand the things I had to learn.
21.* There was a lot of pressure on me as a student in this course.
23.* The sheer volume of work to be got through in this course meant that it couldn't all be thoroughly comprehended.
The Generic Skills Sub-scale (6 items)
2. The course developed my problem-solving skills.
5. The course sharpened my analytic skills.
9. The course helped me develop my ability to work as a team member.
10. As a result of my course, I feel confident about tackling unfamiliar problems.
11. The course improved my skills in written communication.
22. My course helped me to develop the ability to plan my own work.
Overall Satisfaction (1 item)
25. Overall, I was satisfied with the quality of this course.