This article was downloaded by: [University of Cambridge] on 09 October 2014, at 15:20. Publisher: Routledge. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Higher Education Research & Development. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cher20
The Course Experience Questionnaire: a Rasch Measurement Model Analysis. Russell F. Waugh, Edith Cowan University. Published online: 01 Nov 2006.
To cite this article: Russell F. Waugh (1998) The Course Experience Questionnaire: a Rasch Measurement Model Analysis, Higher Education Research & Development, 17:1, 45-64, DOI: 10.1080/0729436980170103
To link to this article: http://dx.doi.org/10.1080/0729436980170103
Higher Education Research & Development, Vol. 17, No. 1, 1998 45
The Course Experience Questionnaire: a Rasch Measurement Model Analysis

RUSSELL F. WAUGH
Edith Cowan University
ABSTRACT The Course Experience Questionnaire (CEQ) is applied to graduating students of Australian universities. Data from a selected university for graduates from 1994 to 1996 were analysed using a Rasch measurement model. The whole scale and each of the five sub-scales were analysed for each year separately, to investigate the questionnaire's conceptual design and validity. The results show that, taken together, at least 17 of the 25 items form a valid scale measuring graduate perceptions of their courses for each of the three data groups. Of the five sub-scales, Good Teaching and Generic Skills are only moderately valid and reliable for use and interpretation separately from the main scale.
Introduction
The Course Experience Questionnaire (CEQ) consists of 25 items in a Likert format with five response categories. The questionnaire is used by most of the 37 universities in Australia to gather data about teaching and course quality, as perceived by graduates about 4 months after graduation. The CEQ is given out annually to all graduates, by individual universities, along with the Graduate Destination Survey, and the results are sent to the Graduate Careers Council of Australia, which produces reports covering all the universities (Johnson, 1997; Johnson, Ainley & Long, 1996). It is used to measure graduates' perceptions of the quality of their completed courses (see the CEQ and Johnson, 1997, p. 3). The items are conceptualised from five aspects relating to course experiences and the learning environment. These are Good Teaching (6 items, 1994; 7 items, 1995-1996); Clear Goals and Standards (5 items, 1994; 4 items, 1995-1996); Appropriate Assessment (4 items); Appropriate Workload (4 items); and Generic Skills (6 items); plus a single item on overall satisfaction.
The development of the CEQ is described in Ainley and Long (1994, 1995); Johnson (1997); Johnson, Ainley and Long (1996); and Ramsden (1991a, 1991b). It evolved from work carried out at the University of Lancaster in the 1970s. The original questionnaire was based on a model of university teaching which involves curriculum, instruction, assessment and learning outcomes, and contained more items than are currently used (see Ramsden, 1991a, 1991b). It was intended that students would evaluate descriptions of distinct aspects of their learning environment. It has since
0729-4360/98/010045-20 © 1998 HERDSA
been revised for use in Australia, and the Graduate Careers Council of Australia added items relating to generic skills to the questionnaire. The CEQ now focuses on student perceptions of five aspects relating to courses and the learning environment. For recent commentary on the CEQ, see Johnson (1997); Johnson et al. (1996); and Wilson, Lizzio and Ramsden (1996), and, for earlier development work, see Ramsden (1991a, 1991b); Linke (1991); Entwistle and Ramsden (1983); and Marton and Saljo (1976).
The CEQ has been analysed by traditional measurement techniques. Using large multi-disciplinary samples of students and graduates from 1992, 1993 and 1994 (N = 2,130 in 1992; N = 1,362 in 1993; and N = 7,370 in 1994), the CEQ was found to have reasonable internal reliability (Cronbach alphas between 0.67 and 0.88) and good construct validity, as judged by appropriate factor loadings on each of the sub-scales (Wilson et al., 1996).
In another study providing different results and using different techniques, Sheridan (1995) used graduate samples of about 400 from three universities and analysed the CEQ using the Extended Logistic Model of Rasch (Andrich, 1988a, 1988b; Rasch, 1980/1960). Sheridan suggested that the CEQ should continue to be treated as five separate measures of an overarching construct called course experience, but indicated that "doubt exists regarding the measurement quality of the CEQ overall and of the continued use of this instrument in its present form" (p. 21). He was critical of the sub-scales: the lack of labels on the five sub-scales, the mixing of positively and negatively worded items (which can cause a respondent interaction effect), the use of the neutral response category (which attracts many different types of response) and the use of the Likert format (disagree to neutral to agree order). A format more favoured from a measurement perspective is a clearly ordered one such as never to all-the-time (Sheridan, 1995; Treolar, 1994).
The analysis by Sheridan (1995) showed that the CEQ sub-scales are suspect, except for the Good Teaching scale (reliabilities greater than 0.8). The sub-scales lack reliability, and the thresholds, which check on the consistency of the response categories, show that graduates from the three universities reacted differently to the items. This means that a degree of interaction between the items and the three university groups existed, making comparisons between university groups invalid or suspect (p. 15). Furthermore, items 25, 16, 7 and 20 exhibited misfit to the model and should be discarded or reworded.
Aims of the Study
The present study aims to investigate the psychometric properties and the conceptual design of the CEQ as an instrument to measure the perceptions of course experiences of university graduates after they have graduated. It aims to do this by analysing the psychometric properties of the whole scale and the five sub-scales separately, using the Extended Logistic Model of Rasch (Andrich, 1979, 1988a, 1988b; Rasch, 1980/1960). This model creates an interval-level scale from the data, where equal differences between numbers on the scale represent equal differences in graduate perception measures and item difficulties, as appropriate, and it
does this by calibrating both item difficulties and graduate perceptions on the same scale.
The conceptual and theoretical design of the CEQ is based on five aspects of the graduates' course experiences. In order to investigate the meaning of the questionnaire, it is necessary to investigate the sub-scales separately, as well as the whole questionnaire. Theoretically, in an ideal conception of the questionnaire, the items of each sub-scale should be related well enough to fit the Rasch model both as separate sub-scales and together as a whole scale. This is a major benefit of using Rasch model analysis, because it helps in the theoretical development of the variable and its meaning.
Method
Data
Data for the present study were analysed in three groups. The first group comprised 1,635 graduates of 1994 from University X in Australia (a 44% response rate). The second group comprised 2,430 graduates of 1995 from University X who responded to the same survey (except that question 16 was changed; see the Appendix), representing a 66% response rate. The third group comprised 2,696 graduates of 1996 from University X (a 67% response rate). Graduates covered all six Faculties of the university, namely: the Academy of Performing Arts; Arts; Business; Education; Health and Human Sciences; and Science, Engineering and Technology.
Measurement
Taken individually, the 25 items of the CEQ can be used to interpret the responses of graduates about their perceptions of the university courses that they undertook. This could provide a view of their perceptions from a qualitative point of view on each item. However, if data on the 25 items are aggregated in some way, or used to create a scale and then interpreted, then seven criteria have to be met before it can be said that the items form a valid and reliable scale.
The seven measurement criteria have been set out by Wright and Masters (1981). They involve: (1) an evaluation of whether each item functions as intended; (2) an estimation of the relative position (difficulty) of each valid item along the scale; (3) an evaluation of whether each person's responses form a valid response pattern; (4) an estimation of each person's relative score (perception) on the scale; (5) calibrating the person scores and the item scores together on a common scale defined by the items, with a constant interval from one end of the scale to the other, so that their numerical values mark off the scale in a linear way; (6) calculating the numerical values with standard errors which indicate the precision of the measurements on the scale; and (7) checking that the items remain similar in their function and meaning from person to person and group to group, so that they are seen as stable and useful measures. The present study used these seven criteria to analyse the 25 items of the CEQ and its five sub-scales.
Measurement Model
The Extended Logistic Model of Rasch (Andrich, 1978, 1988a, 1988b; Rasch, 1980/1960; Wright, 1985) was used with the computer program QUEST (Adams & Khoo, 1994) to create a scale satisfying the seven measurement criteria of Wright and Masters (1981). The scale is based on the log odds (called logits) of graduates agreeing with the items. The items are ordered along the scale, at interval measurement level, from easiest to agree with to hardest to agree with. Items at the easiest end of the scale (those with negative logit values) are answered in agreement by most students, and items at the hardest end of the scale (those with positive logit values) are most likely to be answered in agreement only by students whose perceptions are strongly positive. The Rasch method produces scale-free graduate perception measures and sample-free item difficulties (Andrich, 1988b; Wright & Masters, 1982). That is, the differences between pairs of graduate perception measures and item difficulties are expected to be sample independent.
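The way a graduate measure and an item difficulty combine in this model can be made concrete. The following sketch is not the QUEST program itself; it implements the Andrich rating-scale formulation under the assumption of known thresholds, and the threshold values used below are invented for illustration (only item 7's 1996 difficulty, +0.52, comes from the paper's Table 1):

```python
import math

def category_probs(theta, delta, taus):
    """Probability of each of the five Likert categories
    (0 = strongly disagree ... 4 = strongly agree) for a person
    with measure `theta` (logits) on an item of difficulty `delta`
    with ordered thresholds `taus` (Andrich rating-scale model)."""
    terms = [0.0]                    # category 0: no thresholds passed
    total = 0.0
    for tau in taus:                 # accumulate (theta - delta - tau_k)
        total += theta - delta - tau
        terms.append(total)
    numerators = [math.exp(t) for t in terms]
    z = sum(numerators)
    return [n / z for n in numerators]

# A graduate at +1.0 logits meets item 7 (difficulty +0.52 for 1996);
# the four thresholds here are hypothetical, for illustration only.
probs = category_probs(1.0, 0.52, [-1.5, -0.5, 0.5, 1.5])
```

The five probabilities sum to 1, and raising `theta` moves the probability mass toward the agree end of the response format, which is the "log odds of agreeing" interpretation used throughout the paper.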
The program checks on the consistency of the graduate responses and calculates the scale score needed for a 50% chance of passing from one response category to the next; for example, from strongly disagree to disagree, from disagree to neutral, from neutral to agree, and from agree to strongly agree, for each item. These scale scores are called threshold values; they are calculated in logits and they must be ordered to represent the increasing perception needed to answer from strongly disagree, to disagree, to neutral, to agree, to strongly agree. Items whose thresholds are not ordered, that is, items for which the students do not use the categories consistently, are not considered to fit the model and would be discarded.
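The discard rule in this paragraph amounts to testing that each item's estimated thresholds increase strictly. A minimal sketch (the threshold values shown are invented for illustration):

```python
def thresholds_ordered(taus):
    """True when an item's thresholds increase strictly, i.e. the
    response categories are being used consistently; an item failing
    this check would be discarded, as described above."""
    return all(a < b for a, b in zip(taus, taus[1:]))

thresholds_ordered([-1.2, -0.3, 0.4, 1.1])   # consistent category use
thresholds_ordered([-1.2, 0.4, -0.3, 1.1])   # disordered: discard item
```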
The program checks that the graduate responses fit the measurement model according to strict criteria. The criteria are described by Adams and Khoo (1994), Wright and Masters (1982) and Wright (1985). The fit statistics are weighted and unweighted mean squares that can be approximately normalised using the Wilson-Hilferty transformation. The normalised statistics are called infit t and outfit t, and they have a mean near 0 and a standard deviation near 1 when the data conform to the measurement model. A fit mean square of 1 plus x indicates 100x% more variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. Similarly, a fit mean square of 1 minus x indicates 100x% less variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. In this study, each item had to fit the model within a 30% variation between the observed and expected response patterns or it was discarded. With such items, the graduate responses are not consistent with the responses on the other items in the scale and there is not sufficient agreement amongst graduates as to the position (difficulty) of the items on the scale.
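The 30% criterion can be expressed as a simple retention rule on the mean squares. The sketch below assumes model-expected responses and response variances are already available for an item; the statistic shown is the unweighted (outfit-style) mean square, and QUEST's exact computation is not reproduced here:

```python
def unweighted_mean_square(observed, expected, variances):
    """Average squared standardised residual between observed and
    model-expected responses: near 1 when the data fit the model."""
    z2 = [(o - e) ** 2 / v for o, e, v in zip(observed, expected, variances)]
    return sum(z2) / len(z2)

def fits_within_30_percent(mnsq):
    """The paper's rule: a mean square of 1 plus-or-minus x means
    100x% more (or less) variation than the model predicts, so an
    item is kept only when its mean square lies between 0.7 and 1.3."""
    return 0.7 <= mnsq <= 1.3
```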
Reliability is calculated by the Item Separation Index and the Graduate Separation Index. Separation Indices represent the proportion of observed variance considered to be true. A value of 1 represents high reliability and a value of 0 is low (Wright & Masters, 1982). A combination of data is required as evidence for the construct validity of the CEQ. The Item and Graduate Separation Indices need to
be high; the observed and expected item response patterns need to fit the measurement model according to strict criteria; the thresholds relating to passing from one category response to the next need to be ordered; and there needs to be a conceptual framework (theoretical or practical) linking the items of the scale together.
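The definition of a Separation Index as the proportion of observed variance considered true can be written down directly. A minimal sketch (the measures and standard errors below are invented, not the paper's estimates):

```python
def separation_index(measures, standard_errors):
    """Proportion of observed variance in a set of Rasch measures
    that is 'true' variance: (observed variance minus mean error
    variance) divided by observed variance. Values near 1 indicate
    high reliability; values near 0 indicate mostly noise."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    error_var = sum(se ** 2 for se in standard_errors) / n
    return max(0.0, (observed_var - error_var) / observed_var)

# Well-spread measures with small standard errors give an index
# close to 1, in line with the 0.83-0.90 values reported below.
index = separation_index([0.0, 1.0, 2.0, 3.0], [0.1, 0.1, 0.1, 0.1])
```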
Data Analysis
The data were analysed with all 25 items together and with each of the sub-scales of the CEQ separately. Six interval-level scales were created for the 1996 data: one for all 25 items and one for each of the five sub-scales. These analyses were repeated for the 1995 and 1994 data.
Results
In the interest of brevity, not all of the results are presented here, only those considered most important. For example, the values of the standard errors of measurement for each item difficulty and graduate perception measure are not presented; the threshold values for each response category of each item, for each of the scales created, are not included; and the derived scales showing the positions of the items and the graduate perceptions for each of the five sub-scales, with each of the three data groups, are not presented.
Items 21 and 25 did not fit the model for any of the three data groups and so these items were discarded. Of the remainder, item 9 did not fit the model for the 1994 data; items 3 and 8 did not fit the model for the 1995 data; and items 4, 8, 9, 16, 17 and 23 did not fit the model for the 1996 data. A good scale was created with 17 items for the 1996 data; 21 items for the 1995 data; and 22 items for the 1994 data.
The main results are set out in 12 tables and 2 figures. Table 2 shows the summary statistics relating to the items fitting the model for each of the three data groups. Table 3 shows the summary statistics relating to graduate perceptions for each of the three data groups. Tables 4 and 5 show similar data for the Good Teaching sub-scale; Tables 6 and 7 for the Clear Goals and Standards sub-scale; Tables 8 and 9 for the Appropriate Assessment sub-scale; Tables 10 and 11 for the Appropriate Workload sub-scale; and Tables 12 and 13 for the Generic Skills sub-scale. Table 1 shows the item difficulties for the CEQ items fitting the model for each of the three data groups. Figure 1 shows the CEQ scale with graduate perception measures and item difficulties calibrated on the same scale for the 1996 data. Figure 2 shows the items fitting the model for the 1996 data.
Discussion
Psychometric Characteristics: Course Experience Questionnaire
The values of the infit mean squares and outfit mean squares are approximately 1, and the values of the infit t-scores and outfit t-scores are approximately 0 (see Tables 2 and 3). For each item, the mean squares are within 30% of the
TABLE 1. Difficulties of the items in logits for the three CEQ scales created from 1994, 1995 and 1996 data

Item No.    1994      1995      1996
1          +0.11     +0.10     +0.14
2          -0.20     -0.31     -0.22
3          +0.07     No fit    +0.05
4          +0.15     +0.18     No fit
5          -0.37     -0.44     -0.32
6          -0.10     -0.11     -0.00
7          +0.48     +0.43     +0.52
8          -0.40     No fit    No fit
9          No fit    +0.24     No fit
10         -0.00     -0.04     -0.01
11         -0.31     -0.44     -0.31
12         -0.15     -0.22     -0.07
13         -0.03     +0.02     +0.06
14         +0.04     -0.03     +0.04
15         +0.17     +0.20     +0.31
16         +0.31     +0.35     No fit
17         +0.23     +0.14     No fit
18         +0.28     +0.22     +0.29
19         -0.62     -0.53     -0.53
20         +0.16     +0.10     +0.23
21         Did not fit the measurement model
22         -0.33     -0.30     -0.36
23         +0.33     +0.33     No fit
24         +0.18     +0.17     +0.17
25         Did not fit the measurement model

Note: The items have similar difficulties across all three data groups. This supports one of the aspects of a good scale, invariance of difficulties.
expected values, calculated according to the model (see Figure 2). These indicate that the final sets of items of the CEQ for each data group have a strong fit to the measurement model. This means that there is strong agreement among all graduates as to the difficulties of the items located at different positions on the scale. However, the items are not as well targeted as they could be (see Figure 1), and some more easy and more difficult items are needed. The threshold values are ordered from low to high, indicating that the graduates have answered consistently with an ordered response format from strongly disagree, to disagree, to neutral, to agree, to strongly agree. The Indices of Graduate Perception and Item Separation range from 0.87 to 0.90 (see Tables 2 and 3), indicating that the errors are low and that the power of the tests of fit to the measurement model is good. The item difficulties and the graduate perception measures are calibrated on the same scale, and each item has a similar difficulty value on the scale for each of the three data groups (see Table 1).
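The targeting problem noted here can be quantified as the gap between the mean graduate measure and the mean item difficulty, both in logits. A minimal sketch; the single-value lists below stand in for the reported 1996 means (+0.51 for graduates, 0.00 for items, from Tables 2 and 3), not for the raw data:

```python
def targeting_gap(person_measures, item_difficulties):
    """Mean person measure minus mean item difficulty, in logits.
    A large positive gap means the items are too easy for the
    sample and harder items should be added."""
    mean_person = sum(person_measures) / len(person_measures)
    mean_item = sum(item_difficulties) / len(item_difficulties)
    return mean_person - mean_item

# Stand-in means for 1996: a gap of about half a logit, matching
# the paper's conclusion that the items are too easy.
gap = targeting_gap([0.51], [0.00])
```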
FIG. 1. Scale for the Course Experience Questionnaire using 1996 data.

[Figure: a vertical logit scale from +4.0 logits (positive graduate perceptions) down to -2.0 logits (negative graduate perceptions), with the distribution of graduate perception measures plotted on the left and the item difficulties on the right. The items, from most difficult to easiest, are: item 7; items 15, 18; items 1, 20, 24; items 3, 6, 10, 12, 13, 14; items 2, 5, 11, 22; and item 19.]

Notes:
Each X in the figure represents 12 graduates.
The item difficulties and the graduate perceptions are calibrated on the same scale. The scale is measured in logits, the log odds of graduates agreeing with the items.
N = 2,702 (1996 graduates).
L = 17, as 8 items (4, 8, 9, 16, 17, 21, 23 and 25) did not fit the model and were discarded.
The graduate perception scores range from -1.5 logits to +3.7 logits and the item difficulties range from -0.5 logits to +0.5 logits. This means that the difficulties of the 17 items are not targeted appropriately for the graduates. They are too easy and, therefore, more difficult items need to be added.
The difficult items are at the top of the right-hand side of the scale; only graduates with strong positive perceptions can agree with these items. The easy items are at the bottom of the right-hand side of the scale; most graduates agree with these.
This supports the view that sample-free item measures have been created. It can, therefore, be claimed that the items of the CEQ which fit the model have sound psychometric properties and that a good scale has been created.
FIG. 2. Fit mean square data of the Course Experience Questionnaire items (17) that fit the model for the 1996 data.

[Figure: infit mean squares for items 1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14, 15, 18, 19, 20, 22 and 24, plotted against a scale marked 0.63, 0.71, 0.83, 1.00, 1.20, 1.40 and 1.60.]
TABLE 2. Item statistics for the Course Experience Questionnaire^a

                   1994         1995           1996
Mean^b            +0.00        +0.00          +0.00
SD^b              +0.20        +0.29          +0.28
Separability^c    +0.83        +0.90          +0.88
Infit mean^d      +1.00        +1.00          +1.00
Outfit mean^e     +1.02        +1.02          +1.02
Infit t^f         -0.21        -0.33          -0.31
Outfit t^f        +0.31        +0.27          +0.34
No. of items         22           21             17
Non-fit items     9, 21, 25    3, 8, 21, 25   4, 8, 9, 16, 17, 21, 23, 25

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
Meaning of the Scale
The items fitting the model make up the variable, graduate perceptions about their courses of study at universities, and relate to graduates' experiences of teaching quality; goals and standards; assessment; workload; and Generic Skills learned. These items define the variable. They have good content validity and they are derived from a conceptual framework based on previous research. This, together with the previous
TABLE 3. Graduate statistics for the Course Experience Questionnaire^a

                          1994      1995      1996
Mean^b                   +0.29     +0.29     +0.51
SD^b                     +0.70     +0.71     +0.92
Separability^c           +0.87     +0.87     +0.89
Infit mean^d             +1.02     +1.02     +1.01
Outfit mean^e            +1.02     +1.02     +1.02
Infit t^f                -0.15     -0.17     -0.18
Outfit t^f               -0.09     -0.11     -0.09
No. of graduates         1,635     2,428     2,696
Graduate responses (%)      44        66        67

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
data relating to reliability and fit to the measurement model, is strong evidence for the construct validity of the variable. This means that the graduate responses to the items "hang together" sufficiently well to represent the unobservable trait: graduate perceptions about their courses of study at universities. This trait involves, and is related to, the five aspects of the learning environment.
Items 21 and 25 do not fit the model because graduates cannot agree as to their positions (difficulties) on the scale. Item 25, on course satisfaction, does not relate to any of the five aspects of the learning environment and, on this point, is measuring something different about courses. Item 21, on pressure in the course, does not relate to appropriate workload and is measuring something different. It is suggested that this item be reworded in a positive sense to focus on the time pressure to learn what is required owing to the workload in the course.
The Sub-scales of the Course Experience Questionnaire
The Good Teaching Sub-scale
The five items (3, 7, 15, 18 and 20; see the Appendix) that make up the sub-variable, "Good Teaching", relate to graduate perceptions of how staff
TABLE 4. Item statistics for the Good Teaching sub-scale^a

                   1994      1995      1996
Mean^b            +0.00     +0.00     +0.00
SD^b              +0.21     +0.23     +0.24
Separability^c    +0.58     +0.76     +0.80
Infit mean^d      +0.99     +0.98     +0.98
Outfit mean^e     +0.99     +0.98     +0.99
Infit t^f         -0.37     -0.54     -0.60
Outfit t^f        -0.20     -0.44     -0.44
No. of items          6         6         5
Non-fit items      None        16     16, 17

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
motivate; comment; help; explain; and make the subjects interesting (items 16 and 17 do not fit the model for the 1995 and 1996 data). The five items have a good fit to the measurement model; they have ordered thresholds, indicating that the responses are answered consistently; good graduate separability indices, ranging from 0.81 to 0.83; and reasonable item separability indices, from +0.76 to +0.80, except for an item separability index of +0.58 with the 1994 data (see Tables 4 and 5). The difficulties of the items are not as well targeted against the graduate perceptions as they could be (scale not included here). It would seem that this sub-scale could be used separately from the full scale, if needed. Overall, it can be said that, while the Good Teaching sub-scale has some satisfactory psychometric properties, there is room for improvement. The scale could be improved by adding more easy and hard items to better target the graduates, and by rewording items 16 and 17. Items 16 (1995 data) and 16 and 17 (1996 data) do not fit the model because graduates cannot agree on their position (difficulty) on the scale. It is suggested that item 17 be reworded along the lines of, "The teaching staff gave me helpful feedback on my set work/assignments". Item 16 probably has to be discarded.
TABLE 5. Graduate statistics for the Good Teaching sub-scale^a

                    1994      1995      1996
Mean^b             +0.13     +0.16     +0.34
SD^b               +1.43     +1.49     +1.50
Separability^c     +0.83     +0.84     +0.81
Infit mean^d       +0.99     +0.98     +0.99
Outfit mean^e      +0.99     +0.96     +0.99
Infit t^f          -0.21     -0.21     -0.19
Outfit t^f         -0.10     -0.09     -0.07
No. of graduates   1,635     2,430     2,637

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
The Clear Goals and Standards Sub-scale
The four items (1, 6, 13 and 24; see the Appendix) that make up the sub-variable, "Clear Goals and Standards", relate to graduate perceptions of the standards and goals expected in the course. (Item 16 did not fit the measurement model for the 1994 data, and was included with the Good Teaching sub-scale for the 1995 and 1996 data groups, where it also did not fit the model.) The four items have a good fit to the measurement model; they have ordered thresholds, indicating that the responses are answered consistently; and moderate graduate and item separability indices, from +0.62 to +0.74, for the three data groups, except for the 1996 data group, where the item separability index is 0.28 (see Tables 6 and 7). The difficulties of the items are not as well targeted against the graduate perceptions as they could be, because there are too few items (scale not included here). While the four items of the Clear Goals and Standards sub-scale have moderately good psychometric properties, they should not be used as a separate scale without major modification. It is suggested that some more easy and hard items be added.
The Appropriate Assessment Sub-scale
The three items (8, 12 and 19; see the Appendix) that make up the sub-variable, "Appropriate Assessment", relate to graduate perceptions of memorisation and the
TABLE 6. Item statistics for the Clear Goals and Standards sub-scale^a

                   1994      1995      1996
Mean^b            +0.00     +0.00     +0.00
SD^b              +0.20     +0.19     +0.12
Separability^c    +0.62     +0.64     +0.28
Infit mean^d      +0.99     +0.98     +0.97
Outfit mean^e     +1.03     +0.97     +0.97
Infit t^f         -0.71     -0.86     -1.00
Outfit t^f        -0.13     -0.91     -0.90
No. of items          4         4         4
Non-fit items        16      None      None

Note: When the data are compatible with the model, the expected values of the mean squares are approximately 1 and the expected values of the t-scores are approximately 0. ^a Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored. ^b Mean and SD are the mean and standard deviation of the item thresholds or the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item). ^c Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of 0 represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale. ^d Infit mean refers to weighted mean squares. ^e Outfit mean refers to unweighted mean squares. ^f Infit t and outfit t refer to the normalised t values using the Wilson-Hilferty transformation.
learning of facts in the course. The three items have a satisfactory fit to the measurement model. However, while they have ordered thresholds, indicating that the responses are answered consistently, they also have low graduate and item separability indices from +0.55 to +0.77 for the three data groups (see Tables 8 and 9). The low reliability is directly attributable to the low number of items and, hence, to the poor targeting of the items. The current three items of the Appropriate Assessment sub-scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. It is suggested that more easy and hard items be added to better target the graduates.
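Poor targeting of this kind can be made concrete with a small sketch; the function and the logit values below are hypothetical, chosen only to echo the pattern in Tables 8 and 9 (graduates centred well above the item thresholds, with the thresholds spanning a narrower range than the graduates):

```python
def targeting_summary(person_measures, item_thresholds):
    """Crude targeting check: compare the mean and range of the
    graduate attitude scores with the mean and range of the item
    thresholds (all in logits). A large gap between the means, or
    thresholds spanning much less than the persons, signals that
    easier and harder items are needed."""
    def mean(xs):
        return sum(xs) / len(xs)
    gap = mean(person_measures) - mean(item_thresholds)
    person_span = max(person_measures) - min(person_measures)
    item_span = max(item_thresholds) - min(item_thresholds)
    return gap, person_span, item_span

gap, p_span, i_span = targeting_summary(
    [-1.5, 0.2, 0.7, 1.1, 2.9], [-0.3, -0.1, 0.0, 0.2, 0.4])
assert i_span < p_span  # too few items to cover the graduate range
assert gap > 0          # items 'easier' than the graduate centre
```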
The Appropriate Workload Sub-scale
The four items (4, 14, 21 and 23; see the Appendix) that make up the sub-variable, "Appropriate Workload", relate to graduate perceptions of the amount of work, the pressure and the lack of time to comprehend everything in the course. While item 21 fits the sub-variable, it does not fit the full CEQ scale, indicating that all graduates can agree on its common difficulty in the sub-scale but not in the full scale. The four
TABLE 7. Graduate statistics for the Clear Goals and Standards sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.26     +1.05   +0.71             +1.04           +1.03            -0.14        -0.04         1,635
1995    +0.36     +1.38   +0.74             +0.97           +0.97            -0.29        -0.14         2,430
1996    +0.51     +1.37   +0.74             +0.97           +0.97            -0.30        -0.15         2,618

Note: See the note to Table 6 for footnotes (a)-(f).
items have a reasonable fit to the measurement model in the sub-scale. However, while they have ordered thresholds, indicating that the responses are answered consistently, they have only moderate graduate and item separability indices from +0.68 to +0.91 for the three data groups (see Tables 10 and 11). The moderate separability is directly attributable to the low number of items and to the poor targeting of the items (not shown here). The current four items of the Appropriate Workload sub-scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. Similar modifications should be made to this sub-scale as to the previous sub-scale.
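The separability indices quoted throughout are, as the table notes say, the proportion of observed variance in the Rasch estimates that is treated as true variance. A minimal sketch, with hypothetical estimates and standard errors:

```python
def separability(estimates, standard_errors):
    """Separation (reliability) index: (observed variance - mean
    error variance) / observed variance, bounded below at 0."""
    n = len(estimates)
    mean = sum(estimates) / n
    observed_var = sum((e - mean) ** 2 for e in estimates) / n
    error_var = sum(se * se for se in standard_errors) / n
    return max(0.0, (observed_var - error_var) / observed_var)

# With no measurement error, all of the observed variance is 'true'.
assert separability([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]) == 1.0

# Large standard errors relative to the spread of estimates give a
# low index, as for the short sub-scales discussed above.
assert separability([-1.0, 0.0, 1.0], [0.7, 0.7, 0.7]) < 0.5
```

Adding well-targeted items shrinks the standard errors, which is why the text repeatedly attributes the low separability to the small number of items.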
The Generic Skills Sub-scale
Six items make up the sub-variable, "Generic Skills" (2, 5, 9, 10, 11 and 22; see the Appendix); item 9 did not fit the model for any group, and item 10 did not fit the model for the 1994 group. The items relate to graduate perceptions of their problem-solving ability; analytical skills; communication skills; ability to plan ahead; and work as a team member, and to their confidence in tackling unfamiliar problems, as developed in the course. The five remaining items (four for the 1994 group) have a good fit to the measurement model. They have ordered thresholds, indicating that the responses are answered consistently, and moderate graduate and item separability indices (see Tables 12 and 13). This sub-scale could be used separately from the full 23-item scale, if needed. Overall, it can be said that, while the Generic Skills sub-scale has some satisfactory
TABLE 8. Item statistics for the Appropriate Assessment sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.20   +0.69             +0.99           +0.98            -0.43        -0.45         3              None
1995    +0.00     +0.21   +0.65             +0.98           +0.98            -0.69        -0.63         3              None
1996    +0.00     +0.25   +0.77             +0.99           +0.98            -0.50        -0.72         3              None

Note: See the note to Table 6 for footnotes (a)-(f).
psychometric properties, the scale could be improved by adding more easy and hard items to better target the graduates. Item 9 seems out of place because many courses do not aim to develop team work. The item could be changed to read, "The course helped to develop my general ability to see other points of view". Item 10 could be modified to read, "As a result of my course, I can try to solve unfamiliar problems".
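The infit and outfit mean squares and normalised t values reported in Tables 6-13 can be sketched as follows. This follows the standard Rasch definitions (outfit as the unweighted mean of squared standardised residuals, infit as the information-weighted mean square); the residuals, variances and `q` value below are hypothetical:

```python
def fit_mean_squares(residuals, variances):
    """Outfit: unweighted mean of squared standardised residuals.
    Infit: information-weighted mean square (squared score residuals
    divided by the summed model variances)."""
    z2 = [r * r / v for r, v in zip(residuals, variances)]
    outfit = sum(z2) / len(z2)
    infit = sum(r * r for r in residuals) / sum(variances)
    return infit, outfit

def wilson_hilferty_t(mean_square, q):
    """Normalise a fit mean square to an approximate t value with the
    Wilson-Hilferty cube-root transformation; q is the model standard
    deviation of the mean square."""
    return (mean_square ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0

# Residuals exactly as large as the model expects give mean squares
# of 1, which is why the table notes say the expected values of the
# mean squares are approximately 1 and the t-scores approximately 0.
infit, outfit = fit_mean_squares([0.5, -0.5], [0.25, 0.25])
assert infit == 1.0 and outfit == 1.0
```

Mean squares well below 1 (overfit) produce the small negative t values seen throughout the tables.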
Problems with the Likert Format
Although the Likert format is commonly used in attitude measures, its use has been called into question in modern attitudinal measurement (Andrich, 1982; Andrich, de Jong & Sheridan, 1994; Dubois & Burns, 1975; Sheridan, 1993, 1995). Three issues have been questioned. The first relates to the middle or neutral category and its interpretation. The second relates to the use of response categories that are not considered to represent a true ordering from low to high, and the third relates to the mixing of negatively and positively worded items to avoid the fixed response category syndrome.
TABLE 9. Graduate statistics for the Appropriate Assessment sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.73     +1.16   +0.56             +0.96           +0.98            -0.21        -0.05         1,635
1995    +0.70     +1.15   +0.55             +0.96           +0.98            -0.26        -0.00         2,430
1996    +0.78     +1.15   +0.54             +0.95           +0.98            -0.25        -0.08         2,466

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 10. Item statistics for the Appropriate Workload sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.30   +0.80             +0.99           +0.99            -0.57        -0.45         4              None
1995    +0.00     +0.29   +0.88             +0.99           +0.99            -0.51        -0.47         4              None
1996    +0.00     +0.34   +0.91             +1.00           +1.00            -0.27        -0.20         4              None

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 11. Graduate statistics for the Appropriate Workload sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.10     +1.10   +0.69             +0.97           +0.99            -0.21        -0.06         1,635
1995    +0.05     +1.12   +0.68             +0.98           +0.99            -0.20        -0.05         2,430
1996    +0.17     +1.15   +0.68             +0.99           +1.00            -0.20        -0.06         2,663

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 12. Item statistics for the Generic Skills sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of items   Non-fit items
1994    +0.00     +0.11   +0.62             +0.98           +0.98            -0.68        -0.34         4              9, 10
1995    +0.00     +0.24   +0.77             +0.99           +0.99            -0.41        -0.33         5              9
1996    +0.00     +0.18   +0.62             +0.98           +0.99            -0.75        -0.34         5              9

Note: See the note to Table 6 for footnotes (a)-(f).
TABLE 13. Graduate statistics for the Generic Skills sub-scale(a)

        Mean(b)   SD(b)   Separability(c)   Infit mean(d)   Outfit mean(e)   Infit t(f)   Outfit t(f)   No. of graduates
1994    +0.79     +1.22   +0.67             +0.99           +0.99            -0.24        -0.10         1,635
1995    +0.87     +1.31   +0.75             +0.99           +0.99            -0.21        -0.09         2,430
1996    +0.97     +1.33   +0.75             +1.00           +0.99            -0.21        -0.10         2,628

Note: See the note to Table 6 for footnotes (a)-(f).
The middle category between agree and disagree attracts a variety of responses such as "don't know", "unsure", "neutral", and "don't want to answer", and these are at odds with the implied ordered responses from strongly disagree to strongly agree. With a Rasch measurement analysis, this difficulty of graduate interpretation would be indicated by reversed thresholds and misfit to the measurement model. Strangely, in the present study, all the thresholds were nicely ordered and this problem did not seem to be present, although there is still the problem of interpretation of the results. That does not mean that the problem will not be present in other administrations of the CEQ; it just did not show as a problem in the present study.
From a measurement perspective, and for some graduates, the range from strongly disagree to strongly agree is not ordered from low to high. Again, in a Rasch measurement analysis, this would be indicated by reversed thresholds and misfit to the model. While the present study did not show any reversed thresholds, the problem of interpretation of the results is still present.
It is suggested that the items of the CEQ should be modified to overcome these two problems of interpretation. One way to do this is to change the response format to a clearly increasing order such as never, sometimes, a great deal or all the time. Another is to use numbers or a range of numbers starting from 0.
Although it has been common practice in attitude measures to mix negatively and positively worded items to avoid the fixed category response syndrome, this practice has been called into question during Rasch attitude measurement analysis (Andrich
& van Schoubroeck, 1989; Sheridan, 1995). It is claimed that mixing negatively and positively worded items causes many respondents to link answers between items; this relates to an interaction effect between items and different groups of respondents, resulting in the loss of invariance of the items. In the present study, this could explain why items 16, 21 and 25 did not fit the model. It could also partially explain the variation in the extent of fit of items to the model, and be related to the lower separability (reliability) of the items in the sub-scales.
Conclusions
Taken separately, each of the 25 items of the CEQ can be used to provide qualitative data about graduate perceptions of the courses they have completed at university. Taken together, 17 items for the 1996 data, 21 items for the 1995 data and 22 items for the 1994 data form valid and reliable scales. Some improvements could be made to the scales by adding some more easy and hard items to better target the graduates. The conceptual design of the CEQ from the five aspects (Good Teaching; Clear Goals and Standards; Appropriate Assessment; Appropriate Workload; and Generic Skills) is confirmed, and the Rasch measurement model has been useful in examining its meaning and conceptual design.
Of the five aspects of the CEQ, only Good Teaching and Generic Skills form moderately valid and reliable sub-scales which could be used and interpreted separately. Both the valid and the not-so-valid sub-scales could be improved by increasing the number of easy and hard items to provide better targeting.
Address for correspondence: Dr Russell F. Waugh, Edith Cowan University, Pearson Street, Churchlands, WA 6018, Australia. E-mail: [email protected]
References

ADAMS, R.J. & KHOO, S.T. (1994). QUEST: The interactive test analysis system. Melbourne: Australian Council for Educational Research (ACER).
AINLEY, J. & LONG, M. (1994). The Course Experience Survey 1992 Graduates. Canberra: Australian Government Publishing Service (AGPS).
AINLEY, J. & LONG, M. (1995). The 1995 Course Experience Questionnaire: An interim report. Melbourne: Australian Council for Educational Research (ACER) for the GCCA.
ANDRICH, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
ANDRICH, D. (1982). Using latent trait measurement to analyse attitudinal data: A synthesis of viewpoints. In D. SPEARRITT (Ed.), The improvement of measurement in education and psychology (pp. 89-126). Melbourne, Victoria: Australian Council for Educational Research (ACER).
ANDRICH, D. (1988a). A general form of Rasch's extended logistic model for partial credit scoring. Applied Measurement in Education, 1(4), 363-378.
ANDRICH, D. (1988b). Rasch models for measurement. Sage University Paper series on Quantitative Applications in the Social Sciences (Series No. 07-068). Newbury Park, CA: Sage.
ANDRICH, D., DE JONG, J.H. & SHERIDAN, B. (1994, May 16-19). Diagnostic opportunities with the Rasch model for ordered response categories. Paper presented at the IPN Symposium on Applications of Latent Trait and Latent Class Models in the Social Sciences, Akademie Sankelmark, Germany.
ANDRICH, D. & VAN SCHOUBROECK, L. (1989). The General Health Questionnaire: A psychometric analysis using latent trait theory. Psychological Medicine, 19, 469-485.
DUBOIS, B. & BURNS, J.A. (1975). An analysis of the question mark response category in attitudinal scales. Educational and Psychological Measurement, 35, 869-884.
ENTWISTLE, N.J. & RAMSDEN, P. (1983). Understanding student learning. London: Croom Helm.
JOHNSON, T. (1997). The 1996 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.
JOHNSON, T., AINLEY, J. & LONG, M. (1996). The 1995 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.
LIKERT, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140.
LINKE, R.D. (1991). Report of the research group on performance indicators in higher education. Canberra: Australian Government Publishing Service (AGPS).
MARTON, F. & SÄLJÖ, R. (1976). On qualitative differences in learning. II—Outcome as a function of the learner's conception of the task. British Journal of Educational Psychology, 46, 115-127.
RAMSDEN, P. (1991a). Report on the Course Experience Questionnaire trial. In R. LINKE (Ed.), Performance indicators in higher education (Vol. 2). Canberra: Australian Government Publishing Service (AGPS).
RAMSDEN, P. (1991b). A performance indicator of teaching quality in higher education: The Course Experience Questionnaire. Studies in Higher Education, 16, 129-150.
RAMSDEN, P. (1992). Learning to teach in higher education. London: Routledge.
RAMSDEN, P. (1996, October 3-4). The validity and future of the Course Experience Questionnaire. Paper delivered at the Australian Vice-Chancellors' Committee Course Experience Symposium, Griffith University, Queensland, Australia.
RASCH, G. (1980). Probabilistic models for some intelligence and attainment tests (expanded edition). Chicago: The University of Chicago Press. (Original work published 1960.)
SHERIDAN, B. (1993, April 10-11). Threshold location and Likert-style questionnaires. Paper presented at the Seventh International Objective Measurement Workshop, American Educational Research Association (AERA) Annual Meeting, Atlanta, U.S.A.
SHERIDAN, B. (1995). The Course Experience Questionnaire as a measure for evaluating courses in higher education. Perth: Edith Cowan University, Measurement, Assessment and Evaluation Laboratory.
TRELOAR, D. (1994). Course Experience Questionnaire: Reaction. Paper presented at the Second Graduate Careers Council of Australia Symposium, Sydney, NSW, Australia.
WILSON, K.L., LIZZIO, A. & RAMSDEN, P. (1996). The use and validation of the Course Experience Questionnaire (Occasional Paper No. 6). Brisbane: Griffith University, Griffith Institute for Higher Education.
WRIGHT, B.D. (1985). Additivity in psychological measurement. In E.E. ROSKAM (Ed.), Measurement and personality assessment (pp. 101-112). Amsterdam: Elsevier North-Holland.
WRIGHT, B. & MASTERS, G. (1981). The measurement of knowledge and attitude (Research Memorandum No. 30). Chicago: University of Chicago, Department of Education, Statistical Laboratory.
WRIGHT, B. & MASTERS, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.
Appendix
The Course Experience Questionnaire
[Note: Items marked with an asterisk are reverse scored. Sub-scales are not marked for the graduates, who answer the 25 items as their perceptions of their courses just completed.]
The Good Teaching Sub-scale (6 items)
3. The teaching staff of this course motivated me to do my best work.
7. The staff put a lot of time into commenting on my work.
15. The staff made a real effort to understand difficulties I might be having with my work.
16.* Feedback on my work was usually provided only in the form of marks or grades (1995, 1996 version).
17. The teaching staff normally gave me helpful feedback on how I was going.
18. My lecturers were extremely good at explaining things.
20. The teaching staff worked hard to make their subjects interesting.
The Clear Goals and Standards Sub-scale (5 items)
1. It was always easy to know the standard of work expected.
6. I usually had a clear idea of where I was going and what was expected of me in this course.
13.* It was often hard to discover what was expected of me in this course.
16.* The course was overly theoretical and abstract (1994 version).
24. The staff made it clear right from the start what they expected of students.
The Appropriate Assessment Sub-scale (3 items)
8.* To do well in this course all you really needed was a good memory.
12.* The staff seemed more interested in testing what I had memorised than what I understood.
19.* Too many staff asked me questions just about facts.
The Appropriate Workload Sub-scale (4 items)
4.* The work load was too heavy.
14. I was generally given enough time to understand the things I had to learn.
21.* There was a lot of pressure on me as a student in this course.
23.* The sheer volume of work to be got through in this course meant that it couldn't all be thoroughly comprehended.
The Generic Skills Sub-scale (6 items)
2. The course developed my problem-solving skills.
5. The course sharpened my analytic skills.
9. The course helped me develop my ability to work as a team member.
10. As a result of my course, I feel confident about tackling unfamiliar problems.
11. The course improved my skills in written communication.
22. My course helped me to develop the ability to plan my own work.
Overall Satisfaction (1 item)
25. Overall, I was satisfied with the quality of this course.