
ARTICLE IN PRESS

Manual Therapy 13 (2008) 222–231

www.elsevier.com/locate/math

Original article

Rasch analysis of three versions of the Oswestry Disability Questionnaire

Megan Davidson*,1

School of Physiotherapy, La Trobe University, Vic. 3058, Australia

Received 11 October 2006; received in revised form 9 January 2007; accepted 17 January 2007

*Corresponding author. Tel.: +61 3 9479 5798; fax: +61 3 9479 5766. E-mail address: [email protected].

1 Department/Institution to which the work should be attributed: Musculoskeletal Research Centre, School of Physiotherapy, La Trobe University, Vic. 3086, Australia.

1356-689X/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.math.2007.01.008

Abstract

The purpose of the study was to explore the construct validity of three versions of the Oswestry Disability Questionnaire (ODQ) for low back pain using Rasch analysis. The three versions of the ODQ share 9 items and differ on the tenth. One hundred patients with non-specific low back pain seeking physiotherapy treatment at hospital outpatient departments and physiotherapy private practices completed the 12 Oswestry items as part of a battery of questionnaires. Rasch analysis revealed that four items (Personal Care, Standing, Sex Life and Social Life) had disordered response thresholds and one item (Walking) showed differential item functioning by age. The 10 standard Oswestry items and a modified version in which Sex Life is replaced by Work/Housework showed adequate overall fit to the Rasch model (χ² P > .01). The third version, in which Sex Life is replaced by Changing Degree of Pain, did not fit the model (χ² P = .006), and the Changing Degree of Pain item was misfitting (fit residual 2.34, P = .007). These findings suggest that either of the first two versions of this widely used low back pain outcome measure should be selected over the third. Users should also be aware that for some items the rating scale steps do not perform as intended.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Questionnaires; Validity; Low back pain; Rasch analysis

1. Introduction

The Oswestry Disability Questionnaire (ODQ) is one of the oldest self-report questionnaires for measuring functional outcomes in patients with low back pain and remains widely used (Fairbank et al., 1980; Grotle et al., 2004). The ODQ was developed as a clinical assessment tool that would provide an estimate of disability expressed as a percentage score. Ten sections, or items, assess pain, personal care, lifting, walking, sitting, standing, sleeping, sex life, social life and travelling. The developers provided little detail on how the items were selected, saying only that the activities chosen were those most relevant to people with low back pain.

Each item of the ODQ has 6 response choices arranged in order of difficulty, and the respondent is asked to select the response “that most closely describes you today”. For example, the Sitting section responses are: I can sit in any chair as long as I like; I can only sit in my favourite chair as long as I like; Pain prevents me sitting more than 1 hr; Pain prevents me from sitting more than 30 min; Pain prevents me from sitting more than 10 min; and Pain prevents me from sitting at all. A score of 0 is awarded if the first response option is selected, through to 5 for the last option. A total score is calculated by summing the individual item scores, dividing by the total possible score (adjusted if any items are missed) and multiplying by 100. The possible score range is 0–100, and a higher score indicates greater disability. The ODQ is therefore an atypical questionnaire because there is no consistent rating scale used across all items: instead, each step of each item has its own definition.
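The percentage scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the developers' published scoring procedure: it assumes that "adjusted if any items are missed" means the total possible score drops by 5 for each unanswered item, and the function name `odq_percentage` is hypothetical.

```python
from typing import Optional, Sequence


def odq_percentage(item_scores: Sequence[Optional[int]]) -> float:
    """Return an ODQ total score as a percentage (0-100).

    Each answered item contributes 0-5 points; None marks a missed item.
    Assumption: the denominator drops by 5 for every missed item, which is
    one common way of 'adjusting the total possible score'.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    if any(not 0 <= s <= 5 for s in answered):
        raise ValueError("item scores must lie between 0 and 5")
    max_possible = 5 * len(answered)  # 50 when all ten items are answered
    return 100 * sum(answered) / max_possible


# Ten items scored 2 each, and the same pattern with one item missed.
print(odq_percentage([2] * 10))
print(odq_percentage([2] * 9 + [None]))
```

Both calls return 40.0: under this adjustment a missed item does not pull the percentage towards zero.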

The ODQ was modified by Baker et al. (1989), who removed references to medication from the Pain and Sleeping items, thereby improving the relevance of these items to people not taking medication. Davidson and Keating (2002) further modified this version by replacing miles with kilometres in the Walking section. A modified version sometimes called the Chiropractic version (Hudson-Cook et al., 1989) replaced Sex Life with a new item called Changing Degree of Pain. This version has been criticised for including a transitional rating, which is conceptually different from the other items that ask about pain intensity and activity limitations (Fairbank and Pynsent, 2000). More recently, Fritz and Irrgang (2001) reported a version that replaced Sex Life with a new item called Employment/Homemaking. This modification added an aspect of activity/participation that is otherwise absent from the ODQ. The developers recommend Version 2.0 of the Oswestry (Fairbank and Pynsent, 2000), which instructs patients to answer the questions in relation to how their back problem is affecting them “today”, rather than the original instructions, which do not specify a time-frame. Selection of any particular version of the ODQ is at present based solely on preference for content, and no studies have directly compared different versions.

The aim of this study was to explore the construct validity of three versions of the ODQ using Rasch analysis. Rasch analysis is a useful tool for exploring the validity of questionnaires that have been developed using traditional methods. Developed by the Danish mathematician Georg Rasch (Rasch, 1960), Rasch analysis is based on a probabilistic model and tests the extent to which the observed pattern of responses fits the pattern expected by the model. Rasch analysis calibrates person ability and item difficulty onto an interval scale in units called logits (log-odds units). Because logits are interval units, Andrich (2004) argues that Rasch analysis “… provides an operational criterion for fundamental measurement of the kind found in the physical sciences” (p. I-12). The Rasch model provides evidence of scale validity by determining whether data derived from questionnaires can be validly summed, and whether polytomous scoring categories work as intended. In addition, Rasch analysis tests for invariance of items across external group characteristics by differential item functioning (DIF) analysis. Thus, a rigorous protocol is used to test what is, in effect, the internal construct validity of the scale, including unidimensionality, through a test of local independence. This analysis provides a strong diagnostic for the scale and a mechanism for comparing different versions of a questionnaire.

2. Method

2.1. Sample and procedures

Consecutive eligible patients at 7 public hospital outpatient or community-based physiotherapy departments and 9 physiotherapy private practices in Australia were invited to participate. Patients were eligible if they were ambulatory, were receiving their first or second consultation for an episode of low back pain, were aged 18 years or older, and were able to read and write English. People with back pain related to pregnancy, traumatic injury or rheumatic diseases were excluded.

Participants completed a 12-item Oswestry (see Appendix A) as part of a larger battery of questionnaires, and were mailed the questionnaires to complete a second time 4 weeks later. The ODQ version by Baker et al. (1989) was used for items 1–10, with metric distances replacing imperial units in the Walking item. The final two items were the Changing Degree of Pain item from the Hudson-Cook et al. (1989) version and the Work/Housework item, which is an adaptation of the Employment/Homemaking item from Fritz and Irrgang (2001). These changes were made to make the items more culturally relevant for Australia.

Ethics approvals were gained from the Faculty of Health Sciences Human Ethics Committee of La Trobe University and from the ethics committees of the participating clinics where such committees existed.

2.2. Analysis

For analysis, data for each of the three 10-item versions of the ODQ were extracted from the 12 items completed by respondents. Version 1 was the standard 10 items of the ODQ, which include the Sex Life section. Version 2 replaced Sex Life with a Work/Housework section, and Version 3 replaced Sex Life with Changing Degree of Pain.

Rasch analysis locates item difficulty and person ability on a logit scale. A logit (log-odds unit) is the natural logarithm of the odds of a person endorsing a particular rating scale step in an item. Because of the scoring direction of the ODQ (higher scores indicate greater disability), persons of higher ability and items of greater difficulty are located on the negative side of the logit scale, while persons of lower ability and items of less difficulty are located on the positive side. The components of the Rasch analysis are described below.
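The logit definition above can be made concrete with a short sketch. The helper names are hypothetical, and `rasch_probability` shows only the simple dichotomous Rasch model, in which the log-odds of scoring 1 equal the difference between person and item locations. The ODQ items are polytomous and were analysed with the partial credit model, so this is an illustration of the logit scale, not the model applied to the ODQ data.

```python
import math


def logit(p: float) -> float:
    """Log-odds: the natural logarithm of p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Probability that corresponds to a location of x logits."""
    return 1.0 / (1.0 + math.exp(-x))


def rasch_probability(person: float, item: float) -> float:
    """Dichotomous Rasch model: the log-odds of scoring 1 equal person - item."""
    return inverse_logit(person - item)


print(rasch_probability(0.0, 0.0))  # person located exactly at the item
print(rasch_probability(1.0, 0.0))  # person one logit beyond the item
```

A person located at the item has a probability of 0.5, and each additional logit multiplies the odds by e (about 2.72), which is why equal logit differences carry the same meaning anywhere on the scale.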

2.2.1. Threshold order

A threshold occurs where there is a transition between adjacent response options. The threshold is reached when the likelihood of endorsing one level of the scale is the same as the likelihood of endorsing the next level. Each item in the ODQ has 6 statements of increasing difficulty and therefore 5 thresholds. Each threshold has a location on the logit scale, and each item has an average location. For each item one would expect that, with decreasing ability, the probability of selecting each statement in turn would increase in an ordered fashion from least to most difficult. In the Sitting item, for example, one would expect the probability of selecting a particular response to increase from easiest (I can sit in any chair as long as I like) to hardest (Pain prevents me from sitting at all) in a step-wise manner with decreasing person ability. Rasch analysis allows us to identify whether the steps in the response scale attract this expected pattern of responses, or whether the thresholds are disordered, that is, whether the probability of selecting each level fails to rise in the manner predicted. Disordered thresholds can be a source of item misfit. In the analysis, if items had disordered thresholds, response categories were collapsed until ordered thresholds were achieved and the items showed adequate fit.
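A threshold can be illustrated with the partial credit model, the model the analysis software is reported to apply. The sketch below computes category probabilities for one item. At each threshold location the two adjacent categories are equally probable, and with disordered thresholds some categories are never the most probable response at any person location. All threshold values are hypothetical, and the function names are illustrative rather than taken from RUMM2020.

```python
import math
from typing import List, Sequence, Tuple


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    theta      -- person location in logits (positive = more disability
                  under the ODQ scoring direction used in this paper)
    thresholds -- tau_1..tau_m in logits; a 6-category ODQ item has m = 5
    """
    # Category k has numerator exp(sum_{j<=k} (theta - tau_j)); k = 0 has 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)  # subtract before exp() to avoid overflow
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def disordered_pairs(thresholds: Sequence[float]) -> List[Tuple[int, int]]:
    """1-based numbers of adjacent thresholds whose locations are reversed."""
    return [(k, k + 1) for k in range(1, len(thresholds))
            if thresholds[k - 1] > thresholds[k]]


def modal_categories(thresholds: Sequence[float],
                     lo: float = -6.0, hi: float = 6.0,
                     steps: int = 1201) -> List[int]:
    """Categories that are the most probable response somewhere on a grid."""
    seen = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = pcm_probabilities(theta, thresholds)
        seen.add(max(range(len(probs)), key=probs.__getitem__))
    return sorted(seen)


ordered = [-3.0, -1.5, 0.0, 1.5, 3.0]     # hypothetical, well-ordered item
reversed_ = [-1.0, -2.0, 0.5, 2.0, 2.5]   # hypothetical, thresholds 1 and 2 swapped
print(disordered_pairs(reversed_), modal_categories(reversed_))
```

With the hypothetical `reversed_` item, response 1 is never the modal choice. This is the same symptom reported for responses 1 and 4 of the Personal Care item in the Results.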

2.2.2. Differential item functioning (DIF)

A useful questionnaire can be used with a broad spectrum of patients, so it is important that the items function similarly for persons at the same level of ability. Some items may attract systematically different responses on the basis of some characteristic other than item difficulty. DIF analyses by gender and by age (in the groupings 18–44, 45–64 and 65-plus years) were conducted to explore whether these characteristics had a confounding effect on item responses. DIF by time (the first and second administrations of the test) was tested so that the responses at the two administrations could be pooled, provided no DIF by time was evident (Chang and Chan, 1995). For the DIF analysis the sample is divided into three equal-sized groups, or “class intervals”, classifying persons of low, medium and high ability. Uniform DIF is exhibited when there is a consistent deviation of observed from expected responses across all class intervals. Non-uniform DIF occurs when class intervals differ in their deviation from expected scores.
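The grouping steps described above can be sketched as follows. The code assigns the study's age bands, splits persons into three equal-sized class intervals by location, and tabulates the mean standardized residual on one item for each group and class interval. The residuals are assumed to be computed elsewhere, and the function names are hypothetical. This sketch only lays out the cell means that uniform and non-uniform DIF describe. The formal significance test applied by the analysis software is not reproduced here.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def age_group(age: int) -> str:
    """Age bands used for DIF by age in this study."""
    if age < 18:
        raise ValueError("participants were aged 18 years or older")
    if age <= 44:
        return "18-44"
    if age <= 64:
        return "45-64"
    return "65+"


def class_intervals(locations: Sequence[float], n_intervals: int = 3) -> List[int]:
    """Assign persons to n class intervals of (as far as possible) equal size,
    by ascending person location. Under the ODQ direction, interval 0 holds
    the most able persons (most negative locations)."""
    order = sorted(range(len(locations)), key=lambda i: locations[i])
    labels = [0] * len(locations)
    for rank, person in enumerate(order):
        labels[person] = rank * n_intervals // len(locations)
    return labels


def mean_residual_table(residuals: Sequence[float], groups: Sequence[str],
                        intervals: Sequence[int]) -> Dict[Tuple[str, int], float]:
    """Mean standardized residual on one item for each (group, interval) cell.

    A group whose means sit consistently above (or below) the others in every
    class interval shows the pattern described as uniform DIF.
    """
    cells: Dict[Tuple[str, int], List[float]] = {}
    for r, g, c in zip(residuals, groups, intervals):
        cells.setdefault((g, c), []).append(r)
    return {cell: mean(values) for cell, values in cells.items()}
```

Applied to the Walking item, the Results describe the 65-plus cells lying consistently above expectation across class intervals, which is the uniform pattern.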

2.2.3. Overall fit and person separation

The extent to which the overall questionnaire data for the class intervals fit the Rasch model is tested with a χ² statistic. A χ² probability value greater than the chosen alpha value indicates no significant deviation of the data from the model.

The Person Separation Index provides an indication of how many groups, or strata, of ability the test can discriminate amongst (Wright and Masters, 1982). The higher the reliability of person separation, the more groups the test is able to detect. A reliability coefficient of .8 indicates that two groups can be identified, and .9 four or more groups.²

² Separation = (reliability/(1 − reliability))^0.5.
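Footnote 2 can be evaluated directly. The sketch below implements that formula, G = (R/(1 − R))^0.5. It also computes the number of statistically distinct strata as H = (4G + 1)/3. The strata formula is not stated in this paper: it is an assumption added here, based on the strata index commonly attributed to Wright and Masters. Function names are hypothetical.

```python
import math


def separation(reliability: float) -> float:
    """Separation ratio G from person separation reliability R (footnote 2):
    G = (R / (1 - R)) ** 0.5."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata(reliability: float) -> float:
    """Distinct ability strata H = (4G + 1) / 3 (assumed formula, see lead-in)."""
    return (4.0 * separation(reliability) + 1.0) / 3.0


for r in (0.80, 0.87, 0.88, 0.90):
    print(f"R = {r:.2f}: G = {separation(r):.2f}, H = {strata(r):.2f}")
```

A reliability of .8 gives G = 2, matching the "two groups" in the text, and .9 gives G = 3. For the values of .87 and .88 reported in Table 3, H falls between 3 and 4.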

2.3. Unidimensionality

Rasch analysis examines the unidimensionality of the scale, that is, the extent to which all the items in a scale are measuring the same underlying construct or latent-trait variable. Scale unidimensionality, or local independence, is a requirement of Rasch models. Item fit statistics indicate whether or not each item contributes to the measurement of a single underlying construct. Fit residuals are the distance of the observed item data from the expected data; a perfect fit would result in a mean of zero and a standard deviation of one. Items with a fit residual beyond ±2 are considered significantly misfitting (Masters and Keeves, 1999). Items with large negative residual values indicate a high level of predictability in responses and therefore information redundancy. Items with large positive residual values indicate an unacceptable level of “noise” in the responses.
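The residual logic can be sketched at two levels. At the level of a single person and item, a standardized residual is the observed score minus the model-expected score, divided by the model standard deviation. At the item level, the ±2 criterion is then applied to the item fit residual. The item fit residual reported by RUMM2020 summarises person-level residuals through a transformation that is not reproduced here. The function names are hypothetical, and the category probabilities are assumed to come from a fitted model.

```python
import math
from typing import Sequence


def standardized_residual(observed: int, category_probs: Sequence[float]) -> float:
    """(observed - expected) / sqrt(variance) for one person-item response,
    given the model probabilities of categories 0..m."""
    expected = sum(k * p for k, p in enumerate(category_probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(category_probs))
    return (observed - expected) / math.sqrt(variance)


def classify_fit_residual(residual: float, cutoff: float = 2.0) -> str:
    """Apply the +/-2 item fit residual criterion described above."""
    if residual > cutoff:
        return "misfit: noisy responses"
    if residual < -cutoff:
        return "misfit: redundant (overly predictable) responses"
    return "acceptable"


print(classify_fit_residual(2.34))  # value reported for Changing Degree of Pain
```

The value of 2.34 reported for Changing Degree of Pain falls on the "noisy" side of the criterion.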

Item fit statistics are often the only indicator used to establish scale unidimensionality. A second strategy is to formally test the assumption of local independence by examining a principal components analysis (PCA) of the residuals from the Rasch analysis (Smith, 2002). If a set of items is truly unidimensional, then a person taking any subset of the items should obtain the same person ability estimate as if they had taken the entire test. Two item subsets were determined by examining the first component of the PCA (after extraction of the Rasch component). Items with negative and positive loadings formed the two subsets, which were anchored to the original item locations. Anchoring equates the two tests by calibrating them on the same logit-scale “ruler”. The person ability locations from each subset of items were then compared, using paired t-tests, with the locations derived from the full set of items.
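The subset procedure can be sketched in two steps. First, items are split by the sign of their loading on the first residual component. Second, person locations from a subset are compared with those from the full item set using a paired t statistic. This is a simplified stand-in under stated assumptions. Estimating and anchoring person locations is left to the Rasch software. The P value uses a normal approximation rather than the exact t distribution, which is reasonable only for large samples such as the 171 analysable cases. Items loading exactly zero are assigned arbitrarily. All names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Sequence, Tuple


def split_by_loading(loadings: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split items by the sign of their loading on the first residual component.
    Items loading exactly zero go to the second list (an arbitrary choice)."""
    positive = sorted(item for item, load in loadings.items() if load > 0)
    negative = sorted(item for item, load in loadings.items() if load <= 0)
    return positive, negative


def paired_comparison(subset_locs: Sequence[float],
                      full_locs: Sequence[float]) -> Tuple[float, float, float]:
    """Mean paired difference, t statistic and an approximate two-sided P value.

    The P value uses the normal distribution as a large-sample stand-in for
    the t distribution (reasonable for n around 171, not for small n).
    """
    if len(subset_locs) != len(full_locs) or len(full_locs) < 2:
        raise ValueError("need two equal-length lists with at least 2 persons")
    diffs = [s - f for s, f in zip(subset_locs, full_locs)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance")
    d_bar = mean(diffs)
    t = d_bar / (spread / math.sqrt(len(diffs)))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return d_bar, t, p
```

A non-significant mean difference (the study used P < .01 as its criterion) is read as evidence that any departure from unidimensionality does not materially shift person estimates.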

2.4. Targeting

Targeting refers to the extent to which the item threshold difficulties adequately target the abilities of the persons in the sample. Poor targeting occurs when item thresholds are clustered at certain points along the logit scale, leaving large gaps, and when many respondents have a higher or lower ability than the most or least difficult item threshold. Targeting is judged by visual inspection of the distribution of persons and item thresholds on the logit scale.
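The study judged targeting visually. As a numerical complement to that inspection, and as an assumption of this sketch rather than part of the reported method, the two failure modes described above can be counted directly. The code counts persons who lie beyond the most extreme thresholds and finds the widest gap between neighbouring thresholds. The function name and example values are hypothetical.

```python
from typing import Sequence, Tuple


def targeting_summary(person_locs: Sequence[float],
                      threshold_locs: Sequence[float]) -> Tuple[int, int, float]:
    """Count persons outside the span of item thresholds and find the widest
    gap between neighbouring thresholds (all values in logits)."""
    if not threshold_locs:
        raise ValueError("need at least one threshold")
    lo, hi = min(threshold_locs), max(threshold_locs)
    # Under the ODQ direction, persons below `lo` are more able than the
    # hardest threshold; persons above `hi` are more disabled than the
    # easiest threshold.
    below = sum(1 for x in person_locs if x < lo)
    above = sum(1 for x in person_locs if x > hi)
    ordered = sorted(threshold_locs)
    widest_gap = max((b - a for a, b in zip(ordered, ordered[1:])), default=0.0)
    return below, above, widest_gap


print(targeting_summary([-5.0, 0.0, 1.0, 6.0], [-3.0, -1.0, 2.0, 3.0]))
```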

Descriptive statistics were calculated using SPSS for Windows V11.0, and Rasch analysis using RUMM2020 software.³ RUMM2020 applies the unrestricted Partial Credit Model. Formulae for Rasch models can be found in numerous publications (Andrich, 1978; Masters, 1982; Wright and Masters, 1982; Masters and Keeves, 1999). Acceptable overall fit of the data to the model was set as P > .01 (item-trait interaction, χ² probability). Items were considered misfitting if fit residuals exceeded ±2.0 or the χ² probability was < .01. As three person factors (age, gender and time) were tested for twelve items, DIF was considered significant if the χ² probability was < .001. Differences in person locations between item subsets and the full test were considered significant if P < .01.

³ RUMM2020, RUMM Laboratory Pty Ltd., 14 Dodonaea Court, Duncraig W.A. 6023, Australia.

Table 1

Sample characteristics

Characteristic    Count    Mean (sd)

Age in years    52.69 (14.66)

range 19–80

Gender

Male 36

Female 64

Work status

Employed at usual job 31

Light duty or restricted work 2

Paid leave/sick leave 5

Unpaid leave 0

Unemployed due to health problems 4

Unemployed due to other reason 3

Student 2

Keeping house/homemaker 19

Retired 24

On disability benefit 10

On compensation

Yes 7

No 92

Duration of current episode

Less than 6 weeks 31

6 weeks to 3 months 17

3–6 months 12

More than 6 months 40

Pain location

Back only 24

Refers to buttock, groin or thigh 40

Refers to leg below knee 35

Number of previous episodes

None 12

1–5 25

More than 5 63

Note that because n = 100 the percentage is equal to the count.

Totals may not equal 100 due to missing data.

Table 2

Response frequency

Item Response category

0 1 2 3 4 5

1 Pain 5 26 39 21 7 1

2 Personal care 57 18 22 3 0 0

3 Lifting 9 18 14 36 18 2

4 Walking 34 22 26 15 2 0

5 Sitting 14 14 40 17 12 2

6 Standing 18 28 15 20 17 1

7 Sleeping 7 59 23 3 0 0

8 Sex life 20 25 1 11 9 2

9 Social life 23 14 23 29 2 1

10 Traveling 10 38 26 11 6 0

11 Work/housework 5 47 23 12 3 2

12 Changing degree of pain 3 23 26 29 11 0

Note that because n = 100 the percentage is equal to the count.

Totals may not equal 100 due to missing data.

3. Results

One hundred patients completed the questionnaires initially, and 74 of these also returned the questionnaires a second time 4 weeks later. Sample characteristics (Table 1) reveal a predominantly female (64%) sample with ages ranging from 19 to 80 years. Few individuals were on sick leave (2%) or receiving compensation (7%). The majority were employed (31%), retired (24%) or homemakers (19%). There was a high prevalence of recurrent back pain, with 88% reporting that they had experienced prior episodes of back pain.

Table 2 shows the response frequencies for each item. There was no DIF by time, so the two questionnaire data sets, totalling 174 cases, were combined for further analysis. Initial Rasch analysis of the whole data set of 174 cases identified 3 persons who either had not completed entire pages of the questionnaire or had scored zero on all items of the test. The analysable sample was therefore 171. Table 3 compares the statistics for the three versions of the Oswestry.

3.1. Threshold order

Four items had disordered thresholds. The first and second thresholds for item 9 Social Life and the second and third thresholds for item 6 Standing were reversed. For item 2 Personal Care, the first and second, and the fourth and fifth, thresholds were reversed. The thresholds for item 8 Sex Life were ordered 1, 4, 3, 2, 5. Ordered thresholds were achieved for Standing and Sex Life by combining response scores 2 and 3. For Personal Care, scores 1 and 2, and 3 and 4, were combined. For Social Life, scores 0 and 1, and 3 and 4, were combined. Fig. 1 compares the regular, ordered thresholds of the Pain item with the disordered thresholds of the Personal Care item. In the Pain item, the most likely response for a person of high ability, located at −4 logits, is 0, the best possible score. As person ability decreases there is a step-wise change in the most probable response from 0 to 5. The most likely response for a person of low ability, at 4 logits, is 5, the worst possible score. In the Personal Care item, however, respondents are not using the available responses in a consistent manner: responses 1 and 4 are never the most probable choices, no matter what the person’s ability level.

Fig. 1. Threshold probability curves for item 1 Pain and item 2 Personal Care.

Table 3

Comparison of fit statistics and DIF for 3 versions of the Oswestry Disability Questionnaire

Statistic | Version 1 Standard | Version 2 Work/housework | Version 3 Changing degree of pain

Total item χ² | 36.39 (P = .014) | 22.19 (P = .329) | 39.43 (P = .006)

Person separation | 0.87 | 0.88 | 0.87

Item fit residuals | None beyond ±2 | None beyond ±2 | Item 12 Changing degree of pain = 2.34

Item fit χ² | All items P > .01 | All items P > .01 | Item 12 Changing degree of pain P = .007

DIF χ² | Item 4 Walking: uniform DIF by age (P = .00004) | Item 4 Walking: uniform DIF by age (P = .00006) | Item 4 Walking: uniform DIF by age (P = .00008)

Total item χ² P < .01 indicates poor overall fit.

Item fit χ² P < .01 indicates poor item fit.

DIF χ² P < .001 indicates deviation of observed from expected values.
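The category collapsing reported in Section 3.1 can be expressed as a rescoring map from the original 0–5 scores to new scores. The paper states which adjacent categories were combined but not how the collapsed categories were renumbered. Consecutive renumbering from 0 is therefore an assumption of this sketch. The helper names `collapse`, `RESCORING` and `rescore` are hypothetical and are not part of RUMM2020.

```python
from typing import Dict, Sequence


def collapse(merges: Sequence[Sequence[int]], n_categories: int = 6) -> Dict[int, int]:
    """Build an original-score -> new-score map that merges the listed groups
    of adjacent categories and renumbers the result consecutively from 0."""
    group_of = {c: c for c in range(n_categories)}
    for group in merges:
        for c in group:
            group_of[c] = min(group)
    new_codes: Dict[int, int] = {}
    mapping: Dict[int, int] = {}
    for c in range(n_categories):
        key = group_of[c]
        if key not in new_codes:
            new_codes[key] = len(new_codes)
        mapping[c] = new_codes[key]
    return mapping


# Category merges reported in Section 3.1 (score values 0-5).
RESCORING = {
    "Standing": collapse([(2, 3)]),
    "Sex Life": collapse([(2, 3)]),
    "Personal Care": collapse([(1, 2), (3, 4)]),
    "Social Life": collapse([(0, 1), (3, 4)]),
}


def rescore(item: str, score: int) -> int:
    """Rescore a raw 0-5 response; items not listed keep their original score."""
    mapping = RESCORING.get(item)
    return score if mapping is None else mapping[score]
```

After rescoring, Standing and Sex Life have five categories, and Personal Care and Social Life have four.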

3.2. Differential item functioning (DIF)

In all three versions of the Oswestry, item 4 Walking consistently exhibited uniform DIF by age (Table 3). The 65 and over age group had an observed score higher than expected. No other items had uniform or non-uniform DIF by age or gender (P > .001).

3.3. Fit

Table 3 shows that Version 1 (Standard) and Version 2 (Work/Housework) showed adequate overall fit to the Rasch model (P > .01), whereas Version 3 (Changing Degree of Pain) did not (χ² P = .006). Person separation for all three versions indicates that at least 3 strata of person ability can be differentiated.
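The decision rules stated in the Method (overall fit acceptable when P > .01, DIF significant when P < .001) can be applied to the P values transcribed from Table 3. This sketch reproduces the classification reported above. The constant and function names are illustrative only.

```python
# Decision rules stated in the Method section.
FIT_ALPHA = 0.01   # overall and item fit acceptable when P > .01
DIF_ALPHA = 0.001  # DIF significant when P < .001

# Values transcribed from Table 3.
TOTAL_ITEM_CHI2_P = {"Version 1": 0.014, "Version 2": 0.329, "Version 3": 0.006}
WALKING_AGE_DIF_P = {"Version 1": 0.00004, "Version 2": 0.00006, "Version 3": 0.00008}


def adequate_overall_fit(p: float) -> bool:
    """Overall item-trait interaction: no significant misfit when P > .01."""
    return p > FIT_ALPHA


def significant_dif(p: float) -> bool:
    """DIF flagged when the chi-square probability is below .001."""
    return p < DIF_ALPHA


for version, p in TOTAL_ITEM_CHI2_P.items():
    fit = "adequate" if adequate_overall_fit(p) else "poor"
    dif = significant_dif(WALKING_AGE_DIF_P[version])
    print(f"{version}: overall fit {fit} (P = {p}); Walking DIF by age: {dif}")
```

Versions 1 and 2 pass the overall fit rule, Version 3 does not, and the Walking item's age DIF is flagged in all three versions.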

3.4. Unidimensionality

The item Changing Degree of Pain in Version 3 was misfitting (fit residual 2.34, χ² = 9.82, df = 2, P = .007). The item fit statistics for the other two versions indicate that each set of items forms a unidimensional scale. Paired t-tests of person locations from the full item set compared with the item subsets revealed no significant difference for any of the Oswestry versions (Table 4).

Table 4

Test of person locations for anchored item subsets compared to the full set

Version | Subset 1 cf total item set: mean difference, 95% CI, P | Subset 2 cf total item set: mean difference, 95% CI, P

Version 1 Standard | −.013, −.109 to .083, P = .787 | −.029, −.112 to .054, P = .489

Version 2 Work/housework | −.012, −.096 to .071, P = .770 | .0002, −.094 to .094, P = .997

Version 3 Changing degree of pain | −.065, −.159 to .03, P = .180 | .014, −.059 to .088, P = .705

Note: subset 1 comprised items 3, 4, 6 and 9, and subset 2 items 1, 2, 5, 7 and 10, for all three versions. Item 11 was part of subset 1 for Version 2, and item 12 was part of subset 2 for Version 3.

Table 5

Item logit locations (from easiest to hardest) for the 3 Oswestry versions

Item | Version 1 Standard: location (SE) | Version 2 Work/housework: location (SE) | Version 3 Changing degree of pain: location (SE)

7 Sleeping | 2.505 (0.136) | 2.390 (0.135) | 2.423 (0.133)

2 Personal care | 2.465 (0.172) | 2.475 (0.172) | 2.440 (0.168)

4 Walking | 0.309 (0.095) | 0.338 (0.094) | 0.318 (0.091)

9 Social life | −0.023 (0.119) | −0.032 (0.118) | −0.027 (0.115)

10 Traveling | −0.460 (0.103) | −0.439 (0.103) | −0.439 (0.100)

11 Work/housework | n/a | −0.638 (0.110) | n/a

1 Pain | −0.657 (0.103) | −0.657 (0.102) | −0.650 (0.100)

8 Sex life | −0.703 (0.125) | n/a | n/a

6 Standing | −0.881 (0.103) | −0.876 (0.103) | −0.820 (0.100)

12 Changing degree of pain | n/a | n/a | −0.826 (0.094)

5 Sitting | −1.207 (0.087) | −1.198 (0.088) | −1.148 (0.085)

3 Lifting | −1.347 (0.087) | −1.361 (0.087) | −1.272 (0.084)

3.5. Targeting

Average item locations for the three versions are shown in Table 5. The threshold map for Version 1 (Standard Oswestry) (Fig. 2) shows the match of item threshold difficulty, on the lower part of the graph, to person ability, on the upper part, on a common logit scale. Person ability and item difficulty move from highest (the negative side of the logit scale) to lowest (the positive side of the logit scale). The threshold located furthest to the left of the scale is the statement I can lift heavy weights without extra pain. The threshold located furthest to the right of the scale is the statement Pain prevents me from sleeping at all. The threshold maps for the other two versions are not shown here, as there is little difference in targeting between the versions.

Fig. 2. Threshold map for the Oswestry Disability Questionnaire. Note: items of greater difficulty and persons of greater ability are located to the left side of the logit scale; items of lesser difficulty and persons of lesser ability are located to the right side of the logit scale.

4. Discussion

This study directly compared three versions of the Oswestry Disability Questionnaire by administering a test version containing the 10-item Oswestry plus the two additional items contained in two modified Oswestry versions. Including all 12 items in the test version allowed all three versions to be extracted.

The version that replaces Sex Life with Work/Housework had the best overall fit to the model (P = .329), and the standard Oswestry showed adequate fit to the Rasch model (P = .014), while the version that replaces Sex Life with Changing Degree of Pain did not (P = .006). The results confirm that the item Changing Degree of Pain does not belong with the other items and is measuring a different underlying construct from the other items.

Testing whether subsets of items provide an estimate of person ability equivalent to that from the entire set of items gives a robust indication of whether departures from unidimensionality significantly distort estimates of person location. Despite the presence of one misfitting item and overall poor fit to the model, Version 3 showed that the deviation from unidimensionality did not result in significantly different person location estimates for the two anchored subsets of items. It remains to be demonstrated how robust the estimates of person location are, and how great a departure from unidimensionality can be tolerated before significant deviations occur.

Two previous studies that used Rasch analysis to examine the ODQ reported that the Pain item did not fit the model (Page et al., 2002; White and Velozo, 2002). That the current study did not find this item misfitting may reflect the different wording of the item in the versions administered. The version administered in the current study asks only about pain intensity, while the two previous studies administered a version that relates pain to analgesic medication. If the content of the ODQ is mapped to the WHO International Classification of Functioning, Disability and Health, the pain item is a measure of impairment, while the other items reflect activity limitations (WHO, 2001). However, these items all relate activity limitation to pain, and the pain item has been shown to have a linear relationship with the other items (Fairbank et al., 1980).

The existence of disordered thresholds for Personal Care, Standing, Sex Life and Social Life is evidence that, at least for these items, the response options do not perform as intended. White and Velozo (2002) and Page et al. (2002) both proposed modified versions of the Oswestry in which the Pain item is deleted and response levels 2 and 3, and 4 and 5, are combined for all items, reducing the number of response options from 6 to 4. Neither study reports which individual items had disordered thresholds. Citation tracking has failed to find any subsequent studies that have administered or further tested either of these versions. Because few respondents in an ambulatory population select response options 4 and 5 (Table 2), there is some merit in reducing the number of response options at the upper end of the scale.

Item 4 Walking displayed DIF by age in all three versions. On this item, persons in the 65-plus age group had higher (worse) scores than expected, compared with younger persons at the same level of ability. This indicates that something other than the difficulty of walking as an activity is influencing older persons’ responses to this item. Fear of falling and various sociodemographic variables have been reported to be associated with reduced mobility in elderly persons (Arfken et al., 1994; Tinetti et al., 1994; Simonsick et al., 1999; Murphy et al., 2002). Neither of the previous Rasch studies reported whether they examined DIF (Page et al., 2002; White and Velozo, 2002).

The Oswestry item thresholds are a reasonable match for the persons in this sample, in that there are thresholds for all persons except a small number of persons of very high ability (Fig. 2). This reflects the fact that total Oswestry scores in an ambulatory population are often skewed toward the lower (better functioning) end of the scale, with few persons scoring in the top fifth of the available total score range, because the highest response options of some items are rarely or never selected (Table 2). Some gaps are evident in the placement of item difficulty thresholds at the far right (lower functioning) end of the logit scale. Ideally, item thresholds should be evenly spread along the logit scale.

Although two versions showed adequate overall fit to the Rasch model, the problems of disordered thresholds for some items, DIF for the Walking item and gaps in targeting are typical of the limitations of ordinal scales designed using classical test theory. These limitations are only revealed using Rasch analysis.

A limitation of the study is that the number of eligible patients who were not invited or who refused to participate in the study is unknown. It is also not known to what extent the sample is representative of ambulatory patients seeking physiotherapy treatment for low back pain, as no data are available to describe this population. However, participants were recruited from a number of private and public agencies in both metropolitan and rural settings, and this would maximise the likelihood that the sample is representative. As the data were collected from ambulatory patients with low back pain, no generalisations can be made to non-ambulatory or admitted patients.

5. Conclusion

The standard version of the Oswestry and the version that replaces Sex Life with Work/Housework both form unidimensional scales in which all items measure a single underlying variable. The Changing Degree of Pain item that replaces Sex Life in the Hudson-Cook version does not measure the same underlying construct as the other items. These findings suggest that either of the first two of the three versions of this widely used low back pain outcome measure should be selected over the third. Users should also be aware that for some items the rating scale steps do not perform as intended.

Acknowledgements

Professor Alan Tennant, Academic Unit of Musculoskeletal & Rehabilitation Medicine, University of Leeds, provided advice on Rasch analysis.

Appendix A

This questionnaire has been designed to give us information as to how your back or leg pain has affected your ability to manage in everyday life. Please answer by checking one box in each item for the statement which best applies to you. We realise you may consider that two of the statements in any one item relate to you, but please just mark the box for the statement that most clearly describes your problem.

Table A1

A 12-item test version of the Oswestry Disability Questionnaire

Item 1: pain intensity
☐ I have no pain at the moment
☐ The pain is very mild at the moment
☐ The pain is moderate at the moment
☐ The pain is fairly severe at the moment
☐ The pain is very severe at the moment
☐ The pain is the worst imaginable at the moment

Item 2: personal care (washing, dressing, etc.)
☐ I can look after myself normally without causing extra pain
☐ I can look after myself normally but it is very painful
☐ It is painful to look after myself and I am slow and careful
☐ I need some help but manage most of my personal care
☐ I need help every day in most aspects of self-care
☐ I do not get dressed, wash with difficulty and stay in bed

Item 3: lifting
☐ I can lift heavy weights without extra pain
☐ I can lift heavy weights but it gives extra pain
☐ Pain prevents me lifting heavy weights off the floor but I can manage if they are conveniently positioned, e.g. on a table
☐ Pain prevents me lifting heavy weights but I can manage light to medium weights if they are conveniently positioned
☐ I can only lift very light weights
☐ I cannot lift or carry anything

Table A1 (continued)

Item 4: walking
☐ Pain does not prevent me walking any distance
☐ Pain prevents me from walking more than 2 km
☐ Pain prevents me from walking more than 1 km
☐ Pain prevents me from walking more than 500 m
☐ I can only walk using a stick, crutches or other support
☐ I am unable to walk at all

Item 5: sitting
☐ I can sit in any chair as long as I like
☐ I can only sit in my favourite chair as long as I like
☐ Pain prevents me sitting for more than 1 h
☐ Pain prevents me from sitting for more than 30 min
☐ Pain prevents me from sitting more than 10 min
☐ Pain prevents me from sitting at all

Item 6: standing
☐ I can stand as long as I want without extra pain
☐ I can stand as long as I want but it gives me extra pain
☐ Pain prevents me from standing for more than 1 h
☐ Pain prevents me from standing for more than 30 min
☐ Pain prevents me from standing for more than 10 min
☐ Pain prevents me from standing at all

Item 7: sleeping
☐ My sleep is never disturbed by pain
☐ My sleep is occasionally disturbed by pain
☐ Because of pain I have less than 6 h sleep
☐ Because of pain I have less than 4 h sleep
☐ Because of pain I have less than 2 h sleep
☐ Pain prevents me from sleeping at all

Item 8: sex life (if applicable)
☐ My sex life is normal and causes no extra pain
☐ My sex life is normal but causes some extra pain
☐ My sex life is nearly normal but is very painful
☐ My sex life is severely restricted by pain
☐ My sex life is nearly absent because of pain
☐ Pain prevents any sex life at all

Item 9: social life
☐ My social life is normal and gives me no extra pain
☐ My social life is normal but increases the degree of pain
☐ Pain has no significant effect on my social life apart from limiting my more energetic interests e.g. sport, etc.
☐ Pain has restricted my social life and I do not go out as often
☐ Pain has restricted my social life to my home
☐ I have no social life because of pain

Item 10: travelling
☐ I can travel anywhere without pain
☐ I can travel anywhere but it gives extra pain
☐ Pain is bad but I manage journeys over 2 h
☐ Pain restricts me to journeys of less than 1 h
☐ Pain restricts me to short necessary journeys under 30 min
☐ Pain prevents me from travelling except to receive treatment

Item 11: work/housework
☐ My normal work/housework does not cause pain
☐ My normal work/housework increase my pain, but I can still perform all that is required of me
☐ I can perform most of my work/housework, but pain prevents me from performing more physically demanding activities (e.g., lifting, vacuuming)
☐ Pain prevents me from doing anything but light work/housework
☐ Pain prevents me from doing even light work/housework
☐ Pain prevents me from performing any work/housework

Item 12: changing degree of pain
☐ My pain is rapidly getting better
☐ My pain fluctuates but overall is definitely getting better
☐ My pain seems to be getting better but improvement is slow at present
☐ My pain is neither getting better or worse
☐ My pain is gradually worsening
☐ My pain is rapidly worsening
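The three versions compared in this study draw different 10-item subsets from Table A1. The standard version uses items 1 to 10. The modified version replaces Sex Life (item 8) with Work/Housework (item 11). The Hudson-Cook version replaces Sex Life with Changing Degree of Pain (item 12). The sketch below assumes each item is scored 0 to 5 in the order listed. It applies the conventional Oswestry percentage scoring: the sum of answered item scores divided by five times the number of items answered, multiplied by 100. Unanswered items, such as Sex Life marked not applicable, are left out of the denominator. The version keys and the function name are hypothetical.

```python
from typing import Dict, Optional

# Item numbers (Table A1) making up each 10-item version; keys are illustrative.
VERSION_ITEMS = {
    "standard": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "work_housework": [1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
    "hudson_cook": [1, 2, 3, 4, 5, 6, 7, 9, 10, 12],
}


def oswestry_percent(responses: Dict[int, Optional[int]], version: str) -> float:
    """Percentage disability score for one respondent and one version.

    responses maps item number -> score 0-5, or None if the item was skipped.
    """
    items = VERSION_ITEMS[version]
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if not answered:
        raise ValueError("no items answered for this version")
    for score in answered:
        if not 0 <= score <= 5:
            raise ValueError(f"score {score} is outside 0-5")
    # Total expressed as a percentage of the maximum for the items answered.
    return 100.0 * sum(answered) / (5 * len(answered))
```

For a respondent who skipped Sex Life, the standard version is scored out of nine items. The two modified versions are scored out of their full ten items.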


References

Andrich D. A rating formulation for ordered response categories. Psychometrika 1978;43(4):561–73.

Andrich D. Controversy and the Rasch model: a characteristic of incompatible paradigms? Medical Care 2004;42(Suppl 1):I7–I16.

Arfken CL, Lach HW, Birge SJ, Miller JP. The prevalence and correlates of fear of falling in elderly persons living in the community. American Journal of Public Health 1994;84(4):565–70.

Baker DJ, Pynsent PB, Fairbank JCT. The Oswestry Disability Index revisited: its reliability, repeatability and validity, and a comparison with the St Thomas Disability Index. In: Roland M, Jenner JR, editors. Back pain: new approaches to rehabilitation and education. Manchester: Manchester University Press; 1989. p. 174–86.

Chang WC, Chan C. Rasch analysis for outcomes measures: some methodological considerations. Archives of Physical Medicine and Rehabilitation 1995;76(10):934–9.

Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and responsiveness. Physical Therapy 2002;82(1):8–24.

Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine 2000;25(22):2940–52.

Fairbank JC, Couper J, Davies JB, O'Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy 1980;66(8):271–3.

Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and the Quebec Back Pain Disability Scale. Physical Therapy 2001;81(2):776–88.

Grotle M, Brox JI, Vollestad NK. Functional status and disability questionnaires: what do they assess? Spine 2004;30(1):130–40.

Hudson-Cook N, Tomes-Nicholson K, Breen A. A revised Oswestry Disability Questionnaire. In: Roland M, Jenner JR, editors. Back pain: new approaches to rehabilitation and education. Manchester: Manchester University Press; 1989. p. 187–204.

Masters GN. A Rasch model for partial credit scoring. Psychometrika 1982;47(2):149–74.

Masters GN, Keeves JP. Advances in measurement in educational research and assessment. Amsterdam: Pergamon; 1999.

Murphy SL, Williams CS, Gill TM. Characteristics associated with fear of falling and activity restriction in community-living older persons. Journal of the American Geriatrics Society 2002;50(3):516–20.

Page SJ, Shawaryn MA, Cernich AN, Linacre JM. Scaling of the Revised Oswestry Low Back Pain Questionnaire. Archives of Physical Medicine and Rehabilitation 2002;83(11):1579–84.

Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut; 1960.

Simonsick EM, Guralnik JM, Fried LP. Who walks? Factors associated with walking behavior in disabled older women with and without self-reported walking difficulty. Journal of the American Geriatrics Society 1999;47(6):672–80.

Smith EV. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement 2002;3(2):205–31.

Tinetti ME, Mendes de Leon CF, Doucette JT, Baker DI. Fear of falling and fall-related efficacy in relationship to functioning among community-living elders. Journal of Gerontology 1994;49(3):M140–7.

White LJ, Velozo CA. The use of Rasch measurement to improve the Oswestry classification scheme. Archives of Physical Medicine and Rehabilitation 2002;83(6):822–31.

WHO. International classification of functioning, disability and health. Geneva: World Health Organization; 2001.

Wright BD, Masters GN. Rating scale analysis. Chicago: Mesa Press; 1982.