Robert G. Marx, Edward C. Jones, Answorth A. Allen, David W. Altchek, Stephen J. O'Brien, Scott A. Rodeo, Riley J. Williams, Russell F. Warren, and Thomas L. Wickiewicz. Reliability, Validity, and Responsiveness of Four Knee Outcome Scales for Athletic Patients. J Bone Joint Surg Am. 83:1459-1469, 2001.

This is an enhanced PDF from The Journal of Bone and Joint Surgery (20 Pickering Street, Needham, MA 02492-3157; www.jbjs.org), downloaded from www.ejbjs.org on February 25, 2005. This information is current as of February 25, 2005.




COPYRIGHT © 2001 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED

Reliability, Validity, and Responsiveness of Four Knee Outcome Scales for Athletic Patients

BY ROBERT G. MARX, MD, MSC, FRCS(C), EDWARD C. JONES, MD, MA, ANSWORTH A. ALLEN, MD, DAVID W. ALTCHEK, MD, STEPHEN J. O’BRIEN, MD, SCOTT A. RODEO, MD, RILEY J. WILLIAMS, MD, RUSSELL F. WARREN, MD, AND THOMAS L. WICKIEWICZ, MD

Investigation performed at the Center for Clinical Outcome Research and the Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, NY

Background: Many patient-based knee-rating scales are available for the evaluation of athletic patients. However, there is little information on the measurement properties of these instruments and therefore no evidence to support the use of one questionnaire rather than another. The goal of the present study was to determine the reliability, validity, and responsiveness of four knee-rating scales commonly used for the evaluation of athletic patients: the Lysholm scale, the subjective components of the Cincinnati knee-rating system, the American Academy of Orthopaedic Surgeons sports knee-rating scale, and the Activities of Daily Living scale of the Knee Outcome Survey.

Methods: All patients in the study had a disorder of the knee and were active in sports (a Tegner score of ≥4 points). Forty-one patients who had a knee disorder that had stabilized and who were not receiving treatment were administered all four questionnaires at baseline and again at a mean of 5.2 days (range, two to fourteen days) later to test reliability. Forty-two patients were administered the scales at baseline and at a minimum of three months after treatment to test responsiveness. The responses of 133 patients at baseline were studied to test construct validity.

Results: The reliability was high for all scales, with the intraclass correlation coefficient ranging from 0.88 to 0.95. As for construct validity, the correlations among the knee scales ranged from 0.70 to 0.85 and those between the knee scales and the physical component scale of the Short Form-36 (SF-36) and the patient and clinician severity ratings ranged from 0.59 to 0.77. Responsiveness, measured with the standardized response mean, ranged from 0.8 for the Cincinnati knee-rating system to 1.1 for the Activities of Daily Living scale.

Conclusions: All four scales satisfied our criteria for reliability, validity, and responsiveness, and all are acceptable for use in clinical research.

Clinical research studies are placing an increased emphasis on the perspective of the patient with use of health-related quality-of-life instruments. Many patient-based knee-rating scales have been developed for the evaluation of athletic patients¹. There is little information on the measurement properties of the existing scales and therefore no evidence to support the use of one questionnaire rather than another. Furthermore, without evidence to support the reliability, validity, and responsiveness of these instruments, the results of clinical studies that are based on these instruments are in question.

While instruments that have been used to evaluate patients following knee replacement, shoulder surgery, or trauma have been studied in detail²⁻⁷, relatively little work has been done to assess or compare knee instruments for the evaluation of athletic patients⁸.

Previous work in this area has been cross-sectional in design and therefore has not assessed reliability or how the instruments performed over time (that is, responsiveness)¹,⁸⁻¹¹. Furthermore, those investigations focused exclusively on patients with anterior cruciate ligament deficiency, substantially limiting the generalizability of the results. The goal of the present study was to determine the reliability, validity, and responsiveness of the scales used to measure disability and symptoms in athletic patients with a wide variety of disorders of the knee.

THE JOURNAL OF BONE & JOINT SURGERY · JBJS.ORG · VOLUME 83-A · NUMBER 10 · OCTOBER 2001

Materials and Methods

Ten orthopaedic surgeons who specialized in sports-related problems of the knee were polled to determine which questionnaires were the most sound and widely used and therefore the most appropriate for study. The goal was to study questionnaires that measure disabilities rather than impairments. Disabilities are restrictions or the lack of an ability to perform an activity in the usual manner, such as an inability to run, walk, or play a sport¹². Impairments are defined as a loss or abnormality of psychological, physiological, or anatomical structure or function at the organ level, such as a decrease in range of motion or an increase in translation of a joint¹². Impairments are important to patients and to surgeons; however, they have less impact on patients’ quality of life than do disabilities.

The Lysholm scale¹³, the American Academy of Orthopaedic Surgeons sports knee-rating scale¹⁴, the Activities of Daily Living scale of the Knee Outcome Survey¹⁵, and the Cincinnati knee-rating system¹⁰ were studied. The Cincinnati knee-rating system has been widely used; however, the scale has multiple subscores, including some based purely on impairment¹⁰. While all components of this scale are relevant to both clinicians and patients, for the purposes of this research we elected to study only the sections relating to disability. We also administered version 1.0 of the Short Form-36 (SF-36) for the validity analyses. The SF-36 is a thirty-six-item questionnaire that measures general health¹⁶⁻¹⁸. Its use has been encouraged in conjunction with knee-specific instruments for studies of patients with an injury of the anterior cruciate ligament¹⁹. Both a physical component scale and a mental component scale can be derived from the SF-36. Other instruments were not evaluated because they had been recently developed²⁰ or they focused on impairment²¹,²².

The present study was not a direct statistical comparison of the reliability, validity, and responsiveness of the four knee-rating scales. The goal of the study was to determine whether the measurement properties of each were satisfactory for use in clinical research.

Patient Recruitment

The study was approved by the Institutional Review Board, and all subjects gave informed consent. Patients in the waiting rooms of orthopaedic surgeons who specialized in disorders of the knee were recruited for the study during a six-month period. The patients completed the questionnaires in the waiting room prior to seeing their physician. Patients who were seeing the physician for an initial consultation because of a knee disorder were entered into the responsiveness arm of the study (that is, the arm measuring sensitivity to clinical change), as described below. These patients were retested at a minimum of three months later, following either operative or nonoperative treatment. Patients who were in a clinically stable state were entered into the reliability arm of the study. These individuals were retested within two days to two weeks²³,²⁴.

A wide variety of diagnoses was sought to test a wide spectrum of severity and to ensure generalizability to the various conditions that affect the knee in athletic patients.

The inclusion criteria were: (1) an ability to read and write English, and (2) the presence of a primary disorder of the knee, including but not limited to patellofemoral disorders (including chondromalacia, patellar dislocation, and patellar tendinitis), instability (acute ligament injury or chronic instability), meniscal injury, or osteochondritis dissecans.

The exclusion criteria were: (1) an inflammatory joint disease, tumor, or infection, and (2) a Tegner activity rating (prior to injury if the patient had sustained a recent injury) of ≤3 points²⁵.

The Tegner scale rates patients from 0 to 10 points on the basis of their activity level and sports participation²⁵. Patients who were not participating in high-demand sports (that is, an activity rating of ≤3 points, indicating that they did not run or participate in sports except for swimming) were excluded²⁵. We did not exclude patients on the basis of age alone, although patients with a Tegner score of ≥4 points tended to belong to the age-group of most patients seen in orthopaedic sports-medicine practices.

Test-Retest Reliability

The reliability portion of the study involved patients whose condition was stable and was not expected to change prior to the second administration of the questionnaire. These patients did not receive treatment in the interval (range, two to fourteen days) between the first and the second administration of the questionnaires. Patients were excluded if they had had surgery or a traumatic injury in the preceding three months.

When the patients completed the questionnaires for the second time, they also completed a transitional index in which they rated the severity of the knee condition at that time compared with the severity when the questionnaires were first administered²³. They chose from seven responses: “much worse,” “somewhat worse,” “a little worse,” “no change,” “a little better,” “somewhat better,” and “much better.” Only patients who responded “no change” were included in the reliability study. The intraclass correlation coefficient and the limits-of-agreement statistic²⁶⁻²⁸ were used to compare the scores²³. The intraclass correlation coefficient is an index of concordance for dimensional measurements, ranging between 0 and 1, where ≥0.75 is adequate for patients enrolled in a clinical trial²⁹. The limits-of-agreement statistic was also used as a descriptive measure of agreement. This value is the mean difference (and two standard deviations) between the two tests²⁷. Ninety-five percent of the differences between the two test administrations will lie within this interval²⁷.
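As a minimal sketch of the two agreement statistics described above, the following Python code computes an intraclass correlation coefficient and the limits of agreement for paired test-retest scores. The text does not state which form of the intraclass correlation coefficient was used; the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here purely for illustration. The function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1) for a list of (test, retest) tuples, one tuple per patient.

    Assumption: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)            # number of patients
    k = len(scores[0])         # number of administrations (2 for test-retest)
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Mean squares from a two-way analysis of variance without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def limits_of_agreement(scores):
    """Return (mean difference, lower limit, upper limit).

    The limits are the mean difference plus or minus two standard
    deviations of the retest-minus-test differences.
    """
    diffs = [retest - test for test, retest in scores]
    centre = mean(diffs)
    spread = 2 * stdev(diffs)
    return centre, centre - spread, centre + spread


# Hypothetical (baseline, retest) scores on a 100-point scale; not study data.
example = [(80, 82), (65, 63), (90, 91), (72, 70), (55, 58)]
print(round(icc_2_1(example), 3))
print(limits_of_agreement(example))
```

In this sketch a coefficient of at least 0.75 would meet the adequacy threshold quoted above, and roughly 95% of test-retest differences would be expected to fall between the two limits.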

Validity

Validity is an assessment of whether the instrument actually measures what it is intended to measure. A scale is considered to have face validity when its qualitative attributes are deemed to be adequate by individuals with experience in the field³⁰. Content validity is the appraisal of the underlying components of the scale³⁰. These two forms of validity were assessed by five orthopaedic surgeons with experience in the field of sports medicine. The experts were not asked to quantify these types of validity but rather to ensure that the instruments were valid from these points of view in order to make certain that clinicians would be comfortable using them for the evaluation of athletic patients with disorders of the knee.

Validation is relatively clear-cut when there is a gold standard against which the results can be compared⁵. In cases where there is no gold standard, such as quality of life, we are forced to use “construct validation.” Construct validity is present when the instrument performs as expected in relation to another measurement. We used the patients’ and clinicians’ opinions of severity as well as the other knee-rating scales and the physical component scale of the Short Form-36 (SF-36) as indices to determine construct validity. The patients were asked to rate their condition on a 5-point scale as either “very mildly bothersome,” “mildly bothersome,” “moderately bothersome,” “severely bothersome,” or “very severely bothersome.”³¹ Similarly, clinicians were asked to rate the severity of the patient’s knee problem as “very mild,” “mild,” “moderate,” “severe,” or “very severe.”

Six hypotheses were proposed to assess the construct validity of the four knee-specific instruments:

1. Since the knee scales are more specific for abnormalities of the knee, we hypothesized that these instruments would all correlate better with each other than they would with the physical component scale or the mental component scale (Spearman correlation coefficient) and that the physical component scale would correlate more strongly with the knee scales, because it is more specific for physical function, than would the mental component scale.

2. Since the knee scales are more specific for disorders involving the knee, we hypothesized that the knee-rating scales would correlate better with each other (Spearman correlation coefficient) than they would with any of the eight SF-36 subscales.

3. Since certain subscales of the SF-36 are more related to symptoms and disabilities experienced by patients with knee disorders, we hypothesized that the knee scales would correlate better with physical function and role-physical (Spearman correlation coefficient) than they would with vitality or social function and that they would correlate better with general health, bodily pain, vitality, and social function than they would with role-emotional or mental health.

4. Since patient-rated and clinician-rated severity should approximate knee symptoms and disability, we hypothesized that the knee scales would be significantly correlated with clinician-rated and patient-rated severity (Spearman correlation coefficient ≥ 0.6, and p ≤ 0.05).

5. Since patient-rated and clinician-rated severity should approximate knee symptoms and disability, we hypothesized that there would be a difference in the mean scores on the knee-specific instruments for patients who had different patient-rated severity scores as well as for those who had different clinician-rated severity scores; we thought that the instruments would differentiate between at least two of the levels of severity² (determined by analysis of variance and the Tukey post hoc honestly significant difference test).

6. Since we included a broad range of diagnoses of varying severity, we hypothesized that there would be no ceiling or floor effects. Ceiling and floor effects have been defined as one-third of the patients receiving the highest or lowest possible score, respectively³². For greater sensitivity, we defined ceiling and floor effects as one-third of the patients receiving the highest or lowest 10% of the possible scores.
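The stricter ceiling and floor definition in the sixth hypothesis can be sketched in a few lines of Python. This is an illustration only: the function name and the sample scores are hypothetical, and treating the band boundaries as inclusive and "one-third" as "at least one-third" are assumptions that the text does not spell out.

```python
def ceiling_floor_effects(scores, min_score=0.0, max_score=100.0,
                          band=0.10, threshold=1 / 3):
    """Flag ceiling and floor effects for one instrument.

    An effect is flagged when at least `threshold` of the patients score
    within the top (ceiling) or bottom (floor) `band` of the possible range.
    """
    span = max_score - min_score
    n = len(scores)
    in_top_band = sum(s >= max_score - band * span for s in scores) / n
    in_bottom_band = sum(s <= min_score + band * span for s in scores) / n
    return {
        "ceiling": in_top_band >= threshold,
        "floor": in_bottom_band >= threshold,
    }


# Hypothetical scores on a 100-point scale; not study data.
print(ceiling_floor_effects([95, 92, 100, 50, 60, 70]))
```

Setting `band=0.0` in this sketch recovers the original definition, in which only the single highest or lowest possible score counts.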

Responsiveness

Patients who were expected to have improvement because of the nature of the diagnosis and the proposed treatment were entered into the responsiveness arm of the study³³. These patients all had conditions that are known to be successfully treatable in the majority of cases. They were reassessed at a minimum of three months following the initial evaluation. Patients who underwent a reconstructive procedure had a follow-up evaluation at a minimum of six months. Different durations of follow-up were needed because of the different treatments involved. For example, reconstruction of the anterior cruciate ligament requires a longer follow-up to achieve a change in health status than arthroscopic meniscectomy does.

In order to be able to detect that a true difference had occurred, patients were asked to rate their condition as “much worse,” “somewhat worse,” “a little worse,” “no change,” “a little better,” “somewhat better,” and “much better.”³⁴ Patients who responded “a little better,” “somewhat better,” or “much better” were included in the study of the responsiveness of the instruments, while patients who responded that they were the same or worse were excluded from the responsiveness testing³⁵. We included only patients who stated that their condition had improved in order to be certain that the improvement that we believed that we were measuring had occurred.

Many statistics are available to determine responsiveness³⁶,³⁷. We elected to use the standardized response mean (the observed change divided by the standard deviation of change) because it has been used widely in previous orthopaedic research⁵⁻⁷ and it incorporates the response variance, allowing statistical testing of the response means³⁸. Standardized response means for validated orthopaedic instruments have ranged from 0.9 to 1.9⁵,⁶,³⁷.
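The standardized response mean defined above (mean observed change divided by the standard deviation of change) can be illustrated with a short Python sketch. The function name and the example scores are hypothetical and are not study data; the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the sample standard deviation of change."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for four patients who reported improvement.
baseline = [50, 60, 70, 80]
follow_up = [60, 75, 80, 95]
print(round(standardized_response_mean(baseline, follow_up), 2))
```

Because the denominator is the variability of the change itself, a uniform improvement across patients yields a large value in this sketch even when the mean change is modest.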

The Instruments

The modified Lysholm scale, as described by Tegner and Lysholm²⁵, is an eight-item questionnaire that was originally designed to evaluate patients following knee ligament surgery¹³. It is scored on a 100-point scale, with 25 points for knee stability, 25 points for pain, 15 points for locking, 10 points each for swelling and stair-climbing, and 5 points each for limp, use of a support, and squatting²⁵. This scale has been used extensively in clinical research studies¹⁹,³⁹⁻⁴¹.
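The point allocation just described can be written down as a small lookup table, which also confirms that the eight items sum to 100. This is a sketch under stated assumptions: only the maximum points per item come from the text, while the dictionary keys, the function name, and the validation behavior are hypothetical. The response options within each item are not given here and are not modeled.

```python
# Maximum points per item of the modified Lysholm scale, as listed in the text.
LYSHOLM_MAX_POINTS = {
    "stability": 25,
    "pain": 25,
    "locking": 15,
    "swelling": 10,
    "stair_climbing": 10,
    "limp": 5,
    "support": 5,
    "squatting": 5,
}


def lysholm_total(item_points):
    """Sum item points after checking each against its maximum."""
    total = 0
    for item, maximum in LYSHOLM_MAX_POINTS.items():
        points = item_points[item]
        if not 0 <= points <= maximum:
            raise ValueError(f"{item}: {points} is outside 0..{maximum}")
        total += points
    return total


print(sum(LYSHOLM_MAX_POINTS.values()))
```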

The first version of the Cincinnati knee-rating system was published in 1983, with additional modifications developed for occupational activities, athletic activities, symptoms, and functional limitations in sports and daily activities⁴²,⁴³. The system has eleven components, including physical examination, laxity of the knee based on instrumented testing, and radiographic evidence of degenerative joint disease³². We evaluated the subjective component, which includes pain, swelling, and giving-way, as well as the activity-level component, as these two parts are most related to disability.

The Activities of Daily Living scale of the Knee Outcome Survey was developed recently and published with an evaluation of its reliability, validity, and responsiveness¹⁵. This scale is designed for the evaluation of patients with disorders of the knee ranging from injury of the anterior cruciate ligament to arthrosis. It includes seventeen multiple-choice questions divided into two sections: one for symptoms (seven questions) and one for functional disability (ten questions).

The American Academy of Orthopaedic Surgeons sports knee-rating scale¹⁴ was included in the Musculoskeletal Outcomes Data Evaluation and Management System (MODEMS) for athletic patients with disorders of the knee. This instrument has five parts with a total of twenty-three questions: a core section (seven questions) on stiffness, swelling, pain, and function, and four sections (four questions each) on locking or catching on activity, giving-way on activity, current activity limitations due to the knee, and pain on activity due to the knee. At the present time, we are not aware of any published evidence of the reliability, validity, or responsiveness of this instrument.

Three of the instruments are 100-point scales. The Cincinnati knee-rating system is a 35-point scale that we converted to a 100-point scale, by dividing the score by thirty-five and then multiplying it by 100, to facilitate comparisons.
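The conversion described above is a simple linear rescaling. A minimal Python sketch follows; the function name is hypothetical, and the range check is an added assumption rather than something the text describes.

```python
def cincinnati_to_100(raw_score, raw_max=35):
    """Rescale a raw Cincinnati score (0 to 35) to a 100-point scale."""
    if not 0 <= raw_score <= raw_max:
        raise ValueError(f"raw score {raw_score} is outside 0..{raw_max}")
    return raw_score / raw_max * 100


print(cincinnati_to_100(28))
```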

Data Management and Analysis

The four questionnaires were collated in random order before they were presented to each patient in order to avoid a potential bias due to the sequence in which they were completed. Response forms were completed by the patients, and data entry was accomplished by manually scanning the forms. If the response was not readable by the scanner, the data were entered manually. All analyses were carried out on SPSS software (version 9.0; SPSS Advanced Statistics, Chicago, Illinois) for personal computers.

As already noted, the scoring system for the American Academy of Orthopaedic Surgeons sports knee instrument has five values, or subscales¹⁴. As it was impractical to use five values for each patient for the analysis, we used an unweighted mean of the five subscales to calculate an overall score on this instrument. We calculated the score for a given subscale if one-half or more of the items in that subscale could be scored. If it was possible to calculate a score for three or more of the subscales, we calculated the mean of the available scores. The Lysholm, Cincinnati, and Activities of Daily Living scales were scored as recommended by the originators of each system¹⁰,¹³,¹⁵.
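The missing-data rules for the overall American Academy of Orthopaedic Surgeons score can be sketched as follows. The two thresholds (at least one-half of the items per subscale, at least three scorable subscales) come from the text. Everything else is an assumption made for illustration: the function names are hypothetical, unanswered items are represented as `None`, item values are taken to be already expressed on a common 0-to-100 scale, and a subscale score is taken to be the mean of its answered items, which the text does not specify.

```python
from statistics import mean


def subscale_score(items):
    """Mean of the answered items, or None if fewer than half were answered."""
    answered = [value for value in items if value is not None]
    if not answered or len(answered) * 2 < len(items):
        return None
    return mean(answered)


def overall_score(subscales):
    """Unweighted mean of the scorable subscales, or None if fewer than three."""
    scores = [subscale_score(items) for items in subscales]
    available = [score for score in scores if score is not None]
    if len(available) < 3:
        return None
    return mean(available)


# Hypothetical responses: a seven-item core section and four four-item sections.
patient = [
    [80, 90, None, 70, 85, None, 75],   # core: 5 of 7 answered, scorable
    [100, 100, 50, 50],                 # locking or catching
    [None, None, None, 60],             # giving-way: 1 of 4 answered, dropped
    [40, 60, None, None],               # activity limitations: 2 of 4, scorable
    [70, 70, 70, 70],                   # pain on activity
]
print(overall_score(patient))
```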

Sample Size

Sample-size calculation indicated that for α = 0.05, β = 0.20, ρ(0) = 0.60, and ρ(1) = 0.85, a sample size of forty-two patients was required for the reliability study⁴⁴. As far as we know, there are no studies that describe the sample size for a responsiveness study, but previous authors have used the same number of patients as those used for reliability studies³⁶. The baseline questionnaires from both groups of patients were used for the validity analyses, which guaranteed a minimum of eighty patients.

Patient Demographics

Forty-one patients were included in the reliability arm of the study. Twenty of the patients were male, and twenty-one were female. The mean age was 32.6 years (range, fifteen to sixty years). The mean Tegner score was 6.3 points (range, 4 to 10 points). The diagnoses included injury of the anterior cruciate ligament in twenty-eight patients; osteochondritis dissecans in three; arthrosis, a meniscal tear, and patellofemoral joint pain in two each; and patellar tendinitis, injury of the posterior cruciate ligament, knee dislocation, and patellar tendon rupture in one each. The mean time between completion of the baseline questionnaire and the follow-up questionnaire was 5.2 days (range, two to fourteen days).

With the baseline responses from the reliability and responsiveness analyses, a total of 133 patients were included in the validity analysis. There were sixty-nine males and sixty-four females. The mean age was 31.5 years (range, fourteen to sixty-five years). The mean Tegner score was 6.4 points (range, 4 to 10 points). The diagnoses included injury of the anterior cruciate ligament in fifty-seven patients; a patellofemoral disorder in twenty-one; a meniscal tear in seventeen; arthrosis in thirteen; injury of the medial collateral ligament in five; osteochondritis dissecans in four; patellar tendinitis and a patellar tendon ossicle in three each; injury of the posterior cruciate ligament and knee dislocation in two each; and Osgood-Schlatter disease, patellar tendon rupture, symptomatic plica, chondral defect, iliotibial band tendinitis, and quadriceps tendon injury in one each.

Forty-two patients were involved in the responsiveness arm of the study. The mean age was 30.9 years (range, fifteen to sixty-one years). Nineteen of the patients were male, and twenty-three were female. The mean Tegner score was 6.5 points (range, 4 to 9 points). The diagnoses were varied and included a disorder of the patellofemoral joint in fifteen patients; injury of the anterior cruciate ligament in twelve; arthrosis and a meniscal tear in six each; and Osgood-Schlatter disease, injury of the medial collateral ligament, and a patellar tendon ossicle in one each. Twenty-four patients had nonoperative treatment, nine had reconstruction of the anterior cruciate ligament, four had meniscectomy, two had Synvisc (hylan G-F 20) injection, and one each had arthroscopic débridement, meniscal repair, and microfracture.

Results

Reliability Results

The mean baseline scores were 71.6 points for the Cincinnati knee-rating system, 84.1 points for the Lysholm scale, 85.8 points for the Activities of Daily Living scale, and 85.1 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The mean scores on the follow-up questionnaires were 71.1 points for the Cincinnati knee-rating system, 84.0 points for the Lysholm scale, 86.1 points for the Activities of Daily Living scale, and 85.9 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale.

The intraclass correlation coefficient for these scales was 0.88 for the Cincinnati knee-rating system, 0.95 for the Lysholm scale, 0.93 for the Activities of Daily Living scale, and 0.92 for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The limits of agreement (mean difference and two standard deviations between the tests) were 3.8 ± 8.0 for the Lysholm scale, 7.8 ± 22.5 for the Cincinnati knee-rating scale, 3.5 ± 9.9 for the Activities of Daily Living scale, and 4.1 ± 9.3 for the American Academy of Orthopaedic Surgeons sports knee-rating scale.

Validity Results

The orthopaedic surgeons all considered the scales to have face and content validity. All patients who completed the questionnaires at baseline were included in the construct validity portion of the study since this was a cross-sectional analysis. There were three scenarios in which the baseline responses were used for the validity testing but not for the reliability or responsiveness analysis. Baseline responses of patients who were initially entered into the reliability or responsiveness arm of the study but who did not complete the questionnaires a second time were used for the validity analysis. Patients who were initially entered into the reliability arm but who indicated that the status of the knee had changed on the transitional index when they completed the questionnaires at follow-up were excluded from the reliability study although their baseline responses were used for the validity study. Similarly, patients who were initially entered into the responsiveness arm but who did not believe that the status of the knee had improved on the transitional index at follow-up were excluded from the responsiveness study although their baseline responses were used for the validity study. The mean scores for these patients are listed in Table I.

With regard to validity testing, the first and second hypotheses were confirmed as the knee-specific scales all correlated better with each other than they did with the physical component scale, the mental component scale, or any of the SF-36 subscales. As expected, all scales correlated better with the physical component scale than they did with the mental component scale (Table II).
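For readers who want to reproduce this kind of comparison, a Spearman correlation coefficient can be computed as the Pearson correlation of the ranked scores. The sketch below uses only the Python standard library. The function names and the example scores are hypothetical and are not study data; assigning tied values their average rank is the usual convention and is assumed here.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical scores for six patients on two knee scales and a mental scale.
knee_scale_a = [55, 62, 70, 74, 81, 90]
knee_scale_b = [50, 66, 68, 80, 79, 95]
mental_scale = [52, 48, 55, 47, 53, 50]
print(round(spearman(knee_scale_a, knee_scale_b), 2))
print(round(spearman(knee_scale_a, mental_scale), 2))
```

With data shaped like this sketch, the first two hypotheses correspond to the knee-to-knee coefficient exceeding the coefficients between each knee scale and the SF-36 scales.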

The third hypothesis consisted of two parts. The first was that the physical function and role-physical subscales would correlate better with each knee scale than would the vitality or social function subscales, and the second was that the subscales for general health, bodily pain, vitality, and social function would all correlate better with each knee scale than would the role-emotional and mental health subscales. There were a small number of minor discrepancies from these constructs (Table III).

The fourth hypothesis was confirmed as all knee-rating

TABLE I Mean Baseline Scores on the Knee-Rating Scales for One Hundred and Thirty-three Patients Included in the Validity Analysis

Instrument Mean Score Standard Deviation Lowest Score Highest Score

Activities of Daily Living scale 76.0 18.3 16.3 100.0

American Academy of OrthopaedicSurgeons sports knee-rating scale

73.5 19.0 26.9 98.3

Cincinnati knee-rating scale 53.2 28.7 0.0 100.0

Lysholm scale 74.6 18.7 15.0 100.0

Tegner scale 6.4 1.6 4.0 10.0

Short Form-36 subscales

Physical function 74.6 26.1 5.0 100.0

Role-physical 61.8 42.9 0.0 100.0

Bodily pain 65.7 23.0 12.0 100.0

General health 84.2 14.4 42.0 100.0

Vitality 64.4 19.6 5.0 100.0

Social function 83.1 21.1 12.5 100.0

Role-emotional 84.2 31.1 0.0 100.0

Mental health 76.4 14.2 24.0 100.0

Physical component scale 46.1 10.2 20.3 61.0

Mental component scale 53.1 8.7 17.7 67.8

Page 7: Reliability, Validity, and Responsiveness of Four Knee ...drrobertmarx.com/wordpress/wp-content/themes/dr-marx-2015/articles/... · the journal of bone & joint surgery · jbjs.org

TH E JO U R NA L OF BONE & JOINT SURGER Y · JBJS .ORG

VO LU M E 83-A · NUMB ER 10 · OC TOB ER 2001RE LIABILI T Y, VA L I D I T Y, A N D RESPONSIVENESS OF FOUR

KNEE OUTCOME SC A L E S F OR AT H L E T I C PA T I E N T S

scales correlated well with both clinician and patient ratings of severity (Table II). The minimum correlation between either of these constructs and one of the knee-rating scales was 0.61, and all correlations were significant (p < 0.01) (Table II). Patient-rated severity is more relevant, and all scales correlated better with patient-rated severity than they did with clinician-rated severity. Analysis of variance demonstrated a significant difference, with regard to the scores on each knee-rating scale, among patients with different clinician-rated and patient-rated severity (p < 0.00000001) (Table IV and Figs. 1 through 4). Post hoc testing demonstrated that the scores on the Activities of Daily Living scale, the American Academy of Orthopaedic Surgeons sports knee-rating scale, and the Cincinnati knee-rating system differed significantly between two adjacent levels of clinician-rated severity, but the scores on the Lysholm scale did not. All four scales demonstrated significant differences between two or more adjacent levels of patient-graded severity.

No ceiling or floor effects were demonstrated for any of the scales.
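The analysis of variance described above compares mean scale scores across severity levels. The following is a minimal sketch of how a one-way F statistic of this kind can be computed. The function name and the three groups of scores are hypothetical and are not data from the study, and the p value is omitted because it requires the F distribution, which is not in the Python standard library.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scale scores grouped by rated severity (illustration only).
mild = [90, 85, 95]
moderate = [70, 75, 65]
severe = [50, 45, 55]
f_statistic = one_way_anova_f([mild, moderate, severe])
```

A large F statistic indicates that the scores differ more between severity levels than within them, which is the pattern reported in Table IV.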

TABLE II Validity Analysis with Spearman Correlation Matrix for the Four Knee-Rating Scales, the Physical and Mental Component Scales of the Short Form-36 (SF-36), and the Clinician and Patient Ratings of Severity

| | Physical Component Scale of SF-36 | Mental Component Scale of SF-36 | American Academy of Orthopaedic Surgeons Sports Knee-Rating Scale | Lysholm Scale | Cincinnati Knee-Rating Scale | Activities of Daily Living Scale | Patient Rating of Severity | Clinician Rating of Severity |
|---|---|---|---|---|---|---|---|---|
| Physical component scale of SF-36 | — | –0.18* | 0.67† | 0.68† | 0.68† | 0.77† | 0.64† | 0.59† |
| Mental component scale of SF-36 | –0.18* | — | –0.03 | 0.05 | 0.01 | –0.05 | 0.08 | 0.01 |
| American Academy of Orthopaedic Surgeons sports knee-rating scale | 0.67† | –0.03 | — | 0.70† | 0.83† | 0.80† | 0.74† | 0.64† |
| Lysholm scale | 0.68† | 0.05 | 0.70† | — | 0.70† | 0.85† | 0.65† | 0.62† |
| Cincinnati knee-rating scale | 0.68† | 0.01 | 0.83† | 0.70† | — | 0.80† | 0.67† | 0.61† |
| Activities of Daily Living scale | 0.77† | –0.05 | 0.80† | 0.85† | 0.80† | — | 0.73† | 0.68† |
| Patient rating of severity | 0.64† | 0.08 | 0.74† | 0.65† | 0.67† | 0.73† | — | 0.58† |
| Clinician rating of severity | 0.59† | 0.01 | 0.64† | 0.62† | 0.61† | 0.68† | 0.58† | — |

*Correlations were significant at p < 0.05. †Correlations were significant at p < 0.01.
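Each entry in a matrix such as Table II is a Spearman rank correlation between two sets of scores. As a minimal sketch, the coefficient can be obtained by ranking each set of scores (averaging the ranks of tied values) and then computing the Pearson correlation of the ranks. The function names and the six pairs of scores below are hypothetical and are not data from the study.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for six patients on two scales (illustration only).
lysholm = [45, 60, 72, 80, 91, 100]
adl = [50, 58, 75, 70, 88, 97]
rho = spearman(lysholm, adl)
```

Because the coefficient depends only on rank order, it is unaffected by the different scoring ranges of the instruments being compared.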

TABLE III Validity Analysis with Spearman Correlations for the Four Knee-Rating Scales and Short Form-36 Subscales

| Short Form-36 Subscale | Cincinnati Knee-Rating Scale | Lysholm Scale | Activities of Daily Living Scale | American Academy of Orthopaedic Surgeons Sports Knee-Rating Scale |
|---|---|---|---|---|
| Physical function | 0.68* | 0.66* | 0.72* | 0.66* |
| Role-physical | 0.52* | 0.49* | 0.60* | 0.62* |
| Bodily pain | 0.51* | 0.57* | 0.54* | 0.44* |
| General health | 0.26* | 0.31* | 0.28* | 0.17 |
| Vitality | 0.23* | 0.28* | 0.26* | 0.19† |
| Social function | 0.50* | 0.50* | 0.48* | 0.48* |
| Role-emotional | 0.18† | 0.18† | 0.14 | 0.24* |
| Mental health | 0.24* | 0.29* | 0.21† | 0.19† |

*Correlations were significant at p < 0.01. †Correlations were significant at p < 0.05.


Responsiveness Results

The mean scores improved from 37.9 points at baseline to 65.0 points at the time of follow-up for the Cincinnati knee-rating system, from 64.5 points to 79.6 points for the Lysholm scale, from 66.1 points to 83.6 points for the Activities of Daily Living scale, and from 60.6 points to 81.8 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The standardized response means were 0.8 for the Cincinnati knee-rating system, 0.9 for the Lysholm scale, 1.1 for the Activities of Daily Living scale, and 1.0 for the American Academy of Orthopaedic Surgeons sports knee-rating scale. While the American Academy of Orthopaedic Surgeons sports knee-rating scale was quite responsive, six patients had questionnaires that were not able to be scored, leaving only thirty-six patients with questionnaires that could be scored at both baseline and follow-up.
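The standardized response means reported above can be illustrated with a short sketch. It assumes the usual definition of the statistic, namely the mean change in score divided by the standard deviation of the individual changes. The function name and the five pairs of scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the sample standard deviation of the changes."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

# Hypothetical paired scores for five patients (illustration only).
baseline_scores = [40, 55, 60, 35, 50]
follow_up_scores = [60, 65, 90, 45, 80]
srm = standardized_response_mean(baseline_scores, follow_up_scores)
```

Only patients with a scorable questionnaire at both time points contribute a change score, which is why unscorable questionnaires reduce the number of patients available for this calculation.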

Discussion

The results of a clinical research study are of questionable value if the measure used to evaluate the effectiveness of treatment is not known to be reliable, valid, and responsive. In many clinical research studies, questionnaires are used as primary outcome measures because these instruments accurately reflect symptoms and disabilities that are specific and important to patients.

Anderson et al. compared six knee-ligament rating scales in a study of seventy patients who had had reconstruction of the anterior cruciate ligament five years earlier¹. They concluded that the International Knee Documentation Committee (IKDC) scale⁴⁵ should be used to standardize measurements. However, the authors did not present any data on the reliability, validity, or responsiveness of this scale. In another study, the scores of eight knee-rating scales (including the Lysholm, Hospital for Special Surgery, and IKDC scales) were compared in a group of fifty-six patients who had undergone reconstruction of the anterior cruciate ligament⁸. The authors encouraged the use of the IKDC scale; however, the measurement properties of these scales were not determined.

Other investigators have compared the Lysholm scale and Cincinnati knee-rating system in studies of patients who either had an insufficient anterior cruciate ligament⁹ or had had reconstruction of the anterior cruciate ligament¹⁰,¹¹,¹³,⁴⁶. They found that patients had higher scores on the Lysholm scale but that there was a linear correlation between the two instruments. These studies did not assess the reliability or responsiveness of the instruments.

Many scales have been developed without patient input or the formal techniques of item generation and item reduction⁴⁷. In addition, scales have been developed for a wide variety of purposes and for specific patient populations. As a result, different scales may be used for similar studies, which could be a cause for differing conclusions⁴⁸,⁴⁹.

TABLE IV Validity Analysis of Variance Across Levels of Clinician-Rated and Patient-Rated Severity

| Instrument | Clinician-Rated Severity: F Test Statistic | Clinician-Rated Severity: P Value | Patient-Rated Severity: F Test Statistic | Patient-Rated Severity: P Value |
|---|---|---|---|---|
| American Academy of Orthopaedic Surgeons sports knee-rating scale | 21.1 | <0.00000001 | 27.3 | <0.00000001 |
| Lysholm scale | 14.8 | <0.00000001 | 19.5 | <0.00000001 |
| Cincinnati knee-rating scale | 19.4 | <0.00000001 | 23.8 | <0.00000001 |
| Activities of Daily Living scale | 20.7 | <0.00000001 | 27.1 | <0.00000001 |

Fig. 1. The mean Lysholm scores according to the clinician-rated severity. The error bars indicate the standard deviation.

We evaluated the reliability, validity, and responsiveness to clinical change of four questionnaires that assess disability and symptoms in active patients with disorders of the knee. All of the scales were found to have excellent reliability, with the intraclass correlation coefficient ranging from a low of 0.88 for the Cincinnati knee-rating system to a high of 0.95 for the Lysholm scale. (A coefficient of >0.75 is adequate for patients enrolled in a clinical trial²⁹.) The limits-of-agreement statistic is a measure of reliability that provides additional information to the intraclass correlation coefficient²⁷,²⁸. This statistic is the mean difference (and two standard deviations) between two measures used to evaluate the same subject. The mean difference among the four scales was extremely low (range, 3.5 to 7.8). Three of the four scales had a 95% confidence interval between 8.0 and 9.9, while the Cincinnati knee-rating system had a confidence interval of 22.5, indicating increased measurement variability or decreased reliability.
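The limits-of-agreement statistic described above can be sketched as follows, taking the mean of the paired differences and an interval of two standard deviations on either side of it. The function name, the `width` parameter, and the test and retest scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def limits_of_agreement(first, second, width=2.0):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)               # mean difference between administrations
    spread = width * stdev(diffs)    # two standard deviations of the differences
    return bias, bias - spread, bias + spread

# Hypothetical test and retest scores for five stable patients (illustration only).
test_scores = [70, 80, 65, 90, 75]
retest_scores = [72, 78, 69, 90, 76]
bias, lower, upper = limits_of_agreement(test_scores, retest_scores)
```

A wider interval between the lower and upper limits indicates greater variability between repeated administrations and therefore lower reliability.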

The knee scales, the physical component scale and mental component scale of the SF-36, and the patient and clinician severity ratings were used as constructs to evaluate validity. All of the scales were thought to have adequate face and content validity by ten orthopaedic surgeons with experience in the field of sports medicine. Our six hypotheses regarding construct validity were confirmed. Such confirmation is important because the use of joint-specific scales could be questioned if the knee-specific instruments did not correlate with each other to a greater degree than they did with the physical component scale, the mental component scale, or the SF-36 subscales.

Responsiveness is dependent not only on the instrument but also on the magnitude of change actually experienced by the patients. The magnitude of change measured in a cohort of patients is determined by the initial score (lower scores allow more room for improvement), the quality of the intervention, the instrument used to measure the health status, and the statistic used to calculate responsiveness³⁷. The responsiveness of measurement scales in orthopaedics has been measured with use of the standardized response mean. The activities of daily living and symptoms subscales of the Cincinnati knee-rating system had standardized response means of 0.72 and 1.56, respectively, in a study of patients who had undergone anterior cruciate ligament reconstruction³². Two generic health-status instruments had standardized response means of 0.88 and 1.00 in a study of patients who had undergone hip or knee replacement surgery³⁸. The standardized response means for the knee-specific questionnaires in the present study ranged from 0.8 for the Cincinnati knee-rating system to 1.1 for the Activities of Daily Living scale. These values are fairly impressive considering that not all patients underwent surgery and that they were generally not severely disabled at baseline according to their diagnoses and questionnaire scores. Therefore, we concluded that these instruments were all capable of detecting a clinically relevant difference over time.

Fig. 2. The mean scores on the American Academy of Orthopaedic Surgeons (AAOS) sports knee-rating scale according to the clinician-rated severity. The error bars indicate the standard deviation.

Fig. 3. The mean scores on the Cincinnati knee-rating system according to the clinician-rated severity. The error bars indicate the standard deviation.

The standard deviation of the measure is important for calculating sample size when designing a study to compare two treatments with use of a rating scale as the primary outcome measure⁵⁰. Standard deviations can be compared in cases where the minimum and maximum scores are the same. In the present study, the standard deviations ranged from 18.3 to 19.0 for three of the scales, while the Cincinnati knee-rating system had a standard deviation of 28.7, again indicating increased measurement variability for the scale. The standard deviations are relatively large, which is possibly due to the heterogeneity of the baseline population. Other instruments have also demonstrated standard deviations in this range when tested in studies of patients with a variety of diagnoses of varying severity. In the initial report that described the Activities of Daily Living scale, a standard deviation of 20.8 was found for the baseline responses of patients with diagnoses that ranged from tendinitis to osteoarthrosis¹⁵. Conversely, in a study of a more homogeneous patient group (patients who had recovered from anterior cruciate ligament reconstruction), the Lysholm and Cincinnati scales had lower standard deviations of 8.9 and 10.6, respectively¹. In the original study by Lysholm and Gillquist, the standard deviation for the scale was 17.8 for patients with instability of the anterior cruciate ligament but only 10.8 for patients without instability¹³. Therefore, the standard deviations in the present study are useful for estimating sample size; however, they are likely overestimating the standard deviation that would occur in a more homogeneous patient sample.
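The dependence of sample size on the standard deviation can be illustrated with a minimal sketch. It assumes the standard normal-approximation formula for comparing two means with equal group sizes, n = 2(z₁ + z₂)²(SD/difference)², which is not necessarily the method used in the cited reference. The function name, the 10-point difference, the two-sided alpha of 0.05, and the power of 0.80 are assumptions chosen for illustration. The two standard deviations are those reported in the present study for the Lysholm scale (18.7) and the Cincinnati knee-rating system (28.7).

```python
from math import ceil
from statistics import NormalDist

def patients_per_group(sd, difference, alpha=0.05, power=0.80):
    """Patients needed in each of two equal groups (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    return ceil(2 * (z_alpha + z_power) ** 2 * (sd / difference) ** 2)

# Standard deviations from this study; the 10-point difference is hypothetical.
n_lysholm = patients_per_group(sd=18.7, difference=10)
n_cincinnati = patients_per_group(sd=28.7, difference=10)
```

Under these assumptions the sketch yields 55 patients per group for a standard deviation of 18.7 and 130 per group for a standard deviation of 28.7. A scale with a larger standard deviation therefore requires a considerably larger study to detect the same difference.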

The four knee-specific questionnaires varied in length, with eight questions in the Lysholm scale, ten in the Cincinnati knee-rating system, seventeen in the Activities of Daily Living scale, and twenty-three in the American Academy of Orthopaedic Surgeons sports knee-rating scale. While all of these tools are well within the realm of acceptable responder burden, the number of items is important, particularly if the questionnaires are administered in conjunction with a generic health-status measure.

The scoring was relatively straightforward for three of the scales; however, the scoring manual provided for the American Academy of Orthopaedic Surgeons sports knee-rating scale suggests the calculation of five subscales. While this information is valuable from a clinical perspective, it is too complicated to have five subscales describing each aspect of knee function for each patient in a research initiative. In addition, this scale was the only one of the four to have the response “cannot do for other reasons.” The scoring manual states that this item should be “dropped,” which we interpreted as “scored as missing.” We elected to calculate the mean of the subscales (to arrive at an overall score for the instrument) if it was possible to score three or more of the subscales. The fact that this scale had more items and more complicated scoring and that more questionnaires could not be scored because of missing responses makes its use somewhat more onerous. However, when patients completed the questionnaire sufficiently so that a score could be calculated, the instrument was reliable, valid, and responsive according to our criteria.
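The rule described above for deriving an overall score from the five subscales can be sketched as follows. The sketch covers only the final step: it takes the subscale scores as given, with `None` marking a subscale that could not be scored, and does not reproduce the scoring manual's item-level algorithm. The function name, the constant name, and the example values are hypothetical.

```python
from statistics import mean

MIN_SCORABLE_SUBSCALES = 3  # threshold stated in the text

def overall_score(subscale_scores):
    """Mean of the scorable subscales, or None if fewer than three are scorable."""
    scorable = [score for score in subscale_scores if score is not None]
    if len(scorable) < MIN_SCORABLE_SUBSCALES:
        return None  # questionnaire treated as unable to be scored
    return mean(scorable)

# Hypothetical subscale scores for two patients (None = subscale not scorable).
patient_a = overall_score([80, None, 70, 90, None])
patient_b = overall_score([80, None, None, 90, None])
```

In this illustration the first questionnaire receives an overall score from its three scorable subscales, whereas the second is treated as unable to be scored.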

One limitation of the present study is that the results are not generalizable to patients who are not active in sports (that is, those who have a Tegner rating of <4 points). Another potential limitation is that we chose to measure a variety of disorders of the knee. We did so to allow generalizability of the results to a wide variety of patients and treatments. In effect, our cohort of active patients with disorders of the knee is a relatively homogeneous group from the perspective of general orthopaedic and medical health. While it is possible that a given scale would perform differently in a group of patients with a single diagnosis, it is unlikely that this discrepancy would be very large. Lastly, we used only the subjective components of the Cincinnati knee-rating system, and our results do not apply to the other components of this system, which mainly measure impairments.

The Activities of Daily Living scale was well understood by patients, could be completed in a relatively short time-period, and had slightly better construct validity and responsiveness than the other scales. This finding is possibly due to the clear wording of the instrument or to the fact that it evaluates a very wide variety of symptoms and disabilities compared with the others. The latter allows an investigator to use this instrument for studies involving various knee diagnoses, as was its intended purpose. We recommend this instrument for the study of disorders of the knee in athletic patients.

The American Academy of Orthopaedic Surgeons, Lysholm, and Cincinnati knee-rating scales also satisfied our criteria for reliability, validity, and responsiveness, and all are acceptable for use in clinical research. The four scales have many areas of overlap, and the development of statistical methods to compare the results from one scale with those from another is an important area for future research. Additional work to evaluate some of the newer, well-designed knee-specific tools, such as the quality-of-life outcome measure for chronic anterior cruciate ligament deficiency⁵¹ and the Knee Injury and Osteoarthritis Outcome Score (KOOS)²⁰, is required.

Fig. 4. The mean Activities of Daily Living (ADL) scores according to the clinician-rated severity. The error bars indicate the standard deviation.

Robert G. Marx, MD, MSc, FRCS(C)
Edward C. Jones, MD, MA
Answorth A. Allen, MD
David W. Altchek, MD
Stephen J. O’Brien, MD
Scott A. Rodeo, MD
Riley J. Williams, MD
Russell F. Warren, MD
Thomas L. Wickiewicz, MD
Center for Clinical Outcome Research (R.G.M. and E.C.J.) and Sports Medicine and Shoulder Service (R.G.M., A.A.A., D.W.A., S.J.O’B., S.A.R., R.J.W., R.F.W., and T.L.W.), Hospital for Special Surgery, 535 East 70th Street, New York, NY 10021. E-mail address for Dr. Marx: [email protected]

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study. Dr. Marx was supported by an American Academy of Orthopaedic Surgeons Health Services Research Fellowship and a Royal College of Physicians and Surgeons of Canada Detweiler Travelling Fellowship.

References

1. Anderson AA, Federspiel CF, Snyder RB. Evaluation of knee ligament rating systems. Am J Knee Surg. 1993;6:67-73.

2. Beaton DE, Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am. 1996;78:882-90.

3. Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J, Coyte P. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care. 1995;33(4 suppl):AS131-44.

4. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol. 1995;22:1193-6.

5. Kirkley A, Griffin S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability. The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med. 1998;26:764-72.

6. L’Insalata JC, Warren RF, Cohen SB, Altchek DW, Peterson MG. A self-administered questionnaire for assessment of symptoms and function of the shoulder. J Bone Joint Surg Am. 1997;79:738-48.

7. Martin DP, Engelberg R, Agel J, Swiontkowski MF. Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am. 1997;79:1323-35.

8. Labs K, Paul B. To compare and contrast the various evaluation scoring systems after anterior cruciate ligament reconstruction. Arch Orthop Trauma Surg. 1997;116:92-6.

9. Bollen S, Seedhom BB. A comparison of the Lysholm and Cincinnati knee scoring questionnaires. Am J Sports Med. 1991;19:189-90.

10. Noyes FR, Barber SD, Mooar LA. A rationale for assessing sports activity levels and limitations in knee disorders. Clin Orthop. 1989;246:238-49.

11. Noyes FR, Barber SD, Mangine RE. Bone-patellar ligament-bone and fascia lata allografts for reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 1990;72:1125-36.

12. Verbrugge LM, Jette AM. The disablement process. Soc Sci Med. 1994;38:1-14.

13. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10:150-4.

14. American Academy of Orthopaedic Surgeons. Scoring algorithms for the lower limb outcomes data collection instrument, version 2.0. Rosemont, IL: American Academy of Orthopaedic Surgeons; 1998.

15. Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH, Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am. 1998;80:1132-45.

16. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247-63.

17. McHorney CA, Ware JE Jr, Rogers W, Raczek AE, Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Results from the Medical Outcomes Study. Med Care. 1992;30(5 suppl):MS253-65.

18. Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: The Health Institute, New England Medical Center; 1993.

19. Shapiro ET, Richmond JC, Rockett SE, McGrath MM, Donaldson WR. The use of a generic, patient-based health assessment (SF-36) for evaluation of patients with anterior cruciate ligament injuries. Am J Sports Med. 1996;24:196-200.

20. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)—development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88-96.

21. Brinker MR, Garcia R, Barrack RL, Timon S, Guinn S, Fong B. An analysis of sports knee evaluation instruments. Am J Knee Surg. 1999;12:15-24.

22. Marshall JL, Fetto JF, Botero PM. Knee ligament injuries: a standardized evaluation method. Clin Orthop. 1977;123:115-29.

23. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991;12(4 suppl):142S-58S.

24. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press; 1989.

25. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;198:43-9.

26. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-10.

27. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med. 1990;20:337-40.

28. Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet. 1995;346:1085-7.

29. Rosner B. Fundamentals of biostatistics. 4th ed. Belmont, CA: Duxbury Press; 1995.

30. Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987.

31. Juniper EF, Guyatt GH. Development and testing of a new measure of health status for clinical trials in rhinoconjunctivitis. Clin Exp Allergy. 1991;21:77-83.

32. Barber-Westin SD, Noyes FR, McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999;27:402-16.

33. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40:171-8.

34. Guyatt G, Mitchell A, Irvine EJ, Singer J, Williams N, Goodacre R, Tompkins C. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology. 1989;96:804-10.

35. Guyatt GH, Eagle DJ, Sackett B, Willan A, Griffith L, McIlroy W, Patterson CJ, Turpie I. Measuring quality of life in the frail elderly. J Clin Epidemiol. 1993;46:1433-44.

36. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79-93.

37. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239-46.

38. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632-42.

39. Gauffin H, Pettersson G, Tegner Y, Tropp H. Function testing in patients with old rupture of the anterior cruciate ligament. Int J Sports Med. 1990;11:73-7.

40. Odensten M, Hamberg P, Nordin M, Lysholm J, Gillquist J. Surgical or conservative treatment of the acutely torn anterior cruciate ligament. A randomized study with short-term follow-up observations. Clin Orthop. 1985;198:87-93.

41. Roberts TS, Drez D Jr, McCarthy W, Paine R. Anterior cruciate ligament reconstruction using freeze-dried, ethylene oxide-sterilized, bone-patellar tendon-bone allografts. Two year results in thirty-six patients. Am J Sports Med. 1991;19:35-41; erratum, 1991;19:272.

42. Noyes FR, Mooar PA, Matthews DS, Butler DL. The symptomatic anterior cruciate-deficient knee. Part I: The long-term functional disability in athletically active individuals. J Bone Joint Surg Am. 1983;65:154-62.

43. Noyes FR, Matthews DS, Mooar PA, Grood ES. The symptomatic anterior cruciate-deficient knee. Part II: The results of rehabilitation, activity modification, and counseling on functional disability. J Bone Joint Surg Am. 1983;65:163-74.

44. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441-8.

45. Hefti F, Muller W. [Current state of evaluation of knee ligament lesions. The new IKDC knee evaluation form]. Orthopäde. 1993;22:351-62. German.

46. Sgaglione NA, Del Pizzo W, Fox JM, Friedman MJ. Critical analysis of knee ligament rating systems. Am J Sports Med. 1995;23:660-7.

47. Wright JG. Quality of life in orthopaedics. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. 2nd ed. Philadelphia: Lippincott-Raven; 1996. p 1039-44.

48. Andersson G. Hip assessment: a comparison of nine different methods. J Bone Joint Surg Br. 1972;54:621-5.

49. Callaghan JJ, Dysart SH, Savory CF, Hopkinson WJ. Assessing the results of hip replacement. A comparison of five different rating systems. J Bone Joint Surg Br. 1990;72:1008-9.

50. Freedman KB, Bernstein J. Sample size and statistical power in clinical orthopaedic research. J Bone Joint Surg Am. 1999;81:1454-60.

51. Mohtadi N. Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency. Am J Sports Med. 1998.