reliability of three assessment tools used to evaluate...

Reliability of Three Assessment Tools Used to Evaluate Randomized Controlled Trials for Treatment of Neck Pain

• N Graham MSc, PT

• T Haines MSc, MD

• C Goldsmith PhD

for Cervical Overview Group (COG)

• A Gross MSc, PT

• S Burnie MSc, DC

• U Shahzad BHSc (Hon)

• E Talovikova BHSc (Hon)

Cervical Overview Group (COG), 1992 to 2010

Overview

Brief overview of COG and Validity team

Objective

Background Summary

Methods

Results

Conclusions

COG, 1992 to 2010

Medicine

Physical Medicine Methods

Patient Education

Manual Therapy

Injections

Medication

LLLT (low-level laser therapy)

Traction

Electrotherapy

Heat & Cold

Ultrasound

Exercise

Acupuncture

Manipulation

Mobilization

Massage

Orthosis

Standardized method for systematic review

Literature Search

Identification criteria

Selection criteria

Validity Assessment

Data Abstraction

Analysis:

Effect Measure

Test of Homogeneity

Sensitivity Analysis

Synthesis

Conclusions

Recommendations

Objective

To assess the inter-rater reliability of three tools used by COG to assess the internal validity of RCTs

Design: Pragmatic, cross-sectional study

June 2003 – June 2009

Tools: Jadad, van Tulder & Risk of Bias

Background

Numerous tools exist:

Jadad

PEDro

Delphi List

van Tulder (CBRG method guidelines)

Cochrane Risk of Bias (CBRG method guidelines and Cochrane Handbook Chapter 8)

Scales and checklists

Criteria-based domains

Jadad

Developed through a standardized item-reduction process in 1996

3-item scale: randomization, blinding, drop-outs

Maximum total score: 5

Score of 3 or more indicative of high quality

Validated for evaluation of pain in drug trials

Jadad et al 1996

Jadad Assessment Tool

Jadad scoring criteria | Potential score | Score awarded
1a. Was the study described as randomized? | +1 |
1b. Was the method of randomization described and appropriate to conceal allocation? | +1 |
1c. If described and inappropriate, describe: | -1 |
2a. Was the study described as double blinded? | +1 |
2b. Was the method of double-blinding described and appropriate to maintain double-blinding? | +1 |
2c. Was the method of blinding inappropriate? | -1 |
3. Was there a description of withdrawals and drop-outs? | +1 |
FINAL SCORE (0 to 5) | 5 | __ / 5
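The scoring rules in this form can be expressed as a short function. This is a sketch under assumptions: following Jadad's original scale, the +1/-1 method items (1b/1c, 2b/2c) are applied only when the base item (1a, 2a) is answered yes, which keeps the total between 0 and 5. The names `jadad_score`, `Method` and `is_high_quality` are hypothetical.

```python
from enum import Enum


class Method(Enum):
    """How a trial reports its randomization or blinding method."""
    NOT_DESCRIBED = 0   # no extra point, no deduction
    APPROPRIATE = 1     # items 1b / 2b: +1
    INAPPROPRIATE = -1  # items 1c / 2c: -1


def jadad_score(randomized: bool, randomization_method: Method,
                double_blind: bool, blinding_method: Method,
                withdrawals_described: bool) -> int:
    """Return a Jadad score from 0 to 5 (hypothetical helper)."""
    score = 0
    if randomized:                               # item 1a
        score += 1 + randomization_method.value  # items 1b / 1c
    if double_blind:                             # item 2a
        score += 1 + blinding_method.value       # items 2b / 2c
    if withdrawals_described:                    # item 3
        score += 1
    return score


def is_high_quality(score: int) -> bool:
    """Rule from the slide: a score of 3 or more indicates high quality."""
    return score >= 3


score = jadad_score(True, Method.APPROPRIATE, True, Method.NOT_DESCRIBED, True)
print(score, is_high_quality(score))  # 4 True
```

Encoding 1b/1c and 2b/2c as one three-valued answer keeps a trial from being credited and penalized for the same method.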

van Tulder checklist: developed in 1997 by the CBRG (Cochrane Back Review Group) editorial board as recommended method guidelines for systematic reviews in the area of spinal disorders

Updated in 2003: revised to an 11-item criteria list for methodological quality assessment

A score of 6/11 or more considered high quality

Updated in 2009: revised to a 12-item internal validity checklist for sources of risk of bias

Recent evidence supports the use of sum scores

van Tulder et al 1997, 2003, 2009, Furlan et al 2009

van Tulder 11-item checklist (2003); each item answered Yes / No / DK (don't know)

A. Was the method of randomization adequate?
B. Was the treatment allocation concealed?
C. Were the groups similar at baseline regarding the most important prognostic indicators?
D. Was the patient blinded to the intervention?
E. Was the care provider blinded to the intervention?
F. Was the outcome assessor blinded to the intervention?
G. Were co-interventions avoided or similar?
H. Was the compliance acceptable in all groups?
I. Was the withdrawal/drop-out rate described and acceptable?
J. Was the timing of outcome assessment similar in all groups?
K. Did the analysis include an intention-to-treat analysis?

TOTAL ITEMS SCORED YES (0 to 11): __ / 11
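The sum score on this checklist is the count of items answered Yes. A minimal sketch, assuming that No and DK both score 0, that a missing item is treated as DK, and that the 6/11 high-quality threshold from the earlier slide applies; `van_tulder_score`, `van_tulder_high_quality` and the example trial are hypothetical.

```python
VAN_TULDER_ITEMS = "ABCDEFGHIJK"  # the 11 items of the 2003 checklist


def van_tulder_score(answers: dict[str, str]) -> int:
    """Count items answered 'yes'; 'no' and 'dk' (don't know) score 0."""
    unknown = set(answers) - set(VAN_TULDER_ITEMS)
    if unknown:
        raise ValueError(f"unexpected items: {sorted(unknown)}")
    # Missing items default to 'dk' (an assumption), so they add nothing.
    return sum(1 for item in VAN_TULDER_ITEMS
               if answers.get(item, "dk").lower() == "yes")


def van_tulder_high_quality(score: int, threshold: int = 6) -> bool:
    """Rule from the slide: 6/11 or more is considered high quality."""
    return score >= threshold


trial = {"A": "yes", "B": "yes", "C": "yes", "D": "no", "E": "no",
         "F": "yes", "G": "dk", "H": "yes", "I": "yes", "J": "yes",
         "K": "no"}
s = van_tulder_score(trial)
print(s, van_tulder_high_quality(s))  # 7 True
```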

Risk of Bias

Cochrane Collaboration recommends domain-based criteria:

Sequence generation

Allocation concealment

Blinding of participants, personnel and outcome assessors

Incomplete outcome data

Selective outcome reporting

Other sources of bias

Not a scale or checklist with a sum score

Qualitative assessment requiring subjective judgment

Cochrane Handbook 2008, Chapter 8

Risk of Bias checklist (Furlan et al 2009, CBRG); each item answered Yes / No / Unsure

A. Sequence generation
1. Was the method of randomization adequate?

B. Allocation concealment
2. Was the treatment allocation concealed?

C. Blinding of participants, personnel and outcome assessors: was knowledge of the allocated interventions adequately prevented during the study?
3. Was the patient blinded to the intervention?
4. Was the care provider blinded to the intervention?
5. Was the outcome assessor blinded to the intervention?

D. Incomplete outcome data: were incomplete outcome data adequately addressed?
6. Was the drop-out rate described and acceptable?
7. Were all randomized participants analyzed in the group to which they were allocated?

E. Selective outcome reporting
8. Are reports of the study free of suggestion of selective outcome reporting?

F. Other sources of potential bias
9. Were the groups similar at baseline regarding the most important prognostic indicators?
10. Were co-interventions avoided or similar?
11. Was the compliance acceptable in all groups?
12. Was the timing of the outcome assessment similar in all groups?

Methods

Four members of COG, with multi-professional and methodological backgrounds

Evaluation of the internal validity of 54 RCTs using Jadad and van Tulder, and of 18 RCTs using RoB, from June 2003 to June 2009

Kappa statistic with standard agreement categorization

Methods

Each rater (statistician, MD, PT, DC) independently assessed each RCT using all 3 tools

Meeting to discuss; consensus reached

Individual and consensus ratings recorded

Kappa statistic calculated for each combination of raters
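The per-pair calculation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), for two raters giving categorical answers (e.g. yes / no / unsure) to one item across the same trials. The names `cohens_kappa` and `pairwise_kappas` and the example ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Cohen's kappa for two raters scoring the same trials on one item."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both raters must score the same non-empty set of trials")
    n = len(r1)
    # Observed agreement: share of trials where both raters gave the same answer.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: products of each rater's category proportions, summed.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappas(ratings: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Kappa for every combination of two raters."""
    return {(a, b): cohens_kappa(ratings[a], ratings[b])
            for a, b in combinations(sorted(ratings), 2)}


# Hypothetical answers from three raters for one item on ten trials.
ratings = {
    "1": ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"],
    "2": ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes", "yes"],
    "3": ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"],
}
for pair, k in pairwise_kappas(ratings).items():
    print(pair, round(k, 2))
```

With three raters this yields the three pairs (1, 2), (1, 3) and (2, 3), the same row layout as the Jadad and van Tulder results tables.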

Methods

Scale to interpret Kappa score

≤ 0 = poor

0.01 to 0.2 = slight

0.21 to 0.4 = fair

0.41 to 0.6 = moderate

0.61 to 0.8 = substantial

0.81 to 1 = almost perfect

Landis and Koch 1977
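The scale above maps directly to a lookup. A small sketch, assuming the upper bound of each band is inclusive, which matches kappas reported to two decimals (0.60 is moderate, 0.61 substantial); `landis_koch` is a hypothetical name.

```python
def landis_koch(kappa: float) -> str:
    """Label a kappa value with the Landis and Koch (1977) bands on the slide."""
    if not -1 <= kappa <= 1:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa <= 0:
        return "poor"
    # Upper bounds treated as inclusive (an assumption for two-decimal values).
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Mean kappas for allocation concealment reported in this study.
for tool, k in [("Jadad", 0.69), ("van Tulder", 0.77), ("Risk of Bias", 0.76)]:
    print(tool, k, landis_koch(k))  # all three are "substantial"
```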

Results

Jadad 4/7 moderate to substantial agreement

van Tulder 8/11 moderate agreement

Risk of Bias 3/12 moderate to substantial

Results

Kappa mean (min to max)

Consistent substantial agreement across all tools for the domain allocation concealment:

Jadad 0.69 (0.60 to 0.77)

van Tulder 0.77 (0.73 to 0.81)

Risk of Bias 0.76 (0.65 to 0.88)

Consistent moderate to substantial agreement across two tools for the domain randomization:

van Tulder 0.53 (0.37 to 0.66)

Risk of Bias 0.66 (0.45 to 0.88)

Other domains demonstrated fair to moderate agreement or were not a fair test of agreement
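One common reason kappa can be low, or not a fair test of agreement, even when raters agree on nearly every trial is skewed prevalence: when almost all trials fall into one category, chance agreement p_e approaches 1 and kappa is pushed toward zero. The sketch below uses made-up ratings for 54 trials (not study data) to compare two items with the same 52/54 raw agreement. Whether this explains the authors' "not a fair test" judgment for particular items is an assumption.

```python
from collections import Counter


def agreement_and_kappa(r1: list[str], r2: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


# Skewed item: each rater says "yes" for 53 of 54 trials; their single "no" differs.
skewed_a = ["yes"] * 53 + ["no"]
skewed_b = ["no"] + ["yes"] * 53

# Balanced item: 27 "yes" and 27 "no" per rater; they again disagree on 2 trials.
balanced_a = ["yes"] * 27 + ["no"] * 27
balanced_b = ["no"] + ["yes"] * 27 + ["no"] * 26

for name, (a, b) in {"skewed": (skewed_a, skewed_b),
                     "balanced": (balanced_a, balanced_b)}.items():
    p_o, k = agreement_and_kappa(a, b)
    print(f"{name}: agreement {p_o:.0%}, kappa {k:.2f}")
# skewed: agreement 96%, kappa -0.02
# balanced: agreement 96%, kappa 0.93
```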

Results: Jadad (n=54)

Kappa estimate (95% CI: lower bound, upper bound)

Raters | 1b Appropriate concealment | 1c Inappropriate concealment | 2a Double blinding | 2b Appropriate blinding | 3a Dropouts
1, 2 | 0.77 (0.60, 0.94) | 0.48 (-0.11, 1.00) | 0.80 (0.60, 1.00) | 0.65 (0.36, 0.93) | 0.21 (-0.05, 0.47)
1, 3 | 0.60 (0.38, 0.82) | 0.46 (0.02, 0.90) | 0.73 (0.48, 0.98) | 0.34 (0.00, 0.70) | 0.16 (-0.08, 0.42)
2, 3 | 0.69 (0.50, 0.88) | 0.31 (-0.15, 0.78) | 0.80 (0.60, 1.00) | 0.61 (0.31, 0.91) | 0.09 (-0.18, 0.37)
Mean | 0.69 | 0.42 | 0.78 | 0.53 | 0.15

Items 1a (randomization) and 2c (inappropriate blinding) were not computed.
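The Mean row appears to be the arithmetic mean of the three pairwise kappas in each column, and the results summary reports kappa as mean (min to max). The sketch below recomputes both from the Jadad values in the table; the averaging rule is inferred from the numbers, not stated on the slides.

```python
from statistics import mean

# Pairwise kappas from the Jadad table, in row order: raters 1-2, 1-3, 2-3.
jadad = {
    "1b appropriate concealment": [0.77, 0.60, 0.69],
    "1c inappropriate concealment": [0.48, 0.46, 0.31],
    "2a double blinding": [0.80, 0.73, 0.80],
    "2b appropriate blinding": [0.65, 0.34, 0.61],
    "3a dropouts": [0.21, 0.16, 0.09],
}

for item, kappas in jadad.items():
    # Summary format used on the results slide: mean (min to max).
    print(f"{item}: {mean(kappas):.2f} ({min(kappas):.2f} to {max(kappas):.2f})")
```

The first line printed, 0.69 (0.60 to 0.77), matches the Jadad allocation-concealment summary, and the means match the table's Mean row.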

Results: van Tulder (n=54), items A to F

Kappa (95% CI: lower bound, upper bound)

Raters | A Randomization adequate | B Allocation concealment | C Groups at baseline | D Patient blinding | E Care provider blinding | F Assessor blinding
1, 2 | 0.53 (0.30, 0.76) | 0.81 (0.65, 0.96) | 0.42 (0.17, 0.66) | 0.65 (0.42, 0.88) | 0.26 (-0.15, 0.67) | 0.64 (0.42, 0.85)
1, 3 | 0.37 (0.14, 0.60) | 0.77 (0.60, 0.94) | 0.40 (0.16, 0.65) | 0.72 (0.50, 0.94) | 0.64 (0.27, 1.00) | 0.47 (0.25, 0.70)
2, 3 | 0.66 (0.49, 0.86) | 0.73 (0.55, 0.91) | 0.50 (0.26, 0.74) | 0.54 (0.27, 0.82) | 0.48 (-0.11, 1.00) | 0.38 (0.12, 0.63)
Mean | 0.53 | 0.77 | 0.44 | 0.64 | 0.46 | 0.50

Results: van Tulder (n=54), items G to K

Kappa (95% CI: lower bound, upper bound)

Raters | G Co-interventions | H Acceptable compliance | I Dropouts | J Timing of assessment | K Intention-to-treat
1, 2 | 0.23 (-0.02, 0.48) | 0.33 (0.09, 0.58) | 0.29 (0.01, 0.58) | -0.02 (-0.07, 0.01) | 0.44 (0.20, 0.68)
1, 3 | 0.29 (0.03, 0.54) | 0.52 (0.29, 0.74) | 0.39 (0.11, 0.67) | -0.02 (-0.06, 0.01) | 0.63 (0.42, 0.83)
2, 3 | 0.58 (0.37, 0.79) | 0.51 (0.28, 0.74) | 0.08 (-0.20, 0.37) | 0.79 (0.39, 1.00) | 0.58 (0.37, 0.80)
Mean | 0.36 | 0.45 | 0.26 | 0.24 | 0.55
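Each kappa in these tables carries a 95% confidence interval, and several upper bounds sit exactly at 1.00. The slides do not say how the intervals were computed; the sketch below uses one common large-sample approximation, SE = sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)), and clips the bounds to [-1, 1], the range kappa can take. Both the formula and the clipping are assumptions, and the agreement table and the name `kappa_with_ci` are hypothetical.

```python
import math


def kappa_with_ci(table: list[list[int]], z: float = 1.96) -> tuple[float, float, float]:
    """Cohen's kappa with an approximate CI from a square agreement table.

    table[i][j] counts trials that rater A put in category i and rater B in j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_p, col_p))
    if math.isclose(p_e, 1.0):
        raise ValueError("kappa is undefined when p_e == 1")
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an assumed choice of formula).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    # Kappa cannot leave [-1, 1], so the interval is clipped at those limits.
    return kappa, max(-1.0, kappa - z * se), min(1.0, kappa + z * se)


# Hypothetical Yes/No table for 10 trials: rows = rater A, columns = rater B.
kappa, lower, upper = kappa_with_ci([[5, 0], [1, 4]])
print(f"{kappa:.2f} ({lower:.2f}, {upper:.2f})")  # 0.80 (0.43, 1.00)
```

With few trials the standard error is large, so the raw upper bound (about 1.17 here) exceeds 1 and is reported as 1.00.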

Results: RoB (n=18), items 1 to 7

Kappa (95% CI: lower bound, upper bound)

Raters | 1 Randomization | 2 Allocation concealment | 3 Patient blinding | 5 Assessor blinding | 6 Dropouts | 7 Intention-to-treat
1, 4 | 0.88 (0.67, 1.00) | 0.65 (0.30, 1.00) | 0.15 (-0.30, 0.60) | 0.26 (-0.16, 0.69) | 0.44 (0.03, 0.85) | -0.05 (-0.48, 0.37)
1, 2 | 0.45 (0.07, 0.84) | 0.77 (0.48, 1.00) | 0.36 (-0.11, 0.85) | 0.36 (-0.11, 0.85) | 0.55 (0.18, 0.93) | 0.41 (0.01, 0.82)
1, 3 | 0.76 (0.46, 1.00) | 0.88 (0.67, 1.00) | N/C | N/C | 0.33 (-0.09, 0.75) | 0.44 (-0.01, 0.90)
2, 4 | 0.56 (0.19, 0.93) | 0.88 (0.67, 1.00) | 0.76 (0.34, 1.00) | 0.45 (-0.14, 1.00) | 0.65 (0.30, 1.00) | 0.10 (-0.36, 0.56)
3, 4 | 0.88 (0.67, 1.00) | 0.76 (0.46, 1.00) | N/C | N/C | 0.20 (-0.25, 0.65) | -0.05 (-0.48, 0.37)
2, 3 | 0.45 (0.07, 0.84) | 0.65 (0.30, 1.00) | N/C | N/C | 0.06 (-0.39, 0.52) | 0.64 (0.31, 0.98)
Mean | 0.66 | 0.76 | 0.42 | 0.35 | 0.37 | 0.24

Item 4 (care provider blinding) was removed; N/C = not computed.
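Several RoB cells are marked N/C. The slides do not give the reason; one situation in which Cohen's kappa cannot be computed is when both raters place every trial in the same category, so chance agreement p_e equals 1 and the formula divides by zero. The sketch below, with hypothetical ratings for 18 trials, returns None in that case; treating this as the cause of the N/C entries is an assumption, and `kappa_or_none` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Optional


def kappa_or_none(r1: list[str], r2: list[str]) -> Optional[float]:
    """Cohen's kappa, or None ("N/C") when chance agreement p_e equals 1."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if math.isclose(p_e, 1.0):
        return None  # 1 - p_e == 0, so kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Hypothetical: both raters answer "no" for patient blinding on all 18 trials.
print(kappa_or_none(["no"] * 18, ["no"] * 18))  # None

# One dissenting "yes" leaves about 94% agreement, yet kappa is 0.
print(kappa_or_none(["no"] * 18, ["no"] * 17 + ["yes"]))
```

The second case shows why items where nearly every trial is rated the same way (for example, blinding in rehabilitation trials) give unstable kappas.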

Results: RoB (n=18), items 8 to 12

Kappa (95% CI: lower bound, upper bound)

Raters | 8 Selective reporting | 9 Groups similar at baseline | 10 Co-interventions | 11 Compliance | 12 Timing of outcome
1, 4 | -0.15 (-0.30, 0.00) | 0.40 (-0.02, 0.83) | 0.49 (0.03, 0.94) | 0.41 (0.01, 0.82) | N/C
1, 2 | 0.30 (-0.28, 0.89) | 0.10 (-0.31, 0.52) | 0.40 (-0.01, 0.81) | 0.41 (0.01, 0.82) | -0.05 (-0.14, 0.02)
1, 3 | 0.30 (-0.28, 0.89) | 0.65 (0.32, 0.98) | 0.60 (0.13, 1.00) | 0.16 (-0.31, 0.65) | -0.05 (-0.14, 0.02)
2, 4 | 0.43 (-0.21, 1.00) | 0.33 (-0.09, 0.75) | 0.87 (0.62, 1.00) | 0.32 (-0.11, 0.76) | N/C
3, 4 | 0.43 (-0.21, 1.00) | 0.55 (0.18, 0.93) | 0.55 (0.11, 0.99) | 0.41 (0.01, 0.82) | N/C
2, 3 | 1.00 | 0.29 (-0.15, 0.74) | 0.45 (0.01, 0.89) | 0.41 (0.01, 0.82) | 1.00
Mean | 0.38 | 0.38 | 0.56 | 0.35 | 0.30

Item 4 (care provider blinding) was removed; N/C = not computed.

Conclusions

Blinding is an inherent limitation in rehabilitation trials

Incomplete reporting and use of “unsure” lead to greater variation in kappa scores

Decision rules change, and new members need to be calibrated

Conclusions

Diverse group: supports generalizability and a fairer test of scoring reliability

Consistent inter-rater agreement for allocation concealment and randomization

Some items require more judgment


Thank you

Acknowledgement

McMaster University

Carey Chittley, Lisa Ditchburn, Danna Epstein,

Denise Ward, Guillaume Bourrillon
