Reliability of Three Assessment
Tools Used to Evaluate
Randomized Controlled Trials
for Treatment of Neck Pain
• N Graham MSc, PT
• T Haines MSc, MD
• C Goldsmith PhD
for Cervical Overview Group (COG)
• A Gross MSc, PT
• S Burnie MSc, DC
• U Shahzad BHSc (Hon)
• E Talovikova BHSc (Hon)
COG
Cervical Overview Group 1992 to 2010
Overview
Brief overview of COG and Validity team
Objective
Background Summary
Methods
Results
Conclusions
COG 1992 to 2010
Medicine
Physical Medicine Methods
Patient Education
Manual Therapy
Injections
Medication
LLLT
Traction
Electrotherapy
Heat & Cold
Ultrasound
Exercise
Acupuncture
Manipulation
Mobilization
Massage
Orthosis
Standardized method for systematic review
Literature Search
Identification criteria
Selection criteria
Validity Assessment
Data Abstraction
Analysis:
Effect Measure
Test of Homogeneity
Sensitivity Analysis
Synthesis
Conclusions
Recommendations
Objective
To assess the inter-rater reliability of three
tools used by COG for assessment of internal
validity of RCTs
Design: Pragmatic, cross-sectional study
June 2003 – June 2009
Tools: Jadad, van Tulder & Risk of Bias
Background
Numerous tools exist:
Jadad
PEDro
Delphi List
van Tulder (CBRG method guidelines)
Cochrane Risk of Bias (CBRG method guidelines and Cochrane Handbook Chapter 8)
Scales and checklists
Criteria-based domains
Jadad
Developed through standardized item reduction process in 1996
3-item scale: randomization, blinding, drop-outs
Total score 5
Score of 3 or more indicative of high quality
Validated for evaluation of pain in drug trials
Jadad et al 1996
Jadad Assessment Tool
JADAD SCORING CRITERIA | Potential Score | Score Awarded
1a. Was the study described as randomized? | +1 | _
1b. Was the method of randomization described and appropriate to conceal allocation? | +1 | _
1c. If described and inappropriate, describe: | -1 | _
2a. Was the study described as double blinded? | +1 | _
2b. Was the method of double-blinding described and appropriate to maintain double-blinding? | +1 | _
2c. Was the method of blinding inappropriate? | -1 | _
3. Was there a description of withdrawals and drop outs? | +1 | _
FINAL SCORE (0 – 5) | 5 | _ / 5
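The scoring form above can be sketched as a small function. This is an illustrative sketch only, not the group's scoring procedure: the function and parameter names are hypothetical, and it assumes that the method items (1b/1c, 2b/2c) are only scored when the study was described as randomized or double blinded, which keeps the total within the 0 to 5 range shown on the form.

```python
from typing import Optional


def jadad_score(randomized: bool,
                randomization_appropriate: Optional[bool],
                double_blind: bool,
                blinding_appropriate: Optional[bool],
                withdrawals_described: bool) -> int:
    """Sum the Jadad items from the form.

    For the two *_appropriate arguments: True = described and appropriate,
    False = described and inappropriate, None = method not described.
    Assumption: method items count only when item 1a / 2a is answered yes.
    """
    score = 0
    if randomized:
        score += 1                      # item 1a
        if randomization_appropriate is True:
            score += 1                  # item 1b
        elif randomization_appropriate is False:
            score -= 1                  # item 1c
    if double_blind:
        score += 1                      # item 2a
        if blinding_appropriate is True:
            score += 1                  # item 2b
        elif blinding_appropriate is False:
            score -= 1                  # item 2c
    if withdrawals_described:
        score += 1                      # item 3
    return score


def is_high_quality(score: int) -> bool:
    """Threshold stated on the Jadad slide: 3 or more indicates high quality."""
    return score >= 3
```

For example, a trial described as randomized with no method reported, not double blinded, and with withdrawals described would score 2 under these assumptions and fall below the high-quality threshold.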
van Tulder
Checklist developed in 1997 by the CBRG editorial board as recommended method guidelines for systematic reviews in the area of spinal disorders
Updated in 2003: revised to an 11-item criteria list for methodological quality assessment
6/11 considered high quality
Updated in 2009: revised to a 12-item internal validity checklist for sources of risk of bias
Recent evidence supporting the use of sum scores
van Tulder et al 1997, 2003, 2009; Furlan et al 2009
van Tulder 11-item checklist (2003)
Each item is answered Yes or No / DK
A. Was the method of randomization adequate?
B. Was the treatment allocation concealed?
C. Were the groups similar at baseline regarding the most important prognostic indicators?
D. Was the patient blinded to the intervention?
E. Was the care provider blinded to the intervention?
F. Was the outcome assessor blinded to the intervention?
G. Were co-interventions avoided or similar?
H. Was the compliance acceptable in all groups?
I. Was the withdrawal/drop-out rate described & acceptable?
J. Was the timing of outcome assessment in all groups similar?
K. Did the analysis include an intention-to-treat analysis?
TOTAL ITEMS SCORED YES (0 - 11) __ / 11
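The sum score for this checklist is the count of items answered Yes. A minimal sketch follows, assuming answers are recorded as the strings "yes", "no" or "dk" keyed by item letter; the names `VAN_TULDER_ITEMS`, `van_tulder_score` and `is_high_quality` are hypothetical. The 6/11 threshold is the one stated on the van Tulder slide.

```python
VAN_TULDER_ITEMS = "ABCDEFGHIJK"  # the 11 items of the 2003 checklist


def van_tulder_score(answers: dict) -> int:
    """Count items answered 'yes'; 'no' and 'dk' both score zero."""
    missing = [item for item in VAN_TULDER_ITEMS if item not in answers]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    return sum(1 for item in VAN_TULDER_ITEMS
               if answers[item].strip().lower() == "yes")


def is_high_quality(score: int) -> bool:
    """Threshold from the slide: 6 of 11 items is considered high quality."""
    return score >= 6


# Hypothetical assessment of one trial, for illustration only.
example = {item: "no" for item in VAN_TULDER_ITEMS}
example.update({"A": "yes", "B": "yes", "C": "yes",
                "F": "yes", "I": "yes", "J": "yes", "E": "dk"})
example_score = van_tulder_score(example)
```

Here `example_score` is 6, so the hypothetical trial would just meet the high-quality threshold.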
Risk of Bias
Cochrane Collaboration recommends domain-based criteria:
Sequence generation
Allocation concealment
Blinding of participants, personnel and outcome assessors
Incomplete outcome data
Selective outcome reporting
Other sources of bias
Not a scale or checklist with sum score
Qualitative assessment requiring subjective judgment
Cochrane Handbook 2008, Chapter 8
Risk of Bias
Each item is answered Yes / No / Unsure
A. Sequence generation
1. Was the method of randomization adequate?
B. Allocation concealment
2. Was the treatment allocation concealed?
C. Blinding of participants, personnel and outcome assessors: Was knowledge of the allocated interventions adequately prevented during the study?
3. Was the patient blinded to the intervention?
4. Was the care provider blinded to the intervention?
5. Was the outcome assessor blinded to the intervention?
D. Incomplete outcome data: Were incomplete outcome data adequately addressed?
6. Was the drop-out rate described and acceptable?
7. Were all randomized participants analyzed in the group to which they were allocated?
E. Selective outcome reporting
8. Are reports of the study free of suggestion of selective outcome reporting?
F. Other sources of potential bias
9. Were the groups similar at baseline regarding the most important prognostic indicators?
10. Were co-interventions avoided or similar?
11. Was the compliance acceptable in all groups?
12. Was the timing of the outcome assessment similar in all groups?
Furlan et al 2009, CBRG
Methods
Four members of COG with multi-professional and methodological backgrounds
Evaluation of internal validity of 54 RCTs using Jadad and van Tulder, and 18 RCTs using RoB, from June 2003 to June 2009
Kappa statistic with standard agreement categorization
Methods
Each rater (statistician, MD, PT, DC) independently assessed RCTs using all 3 tools
Meeting to discuss, consensus reached
Individual and consensus ratings recorded
Kappa statistic calculated for each combination of raters
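The pairwise calculation can be sketched as follows. The slides say only "Kappa statistic", so this sketch assumes unweighted Cohen's kappa for two raters; the function names and the ratings are hypothetical and are not the study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters scoring the same trials."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty set of trials")
    n = len(a)
    # Observed agreement: share of trials where the two ratings match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappas(ratings):
    """Kappa for every combination of two raters; ratings maps rater -> list."""
    return {(r1, r2): cohen_kappa(ratings[r1], ratings[r2])
            for r1, r2 in combinations(sorted(ratings), 2)}


# Hypothetical ratings of four trials on one checklist item.
ratings = {
    "rater1": ["yes", "yes", "no", "no"],
    "rater2": ["yes", "no", "no", "no"],
    "rater3": ["yes", "yes", "no", "unsure"],
    "rater4": ["yes", "yes", "no", "no"],
}
kappas = pairwise_kappas(ratings)
```

With four raters this yields six rater pairs, and with three raters it yields three. An undefined kappa is returned as NaN when both raters give every trial the same single answer, which is one way a cell might end up "not computed".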
Methods
Scale to interpret Kappa score
≤ 0 = poor
0.01 to 0.2 = slight
0.21 to 0.4 = fair
0.41 to 0.6 = moderate
0.61 to 0.8 = substantial
0.81 to 1 = almost perfect
Landis and Koch 1977
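The scale above maps directly onto a lookup function. A minimal sketch, assuming each upper bound is inclusive so that values falling between the two-decimal bands (for example 0.205) are assigned to the higher category; the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa: float) -> str:
    """Return the agreement category from the scale on the slide."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa <= 0:
        return "poor"
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"
```

For instance, `interpret_kappa(0.69)` returns "substantial" and `interpret_kappa(0.15)` returns "slight".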
Results
Jadad: 4/7 moderate to substantial agreement
van Tulder: 8/11 moderate agreement
Risk of Bias: 3/12 moderate to substantial
Results
Kappa mean (min to max)
Consistent substantial agreement across all tools for the domain allocation concealment
Jadad 0.69 (0.60 to 0.77)
van Tulder 0.77 (0.73 to 0.81)
Risk of Bias 0.76 (0.65 to 0.88)
Consistent substantial agreement across two tools for the domain randomization
van Tulder 0.53 (0.37 to 0.66)
Risk of Bias 0.66 (0.45 to 0.88)
Other domains demonstrated fair to moderate agreement or were not a fair test of agreement
Results: Jadad (n=54)
Kappa estimate (95% CI: lower bound, upper bound)
R* | 1b Appropriate concealment | 1c Inappropriate concealment | 2a Double blinding | 2b Appropriate blinding | 3a Dropouts
1, 2 | 0.77 (0.60, 0.94) | 0.48 (-0.11, 1.00) | 0.80 (0.60, 1.00) | 0.65 (0.36, 0.93) | 0.21 (-0.05, 0.47)
1, 3 | 0.60 (0.38, 0.82) | 0.46 (0.02, 0.90) | 0.73 (0.48, 0.98) | 0.34 (0.00, 0.70) | 0.16 (-0.08, 0.42)
2, 3 | 0.69 (0.50, 0.88) | 0.31 (-0.15, 0.78) | 0.80 (0.60, 1.00) | 0.61 (0.31, 0.91) | 0.09 (-0.18, 0.37)
Mean | 0.69 | 0.42 | 0.78 | 0.53 | 0.15
R* = raters
**1a (randomization) and 2c (inappropriate blinding) were not computed
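The "mean (min to max)" summaries reported for each item can be reproduced from the pairwise values. The sketch below uses the Jadad item 1b kappas from the table above; the helper name `summarize` is hypothetical, and the rounding to two decimals is an assumption about how the slide values were presented.

```python
from statistics import mean

# Pairwise kappa estimates for Jadad item 1b, keyed by rater pair.
jadad_1b = {"1, 2": 0.77, "1, 3": 0.60, "2, 3": 0.69}


def summarize(kappas):
    """Return (mean rounded to 2 decimals, minimum, maximum)."""
    values = list(kappas)
    return round(mean(values), 2), min(values), max(values)


summary_1b = summarize(jadad_1b.values())  # (0.69, 0.6, 0.77)
```

This matches the Jadad allocation concealment summary of 0.69 (0.60 to 0.77). Because the tabulated kappas are already rounded, a mean recomputed this way can differ from a reported mean in the last digit for some items.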
Results: van Tulder (n=54), Items A to F
Kappa estimate (95% CI: lower bound, upper bound)
R* | A Randomization adequate | B Allocation concealment | C Groups at baseline | D Patient blinding | E Care provider blinding | F Assessor blinding
1, 2 | 0.53 (0.30, 0.76) | 0.81 (0.65, 0.96) | 0.42 (0.17, 0.66) | 0.65 (0.42, 0.88) | 0.26 (-0.15, 0.67) | 0.64 (0.42, 0.85)
1, 3 | 0.37 (0.14, 0.60) | 0.77 (0.60, 0.94) | 0.40 (0.16, 0.65) | 0.72 (0.50, 0.94) | 0.64 (0.27, 1.00) | 0.47 (0.25, 0.70)
2, 3 | 0.66 (0.49, 0.86) | 0.73 (0.55, 0.91) | 0.50 (0.26, 0.74) | 0.54 (0.27, 0.82) | 0.48 (-0.11, 1.00) | 0.38 (0.12, 0.63)
Mean | 0.53 | 0.77 | 0.44 | 0.64 | 0.46 | 0.50
R* = raters
Results: van Tulder (n=54), Items G to K
Kappa estimate (95% CI: lower bound, upper bound)
R* | G Co-interventions | H Acceptable compliance | I Dropouts | J Timing of assessment | K Intention-to-treat
1, 2 | 0.23 (-0.02, 0.48) | 0.33 (0.09, 0.58) | 0.29 (0.01, 0.58) | -0.02 (-0.07, 0.01) | 0.44 (0.20, 0.68)
1, 3 | 0.29 (0.03, 0.54) | 0.52 (0.29, 0.74) | 0.39 (0.11, 0.67) | -0.02 (-0.06, 0.01) | 0.63 (0.42, 0.83)
2, 3 | 0.58 (0.37, 0.79) | 0.51 (0.28, 0.74) | 0.08 (-0.20, 0.37) | 0.79 (0.39, 1.00) | 0.58 (0.37, 0.80)
Mean | 0.36 | 0.45 | 0.26 | 0.24 | 0.55
R* = raters
Results: RoB (n=18), Items 1 to 7
Kappa estimate (95% CI: lower bound, upper bound)
R* | 1 Randomization | 2 Allocation concealment | 3 Patient blinding | 5 Assessor blinding | 6 Dropouts | 7 Intention-to-treat
1, 4 | 0.88 (0.67, 1.00) | 0.65 (0.30, 1.00) | 0.15 (-0.30, 0.60) | 0.26 (-0.16, 0.69) | 0.44 (0.03, 0.85) | -0.05 (-0.48, 0.37)
1, 2 | 0.45 (0.07, 0.84) | 0.77 (0.48, 1.00) | 0.36 (-0.11, 0.85) | 0.36 (-0.11, 0.85) | 0.55 (0.18, 0.93) | 0.41 (0.01, 0.82)
1, 3 | 0.76 (0.46, 1.00) | 0.88 (0.67, 1.00) | N/C | N/C | 0.33 (-0.09, 0.75) | 0.44 (-0.01, 0.90)
2, 4 | 0.56 (0.19, 0.93) | 0.88 (0.67, 1.00) | 0.76 (0.34, 1.00) | 0.45 (-0.14, 1.00) | 0.65 (0.30, 1.00) | 0.10 (-0.36, 0.56)
3, 4 | 0.88 (0.67, 1.00) | 0.76 (0.46, 1.00) | N/C | N/C | 0.20 (-0.25, 0.65) | -0.05 (-0.48, 0.37)
2, 3 | 0.45 (0.07, 0.84) | 0.65 (0.30, 1.00) | N/C | N/C | 0.06 (-0.39, 0.52) | 0.64 (0.31, 0.98)
Mean | 0.66 | 0.76 | 0.42 | 0.35 | 0.37 | 0.24
R* = Reviewers; **4 (care provider blinding) was removed; N/C = not computed
Results: RoB (n=18), Items 8 to 12
Kappa estimate (95% CI: lower bound, upper bound)
R* | 8 Selective reporting | 9 Groups similar at baseline | 10 Co-interventions | 11 Compliance | 12 Timing of outcome
1, 4 | -0.15 (-0.30, 0.00) | 0.40 (-0.02, 0.83) | 0.49 (0.03, 0.94) | 0.41 (0.01, 0.82) | N/C
1, 2 | 0.30 (-0.28, 0.89) | 0.10 (-0.31, 0.52) | 0.40 (-0.01, 0.81) | 0.41 (0.01, 0.82) | -0.05 (-0.14, 0.02)
1, 3 | 0.30 (-0.28, 0.89) | 0.65 (0.32, 0.98) | 0.60 (0.13, 1.00) | 0.16 (-0.31, 0.65) | -0.05 (-0.14, 0.02)
2, 4 | 0.43 (-0.21, 1.00) | 0.33 (-0.09, 0.75) | 0.87 (0.62, 1.00) | 0.32 (-0.11, 0.76) | N/C
3, 4 | 0.43 (-0.21, 1.00) | 0.55 (0.18, 0.93) | 0.55 (0.11, 0.99) | 0.41 (0.01, 0.82) | N/C
2, 3 | 1.00 | 0.29 (-0.15, 0.74) | 0.45 (0.01, 0.89) | 0.41 (0.01, 0.82) | 1.00
Mean | 0.38 | 0.38 | 0.56 | 0.35 | 0.30
R* = Reviewers; **4 (care provider blinding) was removed; N/C = not computed
Conclusions
Blinding is an inherent limitation in rehab trials
Incomplete reporting and use of “unsure” lead to greater variation of kappa scores
Decision rules change and new members need to be calibrated
Conclusions
Diverse group, generalizability, fairer test of scoring reliability
Consistent inter-rater agreement for allocation concealment and randomization
Some items require more judgment
COG
Cervical Overview Group
Thank you
Acknowledgement
McMaster University
Carey Chittley, Lisa Ditchburn, Danna Epstein,
Denise Ward, Guillaume Bourrillon