August 27 - 30, 2012
GUIDELINE AND SYSTEMATIC REVIEW WORKSHOP
Dr. Elie Akl, Dr. Holger Schünemann, Dr. Ruth Kalda, Dr. Alar Irs
August 27, 2012
INTRODUCTION TO GUIDELINE DEVELOPMENT IN THE CONTEXT OF EVIDENCE BASED MEDICINE
Dr. Holger Schünemann
The Department of Clinical Epidemiology & Biostatistics at McMaster
History
• 1967 – Founded by David Sackett
• 6 chairs since
• Instrumental in specialty of Clinical Epidemiology, origin of “Evidence-Based Medicine”
People
• 45 full time and joint faculty
• ~ 120 associate & part time faculty; 19 emeritus
• ~ 180 staff
• ~ 200 PhD and Master students
Why EBM? Thrombolysis in myocardial infarction
[Figure: cumulative meta-analysis of RCTs by year (1960 to 1990), listing the number of RCTs and patients (Pts) and the cumulative odds ratio on a log scale (favors treatment vs. favors control; P < 0.01, P < 0.001, P < 0.00001), shown next to textbook/review recommendations classed as routine, specific, rare/never, experimental, or not mentioned]
Antman et al., JAMA, 1992; 268: 240-248
What is a guideline?
• "Guidelines are recommendations intended to assist providers and recipients of health care and other stakeholders to make informed decisions. Recommendations may relate to clinical interventions, public health activities, or government policies."
WHO 2003, 2007
When do we need guidelines?
• Knowledge gap?
– Is a guideline the right approach?
• Diagnosis?
– Too many cases? Too few? Variation?
• Treatment?
– Under? Over? Variation? Something new?
• Screening?
• Quality of care? Integration of care?
• Other?
What healthcare workers want…
• A guideline is not a textbook or a cookbook
• To KNOW that the guideline is evidence based
• But not necessarily all of the evidence…
• To have it easy to use and accessible
• Clear recommendations (more on that later)
Guideline development
Process
Working with evidence
• For key recommendations:
– Search for and retrieve all available evidence
– Identify relevant SRs
– Formally assess quality of evidence
– GRADE (systematic and transparent approach)
Institute of Medicine Report on Trustworthy guidelines 2011
• Be based on a systematic review of the existing evidence;
• Be developed by a knowledgeable, multidisciplinary panel of experts and representatives from key affected groups;
• Consider important patient subgroups and patient preferences as appropriate;
• Be based on an explicit and transparent process that minimizes distortions, biases, and conflicts of interest;
• Provide a clear explanation of the logical relationships between alternative care options and health outcomes, and provide ratings of both the quality of evidence and the strength of recommendations; and
• Be reconsidered and revised as appropriate when important new evidence warrants modifications of recommendations.
Guidelines International Network
The origin of evidence appraisal systems
Canadian Task Force on the Periodic Health Examination, CMAJ, 1979
Oxford Centre for Evidence Based Medicine
Levels of Evidence and Grades of Recommendations - 23 November 1999
Columns: Grade of Recommendation; Level of Evidence; Therapy/Prevention, Aetiology/Harm; Prognosis; Diagnosis; Economic analysis

Grade of Recommendation A
Level 1a
• Therapy/Prevention, Aetiology/Harm: SR (with homogeneity) of RCTs
• Prognosis: SR (with homogeneity*) of inception cohort studies; or a CPG validated on a test set.
• Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; or a CPG validated on a test set.
• Economic analysis: SR (with homogeneity*) of Level 1 economic studies
Level 1b
• Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
• Prognosis: Individual inception cohort study with > 80% follow-up
• Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard.
• Economic analysis: Analysis comparing all (critically-validated) alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
Level 1c
• Therapy/Prevention, Aetiology/Harm: All or none
• Prognosis: All or none case-series
• Diagnosis: Absolute SpPins and SnNouts
• Economic analysis: Clearly as good or better, but cheaper. Clearly as bad or worse but more expensive. Clearly better or worse at the same cost.

Grade of Recommendation B
Level 2a
• Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
• Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs.
• Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
• Economic analysis: SR (with homogeneity*) of Level >2 economic studies
Level 2b
• Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
• Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; or CPG not validated in a test set.
• Diagnosis: Any of: Independent blind or objective comparison; Study performed in a set of non-consecutive patients, or confined to a narrow spectrum of study individuals (or both) all of whom have undergone both the diagnostic test and the reference standard; A diagnostic CPG not validated in a test set.
• Economic analysis: Analysis comparing a limited number of alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
Level 2c
• Therapy/Prevention, Aetiology/Harm: “Outcomes” Research
• Prognosis: “Outcomes” Research
Level 3a
• Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies
Level 3b
• Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
• Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
• Economic analysis: Analysis without accurate cost measurement, but including a sensitivity analysis incorporating clinically sensible variations in important variables.

Grade of Recommendation C
Level 4
• Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies)
• Prognosis: Case-series (and poor quality prognostic cohort studies)
• Diagnosis: Any of: Reference standard was unobjective, unblinded or not independent; Positive and negative tests were verified using separate reference standards; Study was performed in an inappropriate spectrum** of patients.
• Economic analysis: Analysis with no sensitivity analysis

Grade of Recommendation D
Level 5
• Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
• Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
• Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
• Economic analysis: Expert opinion without explicit critical appraisal, or based on economic theory
Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and Sharon Straus).
USPSTF - Grade Definitions After May 2007: Certainty
Level of Certainty: Description
High: The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.
Moderate: The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by such factors as:
• The number, size, or quality of individual studies.
• Inconsistency of findings across individual studies.
• Limited generalizability of findings to routine primary care practice.
• Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.
Low: The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of:
• The limited number or size of studies.
• Important flaws in study design or methods.
• Inconsistency of findings across individual studies.
• Gaps in the chain of evidence.
• Findings not generalizable to routine primary care practice.
• Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.
The USPSTF defines certainty as "likelihood that the USPSTF assessment of the net benefit of a preventive service is correct."
• Recommendations for prognosis
– Use prognostic information to determine baseline risk for healthcare decisions
Centers for Disease Control and Prevention (CDC)
Evidence of Effectiveness | Execution (Good or Fair) | Design Suitability (Greatest, Moderate, or Least) | Number of Studies | Consistent | Effect Size | Expert Opinion
Strong | Good | Greatest | At Least 2 | Yes | Sufficient | Not Used
Strong | Good | Greatest or Moderate | At Least 5 | Yes | Sufficient | Not Used
Strong | Good or Fair | Greatest | At Least 5 | Yes | Sufficient | Not Used
Strong | Meet Design, Execution, Number, and Consistency Criteria for Sufficient But Not Strong Evidence | Large | Not Used
Sufficient | Good | Greatest | 1 | Not Applicable | Sufficient | Not Used
Sufficient | Good or Fair | Greatest or Moderate | At Least 3 | Yes | Sufficient | Not Used
Sufficient | Good or Fair | Greatest, Moderate, or Least | At Least 5 | Yes | Sufficient | Not Used
Expert Opinion | Varies | Varies | Varies | Varies | Sufficient | Supports a Recommendation
Insufficient | A. Insufficient Designs or Execution | B. Too Few Studies | C. Inconsistent | D. Small | E. Not Used
Your patient…as an internist
• 68 year old man with hypertension and non-valvular atrial fibrillation > 3 months
Atrial Fibrillation - Stroke
The clinically sensible question (PICO)
Population: Does, in patients with atrial fibrillation,
Intervention: oral anticoagulation,
Comparison: compared with no therapy,
Outcomes: reduce the risk for embolic stroke, increase the risk for bleeding, increase burden…?
Which approach?
Organization: Evidence, Recommendation
• AHA: B, Class I
• ACCP: A, 1
• SIGN: IV, C
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease
What to do?
Hierarchy of evidence based on quality
STUDY DESIGN (risk of BIAS increases down the list)
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations
• Expert Opinion
Issues with evidence hierarchies
• Does one size fit all?
• Should RCTs be on top?
• What are the special strengths of observational studies?
Healthcare problem
recommendation
“Healthy people”
“Rare disease”
“Long term perspective”
“Few RCTs”
“Lots of other things”
Explain the following?
• Confounding, effect modification & ext. validity
• Concealment of randomization
• Blinding (who is blinded in a double blinded study?)
• Intention to treat analysis and its correct application
• P-values and confidence intervals
“Everything should be made as simple as possible but not simpler.”
BMJ 2003
Relative risk reduction:….> 99.9 % (1/100,000)
U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007
Simple hierarchies are (too) simplistic
STUDY DESIGN (risk of BIAS increases down the list)
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations
• Expert Opinion
Schünemann & Bone, 2003
GRADE Working Group
Grades of Recommendation Assessment, Development and Evaluation
CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005, AJRCCM 2006, Chest 2006, BMJ 2008
• International group: ACCP, AHRQ, Australian NHMRC, BMJ Clinical Evidence, CC, CDC, McMaster Uni., NICE, Oxford CEBM, SIGN, UpToDate, USPSTF, WHO
• Aim: to develop a common, transparent and sensible system for grading the quality of evidence and the strength of recommendations (over 100 systems)
• International group of guideline developers, methodologists & clinicians from around the world (>300 contributors) – since 2000
GRADE Uptake
• World Health Organization
• Allergic Rhinitis in Asthma Guidelines (ARIA)
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
Partner of GIN
Over 60 (major) organizations
Evidence based healthcare decisions
Research evidence
Population/societal values and preferences
(Clinical) state and circumstances
Expertise
Haynes et al. 2002
Your patient…as an internist
• 68-year-old man with hypertension and non-valvular atrial fibrillation > 3 months
– diabetes
– large left atrium (→ cardioversion unlikely to be successful)
– no history of strokes or transient ischemic attacks (TIAs)
• Terrified of having a stroke
Risk factors for stroke with NVAF
CHADS2 score for assessment of stroke risk in patients with non-rheumatic AF
Risk factor Points
Recent Congestive heart failure exacerbation 1
History of Hypertension 1
Age 75 years or older 1
Diabetes 1
Prior history of Stroke or TIA 2
CHADS2 = 2 for this patient (hypertension 1 + diabetes 1)
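The point table can be applied mechanically. The following is a minimal Python sketch, assuming a hypothetical `Patient` record and a hypothetical `chads2_score` helper (neither comes from the workshop material), that scores the 68-year-old patient from the case:

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding only the CHADS2 inputs from the slide."""
    age: int
    recent_chf_exacerbation: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_or_tia: bool = False


def chads2_score(p: Patient) -> int:
    """Add up the points from the slide's risk-factor table."""
    score = 0
    score += 1 if p.recent_chf_exacerbation else 0  # C: congestive heart failure
    score += 1 if p.hypertension else 0             # H: hypertension
    score += 1 if p.age >= 75 else 0                # A: age 75 years or older
    score += 1 if p.diabetes else 0                 # D: diabetes
    score += 2 if p.prior_stroke_or_tia else 0      # S2: prior stroke or TIA
    return score


# The case patient: 68 years old, hypertension, diabetes, no prior stroke/TIA.
case = Patient(age=68, hypertension=True, diabetes=True)
print(chads2_score(case))  # 2
```

Age contributes nothing here because the patient is under 75, so the two points come from hypertension and diabetes alone.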
Evidence concerning NVAF and stroke*
• Risk of stroke if untreated (CHADS2 = 2): 45/1000 per year
• Relative Risk Reduction for stroke
– Warfarin: 0.64 (95% CI 0.51-0.77)
• RRI for major bleeding
– Warfarin: 2.58 (95% CI 1.12-5.97)
* Pooled estimates of treatment effect in this evidence profile are from a meta-analysis conducted for these guidelines, including data from 6 RCTs of adjusted-dose vitamin K antagonist therapy versus no antithrombotic therapy (AFASAK I, BAATAF, CAFA, EAFT, SPAF I, SPINAF), You et al., in press.
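To see how a relative effect becomes an absolute one, the baseline risk and the relative risk reduction for stroke can be combined. This is a rough sketch, assuming the relative risk reduction applies uniformly to the 45/1000 baseline; the function name `treated_risk_per_1000` is made up for illustration:

```python
def treated_risk_per_1000(baseline_per_1000: float, rrr: float) -> float:
    """Risk on treatment = baseline risk x (1 - relative risk reduction)."""
    return baseline_per_1000 * (1.0 - rrr)


baseline = 45.0  # strokes per 1000 per year if untreated (CHADS2 = 2)

# Point estimate and 95% CI limits of the RRR for warfarin, from the slide.
for label, rrr in [("point estimate", 0.64), ("lower limit", 0.51), ("upper limit", 0.77)]:
    treated = treated_risk_per_1000(baseline, rrr)
    fewer = baseline - treated
    print(f"{label}: {treated:.2f} per 1000 on warfarin, {fewer:.2f} fewer per 1000")
```

At the point estimate this gives roughly 16 strokes per 1000 per year on warfarin, or about 29 fewer per 1000, which is close to the rounded "30 fewer per 1000" used in the practitioner summary.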
Physician accuracy in estimating risk: no better than chance…
Primum non nocere
“Primum non net nocere”
Balancing desirable and undesirable consequences
Desirable consequences: ↑ QoL, ↓ stroke, ↓ morbidity, ↑ survival
Undesirable consequences: ↑ burden, ↑ resources, ↑ dietary restriction, ↑ bleeding
Recommendation: for or against; conditional or strong
Summary from the practitioner’s perspective for this patient
• must anticoagulate 100 people with NVAF for 1 year to prevent 3 strokes
(30 fewer per 1000 or NNT of 33)
• for 100 anticoagulated patients in the community, this will cause about 1 additional person to have a major bleed per year
(8 more per 1000 or NNH of 125)
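The numbers in parentheses follow from dividing 1000 by the absolute difference per 1000. A minimal sketch, with `number_needed` as a hypothetical helper name:

```python
def number_needed(difference_per_1000: float) -> float:
    """Number needed to treat (or harm) = 1000 / absolute difference per 1000."""
    if difference_per_1000 <= 0:
        raise ValueError("difference per 1000 must be positive")
    return 1000.0 / difference_per_1000


nnt_stroke = number_needed(30)  # 30 fewer strokes per 1000 per year
nnh_bleed = number_needed(8)    # 8 more major bleeds per 1000 per year

print(f"NNT to prevent one stroke: {nnt_stroke:.0f}")
print(f"NNH for one major bleed: {nnh_bleed:.0f}")
```

The same formula serves both directions; only the label changes from "needed to treat" for a benefit to "needed to harm" for an adverse effect.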
Summary from this patient’s perspective
• If you take anticoagulants
– your risk of stroke in the coming year will decrease from 4.5% to 1.5% per year
but
– your risk of having a major bleed will increase from 0.5% to 1% per year
GRADE recommendations
• For patients with AF, including those with paroxysmal AF, who are at high risk of stroke (e.g., CHADS2 score ≥ 2), we recommend oral anticoagulation rather than no therapy (strong recommendation, high quality evidence)
• For patients with AF, including those with paroxysmal AF, who are at low risk of stroke (e.g., CHADS2 score = 0), we suggest no therapy rather than antithrombotic therapy (weak recommendation, moderate quality evidence).
• Remark: Patients who place an exceptionally high value on stroke reduction and a low value on avoiding bleeding and the burden associated with antithrombotic therapy are likely to choose antithrombotic therapy rather than no antithrombotic therapy. Other factors that may influence the choices above are a consideration of patient-specific bleeding risk and the presence of additional risk factors for stroke, including age 65 to 74 years and female gender, which have been more consistently validated, and vascular disease, which has been less well validated. The presence of multiple non-CHADS2 risk factors for stroke may favor oral anticoagulation therapy.
Systematic review
• Formulate question (PICO)
• Select outcomes and rate importance of each outcome: critical, important, or not important
• Outcomes across studies: create evidence profile with GRADEpro
• Rate quality of evidence for each outcome: high, moderate, low, or very low
– Randomization increases initial quality
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Opposing bias & confounders
• Summary of findings & estimate of effect for each outcome
Guideline development
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Panel: grade recommendations
– For or against (direction)
– Strong or conditional/weak (strength)
– By considering balance of: quality of evidence, balance benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• Formulate recommendations
– “We recommend using…” | “Clinicians should…”
– “We suggest using…” | “Clinicians might…”
– “We suggest not using…” | “Clinicians … not…”
– “We recommend not using…” | “Clinicians should not…”
• Guideline input?
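The rating steps in this overview can be sketched in code. The following is a deliberately simplified illustration, not the GRADE method itself: it assumes randomized evidence starts at "high" and non-randomized evidence at "low", treats each downgrade or upgrade as exactly one level, and uses made-up outcome ratings. The names `rate_outcome` and `overall_quality` are hypothetical.

```python
from typing import Dict, Tuple

LEVELS = ["very low", "low", "moderate", "high"]  # ordered from worst to best


def rate_outcome(randomized: bool, downgrades: int = 0, upgrades: int = 0) -> str:
    """Rate one outcome: start from the design, then grade down and up."""
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    index = start - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))  # stay within the four levels
    return LEVELS[index]


def overall_quality(outcomes: Dict[str, Tuple[str, bool]]) -> str:
    """Overall quality = lowest quality among the critical outcomes.

    `outcomes` maps an outcome name to (quality level, is_critical).
    """
    critical = [quality for quality, is_critical in outcomes.values() if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=LEVELS.index)


# Illustrative ratings only; these are not taken from any evidence profile.
outcomes = {
    "stroke": (rate_outcome(randomized=True), True),
    "major bleeding": (rate_outcome(randomized=True, downgrades=1), True),
    "treatment burden": (rate_outcome(randomized=False), False),
}
print(overall_quality(outcomes))  # moderate
```

In practice a panel judges each factor rather than counting it mechanically, so the integer arguments here stand in for those judgments.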
Summary
• Evidence based decision making requires consideration of many factors
• Evidence can be complex and needs careful integration
Asking questions and choosing outcomes
Elie Akl
[Figure: GRADE process from systematic review to guideline development: formulate question (PICO), select outcomes and rate importance, create evidence profile with GRADEpro, rate quality of evidence for each outcome, grade overall quality across outcomes, grade and formulate recommendations]
Formulate recommendations
• “The panel recommends that ….should...”
• “The panel suggests that ….should...”
• “The panel suggests to not ...”
• “The panel recommends to not...”
Guidelines and questions
Guidelines are a way of answering questions about clinical, communication, organisational or policy interventions, in the hope of improving health care or health policy.
It is therefore helpful to structure a guideline in terms of answerable questions.
WHO Guideline Handbook, 2008
Questions
Should be practice driven, NOT evidence driven
Types of questions
Background Questions
Definition: What is COPD?
Mechanism: What is the mechanism of action of mucolytic therapy?
Foreground Questions
Efficacy: In patients with COPD, does mucolytic therapy improve survival?
Good questions...
• Questions you have when trying to decide what to prescribe/recommend to your patient
• Questions you have when trying to decide what to provide in your country/region/clinic
What should you do with the person in front of you?
Types of questions
Background Questions
Definition: What is COPD?
Mechanism: What is the mechanism of action of beta-agonists?
Foreground Questions
Efficacy: In patients with COPD, do beta-agonists improve survival?
Should we recommend oseltamivir versus no antiviral therapy in adults with influenza-like illness?
Should we recommend that soft drink machines be banned from schools to prevent childhood obesity?
Should in vitro specific IgE determination be used for the diagnosis of IgE-mediated cow’s milk allergy in patients suspected of cow’s milk allergy?
Framing a foreground question
P
I
C
O
Framing a foreground question
Population:
Intervention:
Comparison:
Outcomes:
Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens died unexpectedly a few days before the girl fell sick.
Potential interventions: antivirals, such as neuraminidase inhibitors oseltamivir and zanamivir
What are examples of:
• Background questions
• Foreground questions
– Population:
– Intervention:
– Comparison:
– Outcomes:
Framing a foreground question
Population: Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or Zanamivir)
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
Schünemann, Hill et al., The Lancet ID, 2007
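A foreground question framed this way has a fixed four-part structure, which makes it easy to record consistently across a guideline. A small sketch, assuming a hypothetical `PicoQuestion` record and an illustrative sentence template (the wording of the template is not from the workshop):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PicoQuestion:
    """Hypothetical container for the four PICO elements."""
    population: str
    intervention: str
    comparison: str
    outcomes: List[str]

    def as_text(self) -> str:
        """Render the elements as one answerable question."""
        return (
            f"In {self.population}, should {self.intervention} be used "
            f"rather than {self.comparison}? "
            f"Outcomes: {', '.join(self.outcomes)}."
        )


# The avian influenza example from the slide.
h5n1 = PicoQuestion(
    population="avian flu/influenza A (H5N1) patients",
    intervention="oseltamivir (or zanamivir)",
    comparison="no pharmacological intervention",
    outcomes=["mortality", "hospitalizations", "resource use",
              "adverse outcomes", "antimicrobial resistance"],
)
print(h5n1.as_text())
```

Keeping the outcomes as a list rather than free text makes it straightforward to rate each one for importance later.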
Choosing outcomes
• Every decision comes with desirable and undesirable consequences
• Developing recommendations must include a consideration of desirable and undesirable outcomes
• Outcomes should be patient-important outcomes.
Outcomes
Should be importance driven, NOT evidence driven
• desirable outcomes
– lower mortality
– reduced hospital stay
– reduced duration of disease
– reduced resource expenditure
• undesirable outcomes
– adverse reactions
– the development of resistance
– costs of treatment
Choosing outcomes
What if what is important is not measured?
What if what is measured is not important?
How do we make sure we’ve covered all important outcomes?
Choosing outcomes
• Decision makers (and guideline authors) need to consider the relative importance of outcomes when balancing these outcomes to make a recommendation
• Relative importance varies across populations
• Relative importance may vary across patient groups within the same population
• When considered critical - evaluate
Relative importance of outcomes
Rating scale from 1 to 9:
• 7 to 9: Critical for decision making
• 4 to 6: Important, but not critical for decision making
• 1 to 3: Of low importance
Relative importance of outcomes
Hierarchy of outcomes according to their importance to assess the effect of oseltamivir in patients with H5N1 influenza
Importance of endpoints
• Critical for decision making (7 to 9): Mortality 9, Hospital admission 8, Pneumonia 7
• Important, but not critical for decision making (4 to 6): Neurological complications 6
• Of low importance (1 to 3): Nausea 2
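The 1 to 9 scale maps onto three categories, which can be written as a simple lookup. A minimal sketch, with `importance_category` as a hypothetical helper and the ratings taken from the oseltamivir example:

```python
def importance_category(rating: int) -> str:
    """Map a 1-9 importance rating to its category."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating >= 7:
        return "critical for decision making"
    if rating >= 4:
        return "important, but not critical for decision making"
    return "of low importance"


# Ratings from the H5N1 / oseltamivir slide.
ratings = {
    "Mortality": 9,
    "Hospital admission": 8,
    "Pneumonia": 7,
    "Neurological complications": 6,
    "Nausea": 2,
}

for outcome, rating in ratings.items():
    print(f"{outcome} ({rating}): {importance_category(rating)}")
```

Only the outcomes that land in the critical category feed into the overall quality rating.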
Good questions lead to good recommendations
• There is controversy around the answer
• There is doubt around the answer
• Want to confirm the present answer
• Has a chance of being answered, or will determine research in future
• Will improve care, cost, quality of life
Agenda