key steps in conducting systematic reviews for...

11
Statistics in Urology Key Steps in Conducting Systematic Reviews for Underpinning Clinical Practice Guidelines: Methodology of the European Association of Urology Thomas Knoll a,1 , Muhammad Imran Omar b,1, *, Steven Maclennan b , Virginia Herna ´ndez c , Steven Canfield d , Yuhong Yuan e , Max Bruins f , Lorenzo Marconi g , Hein Van Poppel h , James N’Dow b , Richard Sylvester i , EAU Guidelines Office Senior Associates Group Authorship 2 [8_TD$DIFF] a Department of Urology, Sindelfingen-Boeblingen Medical Center, University of Tübingen, Sindelfingen, Germany; b Academic Urology Unit, University of Aberdeen, Aberdeen, Scotland, UK; c Department of Urology, Hospital Universitario Fundacion Alcorcon, Madrid, Spain; d Division of Urology, University of Texas McGovern Medical School, Houston, TX, USA; e Department of Medicine, McMaster University, Hamilton, ON, Canada; f Department of Urology, Radboud University Medical Center, Nijmegen, The Netherlands; g Department of Urology, Coimbra University Hospital, Coimbra, Portugal; h Department of Urology, University Hospital Gasthuisberg, Katholieke Universiteit Leuven, Leuven, Belgium; i EAU Guidelines Office, Brussels, Belgium EUROPEAN UROLOGY 73 (2018) 290 300 available at www.sciencedirect.com journal homepage: www.europeanurology.com Article info Article history: Accepted August 17, 2017 Associate Editor: James Catto Keywords: Systematic review Meta-analysis Abstract Context: The findings of systematic reviews (SRs) and meta-analyses (MAs) are used for clinical decision making. The European Association of Urology has committed increasing resources into the development of high quality clinical guidelines based on such SRs and MAs. Objective: In this paper, we have summarised the process of conducting SRs for underpin- ning clinical practice guidelines under the auspices of the European Association of Urology Guidelines Ofce. Evidence acquisition: The process involves explicit methods and the ndings should be reproducible. When conducting a SR, the essential rst step is to formulate a clear and answerable research question. An extensive literature search lays the foundation for evidence synthesis. Data are extracted independently by two reviewers and any dis- agreements are resolved by discussion or arbitration by a third reviewer. Evidence synthesis: In SRs, data for particular outcomes in individual randomised controlled trials may be combined statistically in a meta-analysis to increase power when the studies are similar enough. Biases in studies included in a SR/MA can lead to either an over estimation or an under estimation of true intervention effect size, resulting in heterogeneity in outcome between studies. A number of different tools are available such as Cochrane Risk of Bias assessment tool for randomised controlled trials. In circumstances where there is too much heterogeneity, or when a review has included nonrandomised comparative studies, it is more appropriate to conduct a narrative synthesis. The GRADE tool for assessing quality of evidence strives to be a structured and transparent system, which can be applied to all evidence, regardless of 1 These authors are joint rst authors[10_TD$DIFF]. 2 The EAU Guidelines Ofce Senior Associates Group Authorship members are: Saeed Dabestani, Tobias Gross, Fabian Hofmann, Arjun Nambiar, Marc Schneider, Thomas Seisen, Lisette tHoen, Thomas van den Broeck, Erik Veskimae. * Corresponding author. European Association of Urology, University of Aberdeen, Health Sciences Building (Second Floor), Foresterhill, University of Aberdeen, Aberdeen AB25 2ZD, UK. Tel. +[9_TD$DIFF]441224438126. E-mail addresses: [email protected], [email protected] (M.I. Omar). http://dx.doi.org/10.1016/j.eururo.2017.08.016 0302-2838/© 2017 Published by Elsevier B.V. on behalf of European Association of Urology.

Upload: letruc

Post on 21-Jun-2019

217 views

Category:

Documents


0 download

TRANSCRIPT

E U RO P E AN U RO LOGY 7 3 ( 2 018 ) 2 9 0 – 3 0 0

avai lable at www.sciencedirect .com

journal homepage: www.europeanurology.com

Statistics in Urology

Key Steps in Conducting Systematic Reviews for UnderpinningClinical Practice Guidelines: Methodology of the EuropeanAssociation of Urology

Thomas Knoll a,1, Muhammad Imran Omar b,1,*, Steven Maclennan b, Virginia Hernandez c,Steven Canfield d, Yuhong Yuan e, Max Bruins f, Lorenzo Marconi g, Hein Van Poppel h,James N’Dowb, Richard Sylvester i,

EAU Guidelines Office Senior Associates Group Authorship2[8_TD$DIFF]

aDepartment of Urology, Sindelfingen-Boeblingen Medical Center, University of Tübingen, Sindelfingen, Germany; bAcademic Urology Unit, University of

Aberdeen, Aberdeen, Scotland, UK; cDepartment of Urology, Hospital Universitario Fundacion Alcorcon, Madrid, Spain; dDivision of Urology, University of

Texas McGovern Medical School, Houston, TX, USA; eDepartment of Medicine, McMaster University, Hamilton, ON, Canada; fDepartment of Urology,

Radboud University Medical Center, Nijmegen, The Netherlands; gDepartment of Urology, Coimbra University Hospital, Coimbra, Portugal; hDepartment of

Urology, University Hospital Gasthuisberg, Katholieke Universiteit Leuven, Leuven, Belgium; i EAU Guidelines Office, Brussels, Belgium

s of systematic reviews (SRs) andmeta-analyses (MAs) are used for

Article info

Article history:

Abstract

Context: The finding

Accepted August 17, 2017

Associate Editor:

James Catto

Keywords:

Systematic reviewMeta-analysis

clinical decisionmaking. The European Association of Urology has committed increasingresources into the development of high quality clinical guidelines based on such SRs andMAs.Objective: In this paper, we have summarised the process of conducting SRs for underpin-ning clinical practice guidelines under the auspices of the European Association of UrologyGuidelines Office.Evidence acquisition: The process involves explicit methods and the findings should bereproducible. When conducting a SR, the essential first step is to formulate a clear andanswerable research question. An extensive literature search lays the foundation forevidence synthesis. Data are extracted independently by two reviewers and any dis-agreements are resolved by discussion or arbitration by a third reviewer.Evidence synthesis: In SRs, data for particular outcomes in individual randomisedcontrolled trials may be combined statistically in a meta-analysis to increase powerwhen the studies are similar enough. Biases in studies included in a SR/MA can lead toeither an over estimation or an under estimation of true intervention effect size,resulting in heterogeneity in outcome between studies. A number of different toolsare available such as Cochrane Risk of Bias assessment tool for randomised controlledtrials. In circumstances where there is too much heterogeneity, or when a review hasincluded nonrandomised comparative studies, it is more appropriate to conduct anarrative synthesis. The GRADE tool for assessing quality of evidence strives to be astructured and transparent system, which can be applied to all evidence, regardless of

1 These authors are joint first authors[10_TD$DIFF].2 The EAU Guidelines Office Senior Associates Group Authorship members are: Saeed Dabestani,Tobias Gross, Fabian Hofmann, Arjun Nambiar, Marc Schneider, Thomas Seisen, Lisette t’Hoen,Thomas van den Broeck, Erik Veskimae.* Corresponding author. European Association of Urology, University of Aberdeen, Health SciencesBuilding (Second Floor), Foresterhill, University of Aberdeen, Aberdeen AB25 2ZD, UK.Tel. + [9_TD$DIFF]441224438126.E-mail addresses: [email protected], [email protected] (M.I. Omar).

http://dx.doi.org/10.1016/j.eururo.2017.08.0160302-2838/© 2017 Published by Elsevier B.V. on behalf of European Association of Urology.

quality. A SR not only identifies, evaluates, and summarises the best available evidence,but also the gaps to be targeted by future studies.Conclusions: SRs and MAs are integral in developing sound clinical practice guidelinesand recommendations.Patient summary: Clinical practice guidelines should be evidence based, and systematicreviews and meta-analyses are essential in their production. We have discussed the keysteps of conducting systematic reviews and meta-analyses in this paper.

© 2017 Published by Elsevier B.V. on behalf of European Association of Urology.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0 291

1. Introduction

In 1979, Archibald Cochrane, a Scottish doctor, proposed: “Itis surely a great criticism of our profession that we have notorganised a critical summary, by specialty or subspecialty,adapted periodically, of all relevant randomised controlledtrials” [1]. Cochrane was one of the founding fathers ofevidence-based medicine. He highlighted and advocatedthe importance of critically summarising the findings ofresearch studies and was passionate about this cause. Hedesignated the systematic review (SR) as a methodproviding such a summary [1]. Cochrane’s vision ultimatelyled to the development of the Cochrane Collaboration in1993.

The findings of SRs andmeta-analyses (MAs) are used forclinical decision making and are integral in developingsound clinical practice guidelines and recommendations.The European Association of Urology (EAU) has committedincreasing resources into the development of high qualityclinical guidelines based on such SRs and MAs [2–4]. TheGuidelines Office has completed 21 SRs and 32 reviews areongoing. These high quality systematic reviews are used formaking guideline recommendations [5]. Supplementary

[(Fig._1)TD$FIG]

Fig. 1 – The process of conducting systematic review.GRADE = [1_TD$DIFF]Grading of Recommendation, Assessment, Development, and Evaluatio

Table 1 summarises the Guideline recommendation(s)underpinned by the selected SRs conducted under theauspices of the EAU Guidelines Office. In this paper, we havesummarised the process of conducting SRs under theauspices of the EAU Guidelines Office. The process is alsographically illustrated in Figure 1.

2. Evidence acquisition

2.1. What is a randomised controlled trial?

A randomised controlled trial (RCT) is a type of study inwhich participants are randomly assigned to two or moregroups in order to assess interventions. Awell designed RCTis adequately powered and follows the principles andrecommendations of the CONSORT statement [6].

2.2. What is a SR?

A SR identifies, selects, synthesises, and appraises studiesthatmeet prespecified inclusion criteria to answer a researchquestion. The process involves explicit methods and thefindings should be reproducible. A SR not only identifies,

n.

Table 1 – Differences between a systematic review and traditional review

Systematic review Traditional review

Review question Focused, well-defined clinical question formulated withPICO framework

Question is usually broad and not well defined

Protocol A-priori protocol is developed and published No protocolMethods Usually very well-defined and explicitly stated with study

inclusion and exclusion criteriaUsually not well-defined

Literature search A good systematic review includes a well-definedcomprehensive search, without language or other restrictions

Search strategy is usually not stated and the review isconfined to well-known articles often supporting the authors’ views

Critical appraisal Internal validity of the individual studies included ina systematic review is vetted by various tools such asCochrane risk of bias assessment tool

Critical appraisal is usually not performed

Synthesis Qualitative (sometimes quantitative with meta-analysis):may answer a clinical question which may not be answerableby individual studies

Usually qualitative summary

Findings/conclusion Findings are reproducible Findings are not reproducible. Author’s personal belief mayinfluence the overall conclusion of a traditional review

EURO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0292

evaluates, and summarises the best available evidence, butalso the gaps to be targeted by future studies [7].

2.3. What is a MA?

In SRs, data for particular outcomes in individual RCTs maybe combined statistically in a MA to increase power whenthe studies are similar enough [7]. A MAmay thus be able toanswer a clinical question which is unanswerable byindividual studies due to inadequate power. A MA shouldnot be done outside of the SR setting because it is unlikelythat all relevant trialswill have been identified and thereforea biased or misleading pooled estimate may result. The termMA was coined by an American statistician, Gene Glass [8].

2.4. What is a traditional review?

A traditional review is usually written by an expert andprovides a summary or an overview of a topic. Unlike a SR,the topic of a traditional review is often broad and lessprecisely defined. The methods used to conduct traditionalreviews are not standardised. The search strategy is usuallynot stated, the review is confined to well-known articles [9],and the author’s personal beliefs may influence the overallconclusion [10]. If a MA is attempted in this situation, anexaggerated and spuriously precise estimate may beobtained and treated with greater credibility than it isdue. The main features of SRs are covered throughout thisreport and the key differences between a SR and traditionalreview are summarised in Table 1.

2.5. Formulating a clear, well-designed research question and

writing a review protocol

When conducting a SR, the essential first step is toformulate a clear and answerable research question. Thisclarifies the objectives and the study inclusion andexclusion criteria. A straightforward way to do this is tobreak the question down into constituent parts using thePICO framework, which stands for Participants, Interven-tion, Comparator, and Outcomes [11].

It is necessary to be transparent and explicit about whichpopulations, intervention, comparator, and outcomes are tobe included or excluded. This is important for not onlydesigning the search strategy but is also helpful forsystematic reviewers who do not necessarily have in-depthclinical knowledge on the topic. Furthermore, it serves as ablueprint to inform study screening and data extraction. Toillustrate: “Should I perform partial nephrectomy on apatient with kidney cancer?” is a meaningful question,particularly for urologists. However, it is vague and notinformative for deciding which studies should be included.A more scientific way for phrasing this question is: “Inpatients with localised renal cell carcinoma (P), howeffective is partial nephrectomy (I), when compared withradical nephrectomy (C), in improving overall survival (O)?”

Of course, more clinically or methodologically justifiabledetail will need to be provided to define limits on age,disease stage, which approaches to partial and radicalnephrectomy are included, which other (secondary) out-comes should be included, and search date cut-offs. Thetypes of study designs to be included should be specified.Cochrane SRs are traditionally confined to evidence fromRCTs. However, it is not always ethical to randomiseparticipants for comparing interventions and observationalstudies play a crucial role in this scenario.

SRs should be protocol driven and prospectively regis-tered in a database, such as PROSPERO [12] or the CochraneLibrary, to help counter publication bias and selectiveoutcome reporting by giving a permanent record of the apriori methods. PROSPERO is a database maintained by theUniversity of York. PROSPERO provides a free registrationand publication of systematic review protocols and isavailable at the following link: https://www.crd.york.ac.uk/PROSPERO/. Registration of protocols alsomeans that otherscan see if a review is currently underway and work is notduplicated. Crucially, in addition to the PICO elements, aprotocol outlines what actions will be taken in whichcircumstances, such as how particular types of data will behandled in the analyses, the conditions under whichMAs ornarrative synthesis will be undertaken, which subgroupswill be analysed, and how risk of bias (RoB) will be assessed.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0 293

3. Evidence synthesis

3.1. Literature search

Extensive literature search lays the foundation for evidencesynthesis. With increasing numbers of published researchreports and numerous bibliographic databases, it ischallenging to efficiently identify relevant studies, espe-cially when research topics are complex and study designsare not limited to RCTs.

Conducting a literature search to identify relevantreports from databases and other resources requiresrigorous search techniques. When designing searches, highsensitivity and high specificity should be aimed for.However, no search strategy is 100% perfect; high sensitivitymay result in a huge number of irrelevant records and thereis always a trade-off. It is advisable to involve aninformation scientist for designing search strategies.

Literature searches for SRs should involve at least twobibliographic databases and be as extensive as possible inorder to identify all relevant studies and increase thegeneralisability of the results. According to MethodologicalExpectations for Cochrane Intervention Reviews, CochraneCentral Register of Controlled Trials, MEDLINE, (andEMBASE and Cochrane Review Group’s Specialized Registerwhen available) are the minimum databases to be searchedfor a Cochrane SR [13]. Most published SRs now search atleast MEDLINE, EMBASE, and Cochrane library databases.Databases relevant to the topic of the review (eg, CINAHL fornursing-related topics), national and regional databases,and subject-specific databases should also be considered.Trials registers such as ClinicalTrials.gov, the World HealthOrganisation International Clinical Trials Registry Platformportal, and other sources (eg, pharmaceutical industry andregulatory agencies websites) should be searched forunpublished and ongoing studies. Other sources includedissertations, theses, conference abstracts and proceedings,and grey literature (information produced in electronic andprint formats not controlled by commercial publishing)databases. Searching the bibliographies of identified rele-vant studies and previous SRs on the same topic is a goodapproach for reducing the chance of missing relevantstudies. Study authors and organisations should be con-tacted for missing information on unpublished or ongoingstudies [7,13].

Content experts are very important in designing ad-vanced search strategies [7]. By working closely withinformation specialists, they can assist with identifyingsearch concepts, and suggest keywords as well as keyreferences, while revising and improving preliminarysearches. Too many different search concepts should beavoided. Sometimes, some concepts (eg, outcomes) may notbe well described in the title or abstract of an article and areoften not well indexed with controlled vocabulary terms; itmight be better to search only for the population andintervention. However, a broad range and wide variety ofsearch terms should be combined with “OR” within eachconcept. Both free-text words (including spelling variants,synonyms, related terms, opposites, plurals, acronyms,

truncations, wildcards, and proximity operators) andappropriate subject headings should be used (eg, MeSHand EMTREE). Ensure the Boolean operators “AND,” “NOT,”and “OR” are used correctly. An example of a search strategy,along with an explanation, is presented inFigure 2. Strategies must be translated for every databaseand each interface (eg, PUBMED and OvidSP are twodifferent interfaces for the same database MEDLINE).Specially designed high sensitivity and tested filters canbe used when appropriate. There is no need to use an RCTfilter when searching Cochrane Central Register of Con-trolled Trials because all the records are thoroughly andcorrectly indexed. Applying a publication date, publicationformat, or language restrictions in the search strategy shouldbe justified and included in the report [7].

The search strategy should be documented and reportedtransparently in the protocol and in the review. It isnecessary to report the sources searched (databases andinterfaces), dates and limits, and the numbers found fromeach source [14,15]. Key search terms can be reported in themethods section. The full search strategies for one or moreelectronic databases can be submitted as an appendix andpublished as supplementary files available online to readers.

3.2. Abstract and full-text screening

After the literature search has been conducted, the list ofretrieved abstracts is screened for studies that may beeligible for inclusion. Abstract screening is performed usingan a-priori developed study screening form. This formcontains the inclusion and exclusion criteria for each PICOelement as reported in the study protocol. Following thisformwill lead to the decision whether to exclude or includethe abstract. In case no decision can be made based on theinformation in the abstract, it will be marked as “unclear.”Unclear abstracts are included at this stage, as it requires full-text retrieval and review to make the final decision. Abstractscreening must be performed independently by (at least)two reviewers. The combined results will subsequently bereviewed for disagreements, and a third independentreviewer can be consulted for resolving conflicts.

Full-text screening follows the same principles asabstract screening. A difference, however, is that full-textpapers cannot be marked as “unclear” but require adefinitive decision whether to include or exclude the study.Also, for excluded full-text papers, reviewers must keeptrack of the reason for exclusion. Upon completion, the finalnumber of included and excluded studies (with the reasonfor exclusion) is reported in the Preferred Reporting Itemsfor Systematic Reviews and Meta-Analyses flow diagram asillustrated in Figure 3.

3.3. Data extraction from included studies

One of the most important and time-consuming parts of aSR is data extraction. Data are extracted independently bytwo reviewers and any disagreements are resolved bydiscussion or arbitration by a third reviewer [16]. Table 2summarises the key information that should be extracted

[(Fig._2)TD$FIG]

Fig. 2 – Example of search strategy.exp = [2_TD$DIFF]“exploded” the subject heading.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0294

from the studies included in a SR. Each review question isunique and therefore requires development of a separateform. A pilot tested electronic or paper version of the dataextraction form includes entries for study characteristicsalong with relevant results and RoB assessment. The dataextraction form should first be piloted on two to threestudies.

The setting and timing of the study, participants, anddisease characteristics (such as age, sex, comorbidities,diagnostic criteria, staging, and prognostic factors) thatmayinfluence intervention effects or external validity should becollected. Relevant information on interventions includesurgical techniques, drug doses and frequency, or routes ofdelivery. It is important to clearly predecide the outcomemeasures that will be collected in terms of their definition(eg, measurement method, scale, and threshold), timing,and unit(s) of measurement. This is crucial as manyoutcomes may be reported using multiple definitions (eg,biochemical recurrence after radical prostatectomy), mea-surement scales (eg, multiple urinary symptom question-naires to access the severity of urinary incontinence),measurement method (eg, healthy parenchymal volumeloss after partial nephrectomy measured by computedtomography volumetric analysis or estimated by surgeon),or different points in time (eg, cancer-specific survival at

1 yr, 5 yr, and 10 yr). Standardised sets of outcomes, knownas “core outcome sets” represent the minimum that shouldbe measured and reported in all clinical trials of a specificcondition making it easier for the results of trials to becombined [17]. The numerical data required for MA are notalways available and sometimes other statistics or graphicalinformation can be collected and converted into therequired format.

3.4. RoB assessment

Biases in studies included in a SR can lead to either anoverestimation or an underestimation of true interventioneffect size, resulting in heterogeneity in outcome betweenstudies.

3.4.1. RCTs

In SRs of RCTs, the RoB in each study is assessed for eachoutcome using the domains (judged as either low, high, orunclear) in the Cochrane Collaboration’s RoB Tool [7,18]:

� S

election bias, which includes random sequence genera-tion bias and allocation concealment bias.

� P

erformance bias due to knowledge of the allocatedintervention by study personnel and participants.

[(Fig._3)TD$FIG]

Fig. 3 – Example Preferred Reporting Items for Systematic Reviews and Meta-analyses flowchart [[3_TD$DIFF]21].

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0 295

� D

etection bias in assessing the outcome due to knowl-edge of the allocated intervention.

� A

ttrition bias due to incomplete data and exclusion fromanalyses.

� R

eporting bias due to selective outcome reporting. � O ther sources of bias.

RevMan is then used to create a RoB summary graph tovisually illustrate the RoB (low, high, or unclear) for eachdomain in each study [19]. A draft version of a revised tool toassess the RoB in RCTs (RoB 2.0) has recently been proposed[20].

Table 2 – Information to consider in data extraction form

Data extraction form field Information to consider in data extraction

Reviewer identification Review author ID; dateStudy identification Study ID; report ID; citation; author contact details; publication yr; country; source of dataMethods Study design; setting; enrolment period; no. of centresParticipant characteristics Total no.; age; sex; co-morbidity; ethnicity; no. lost to follow-upDisease characteristics Staging (eg, TNM); severity (eg, IPSS score); biological behaviour (eg, Gleason score); surgical complexity

(eg, R.E.N.A.L. score); method of diagnosisInterventionsa Surgical technique; surgeon experience/volume; drug dose, route of delivery and lengthDiagnostic test characteristicsb Description of the reference standard, index test, comparator, manufacturer; interpreter of diagnostic testPrognostic factor characteristicsc Dose, level, duration of exposure; method of measurementOutcome Outcome definition (incl. unit of measurement, scale, assessor, time point of measurement)Results to include in a meta-analysis Dichotomous outcomes: no. of events/no. of participants

Continuous outcomes: mean value and SD in each intervention groupTime-to-event outcomes: HR (with 95% CI)Diagnostic test performance outcomes: TP, FP, TN, FN

Risk of bias Cochrane RoB tool for RCTs or other tools such as QUIPS and QUADAS-2Other Review author comments

CI = confidence interval; FN = false negative; FP = false positive; HR = hazard ratio; ID = identification; IPSS = International Prostate Symptom Score;RCT = randomised controlled trial; RoB = risk of bias; SD = standard deviation; TN = true negative; TP = true positive.a For comparative effectiveness of interventions systematic reviews.b For diagnostic test accuracy systematic reviews.c For prognostic factor systematic reviews.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0296

3.4.2. Nonrandomised comparative studies

Potential biases are greater in nonrandomised comparativestudies (NRCS) than for RCTs due to the high risk ofconfounding. NRCS are thus always considered to be at highrisk of selection bias.

In addition to assessing the above domains for RCTs, amaximum of the five most important confounders shouldbe prespecified. For each outcome, an assessment is madeof: (1) whether the confounderwas considered, (2) whetherits distribution was balanced, and if not then (3) whether itwas controlled for in the analysis.

This information is used to reach an overall decisionabout the RoB for each confounder and each outcome. TheRoB summary graph can also include the additionalconfounder information.

Recently, ROBINS-I, a tool for assessing risk of bias innonrandomised studies of interventions has been pub-lished. This tool is not yet tested within the EAU [[11_TD$DIFF]22].

3.4.3. Single arm studies

In SRs of single arm case series, five aspects are considered:

1. W

as there an a priori protocol? 2. W as the total population included or were study

participants selected consecutively?

3. W as outcome data complete for all participants and any

missing data adequately explained/unlikely to be relatedto the outcome?

4. W

ere all prespecified outcomes of interest and expectedoutcomes reported?

5. W

ere primary benefit and harm outcomes appropriatelymeasured?

If the answer to all five questions is “yes,” the study is at“low” RoB. If the answer to any question is “no,” the study isat “high” RoB.

QUADAS-2 is used for assessing RoB in diagnostic testaccuracy SRs [[12_TD$DIFF]23], while QUIPS is used for prognostic factorSRs [[13_TD$DIFF]24].

3.4.4. RoB assessment process

Two reviewers independently assess the RoB in each studywhen extracting data. A third reviewer acts as an arbitrator.If the RoB is unclear in any domain, an attempt is made toobtain the study protocol or to contact the study authors.RoB summary graphs are created with separate graphs foreach of the three different study types. Biases are describedin the SR report, emphasising areas of concern and thepossible effect of bias in interpreting the results [18].

3.5. MA or narrative synthesis (including subgroup analyses,

assessment of heterogeneity, and publication bias)

Combining information from different studies included in aSR is known as evidence synthesis. Depending on thehomogeneity of the populations, interventions, outcomedefinitions, and measurements in the included studies, theevidence synthesis may be narrative or quantitative.

MA is feasible when more than one RCT reporting thesame outcome is identified [[14_TD$DIFF]25]. MA provides an overallestimate of the effect of the treatment or the interventionanalysed. The effect of the treatment refers to thedifferences in outcomes found between an interventionand comparator. This effect, along with the precision of theestimate, needs to be extracted from all the individualstudies, so that the consistency in the direction andmagnitude of effect across studies can be assessed [[15_TD$DIFF]26].

However, before meta-analysing data, a detailed assess-ment of the included studies is crucial to decide whether aMA of the results is not only feasible but also sensible. First,the internal validity and precision of the included studieswill directly impact the quality of a MA. If the included

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0 297

studies are biased, underpowered, or the individualsestimates inconsistent, then performing a MA can poten-tially give a spuriously precise and often meaninglesssummary statistic.

One should take into account that the studies included ina SR will inevitably differ. Despite clearly defined eligibilitycriteria, differences in the included studies will exist withregards to natural variation in the populations and fidelityto the interventions. Those variations are known asheterogeneity [[16_TD$DIFF]27]. There are three main types of heteroge-neity: (1) clinical heterogeneity, which refers to variationsin the populations, interventions or the way in which theoutcomes were assessed, (2) methodological heterogeneity,which refers to differences in study design and reportingbiases, (3) and statistical heterogeneity, which refers tovariability in treatment effects due to the way the data areanalysed and reported in each study.

There are several ways to investigate heterogeneity. First,heterogeneity can be assessed by visual inspection of theforest plots. Poor overlap among confidence intervals mayindicate statistical heterogeneity. The chi-square Q statisticassesses the degree to which the individual study estimatesof effect size differ from the overall estimate, while the I2

statistic is an estimate of the percent of variation attribut-able to heterogeneity, that is, the ratio of true heterogeneityto total observed variation [7,27,28]. An I2 [17_TD$DIFF] greater than 50%is considered substantial. MA and the estimation of anoverall summary effect in the presence of significantheterogeneity may be misleading, but several options canbe considered in this scenario. Some heterogeneity may beexplained by the intervention having different effects in, forexample, older populations or at higher doses of a drug. Insuch instances, the stratification of studies according totheir characteristics or subgroup analyses in specific patientpopulations may be done to explore heterogeneity or to tryto answer specific questions; however, this may not bepossible without individual patient data. Such analysesshould be regarded as exploratory and the results must beinterpreted with caution [[18_TD$DIFF]29,30]. They should be planned atthe protocol stage rather than in the light of the results,otherwise the conclusions may not be reliable [[19_TD$DIFF]31].

Publication bias should also be considered. Trials showingstatistically significant results are more likely to bepublished than those with negative findings [[20_TD$DIFF]32]. They arealso more likely to be published quickly, in more than oneplace, in English, in high impact, indexed journals, and citedby others [7]. Funnel plots, which plot the effect estimate ofindividual studies against the size of the study, may be usedto identify the publication bias but these should only be usedwhen there are at least 10 studies in the MA.

In circumstances where there is too much heterogeneity,or when a review has included NRCS, it is more appropriateto conduct a “narrative synthesis” [[21_TD$DIFF]7,33,34]. This is a theorydriven and iterative approach for summarising and impos-ing structure on the results of the included studies. Fourmain stages are involved:

1. Develop a theory of how the intervention works, why,

and for whom.

2. D

evelop a preliminary synthesis of the findings of theincluded studies.

3. E

xplore relationships within and between studies. 4. A ssess the robustness of the synthesis [ [22_TD$DIFF]34].

Essential activities include tabulating and describing thekey baseline characteristics and results of individualstudies. Then studies with similar populations, interven-tion, and outcome measurement characteristics can begrouped together and then further grouped with regards tothe direction and magnitude of the results. The results maythen be interpreted critically with regards to the theory,considering also the RoB and potential confounders withinand across the included studies to make statements aboutthe quality of the evidence about the intervention. Figure 4illustrates a decision flow diagram when to meta-analysedata or to use narrative synthesis.

3.6. Quality of evidence ([1_TD$DIFF]Grading of [5_TD$DIFF]Recommendations,

Assessment, Development, and Evaluation)

The [1_TD$DIFF]Grading of [5_TD$DIFF]Recommendations, Assessment, Develop-ment, and Evaluation (GRADE) Working Group, in re-sponse to the need for an improved system of evidencequality assessment, developed a tool in 2000 using inputfrom health professionals, researchers, and guidelinedevelopers around the world [ [23_TD$DIFF]35]. The GRADE tool forassessing quality of evidence strives to be a structured andtransparent system which can be applied to all evidence,regardless of quality. There are a few basic tenets ofevidence quality using GRADE [ [24_TD$DIFF]36]. The final qualityassessment, which applies to the body of evidence (allstudies included in the SR) is reported as one of fourpossible levels: high, moderate, low, or very low. Foremostis that RCTs start out as high quality and that nonrandom-ised (observational) studies start out as low quality.According to GRADE, there are five factors which canaffect overall evidence quality which starts out high in anegative way, and three factors which can affect qualitywhich starts low in a positive way [ [25_TD$DIFF]37].

Starting at potentially high quality, RCTs are susceptibleto certain biaseswhich, whenpresent, can raise doubt in thestudy findings. Once summarised, these inherent studyelements inform the overall RoB of all the studies includedin the analysis. This overall study “RoB” is the first of the five“negative” factors used in GRADE to assess the overallquality of the evidence. Next are itemswhich assess how theoverall evidence is represented and balanced. “Inconsisten-cy” examines the direction of different study results. Forexample, if one study shows an intervention is helpful whileanother does not, they are inconsistent. “Indirectness”examines the details of exactly what is compared in thevarious studies being reviewed. If studies use differentintervention methods, they may be indirectly comparingresults. “Imprecision” examines the margin of potentialerror within all combined studies. All of these factors mayraise doubts about the validity of the combined data,especially when trying to use it for a recommendation.Finally, if small studies showing negative results aremissing

Fig. 4 – A decision flow diagram when to meta-analyse data or to use narrative synthesis.NRS = nonrandomised studies; RCT = randomised controlled trials; RoB = risk of bias.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0298

from the literature, potential “publication bias” becomes aconcern as well [[26_TD$DIFF]35,37,38].

If observational studies are incorporated in a SR, inherentstudy bias is already assumed and the overall evidence[(Fig._5)TD$FIG]

Fig. 5 – The process of assessing the quality of evidence with GRADE approachEtD = [4_TD$DIFF]Evidence to Decision; GRADE = [1_TD$DIFF]Grading of [5_TD$DIFF]Recommendations, AssessmentRCT = randomised controlled trials; SoF = [7_TD$DIFF]Summary of Findings.

quality starts out as “low.” However, if the combined effectsize of the outcome is undeniable (“large” > 50%, “verylarge”> 75%), if there is an obvious dose response seen, or ifthere is evidence that the plausible confounders would

., Development, and Evaluation; HTA = [6_TD$DIFF]Health technology assessment;

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0 299

diminish the demonstrated effect, the quality of thisevidence can actually be raised. Evidence from observa-tional studies that has been downgraded for any reason (asexplained above) should not be upgraded. The process ofassessing the quality of evidence is summarised inFigure 5. Ultimately this process strives to be structured,systematic, and transparent.

3.7. Incorporation of systematic review results in guidelines

The aim of the EAU Guidelines is to assist practicingclinicians in making informed decisions in different clinicalsituations, taking the highest quality scientific data, theirpatient’s personal circumstances, values, and preferencesinto account. The development of the guidelines is a coreactivity of the EAU as part of their educational efforts andthe available guidelines cover most of the urological field. Atransparent production process and continuous updatingare assured to ensure that these guideline documents are,and remain, of high value to their users. High quality SRs area superb source of datawhen composing clinical guidelines.The EAU Guideline Office made a major step forward whendeciding not only to use such SRs for the guidelines, but toalso perform their own SRs based on PICO questionsidentified and defined by the expert panels. The SRs areperformed in a standardised manner, following Cochranemethodology, with inclusion of expert urologists, associ-ates, methodologists, statisticians, and medical librarians.The results are published in peer-reviewed journals and areincluded in the annual updates of the specific guideline.

3.8. Recent technological advancement

The process of conducting a SR is laborious. However, anumber of technological advancements have made theprocess of conducting a SR easier. For example, softwarewhich can read graphs such as survival curves and help withtheir interpretation are available. Plot Digitizer is one suchtool and is freely available online [[27_TD$DIFF]39]. In addition, RevMannow contains an online calculator for imputingmissing data.The Cochrane Collaboration has continually adopted tech-nology to simplify the process of conducting SRs. As part ofThe Cochrane Collaboration’s Strategy to 2020 initiative, theInformatics and Knowledge Management section is workingon a number of projects such as the PICO annotation projectand the LinkedData project. A number of other organisationshave also developed tools for making the process ofconducting SRs more efficient. Some of the available toolsare: Abstrackr, Covidence, DistillerSR, Eppi-Reviewer 4,EROS, ExaCT, Rayyan, RevMan HAL, SUMARI, and TrialStateSRS 4.0 [[28_TD$DIFF]40]. Two of these tools are recommended by TheCochrane Collaboration (Covidence and Eppi-Reviewer).These packages can speed up the review process and someassist by semi-automating key SR tasks.

4. Conclusions

Projects like AllTrials will provide access to research data [ [29_TD$DIFF]41]. This will eventually reduce research wastage and will

provide a source of “Big Data.” The raw data from individualtrials can be used for Individual Patient Data MA, the goldstandard. Text mining technologies are also explored whichmay reduce the reviewers’ time through automating the keytasks such as citation screening. Data extraction with datamining technology, data-tagging, and translations have alsobeen explored [[30_TD$DIFF]42]. Automation of review process taskscould help us to achieve “living systematic reviews,” that is,SRs which are continuously updated incorporating relevantdata as it becomes available. The use of technology in the SRprocess will reduce time from the completion of a study toincorporating the findings within the review, and evidencewill be more reliable and readily available [[30_TD$DIFF]42].

Author contributions: Muhammad Imran Omar had full access to all thedata in the study and takes responsibility for the integrity of the data andthe accuracy of the data analysis.

Study concept and design: Omar, Knoll, N’Dow, Sylvester.Acquisition of data: Omar, MacLennan, Marconi.Analysis and interpretation of data: Omar, Knoll, MacLennan, Hernández,Canfield, Yuan, Bruins, Marconi, Van Poppel, N’Dow, Sylvester, EAUGuidelines Office Senior Associates Group Authorship.Drafting of themanuscript:Omar, Knoll, MacLennan, Hernández, Canfield,Yuan, Bruins, Marconi, Van Poppel, N’Dow, Sylvester.Critical revision of the manuscript for important intellectual content: Omar,Knoll, MacLennan, Hernández, Canfield, Yuan, Bruins, Marconi, VanPoppel, N’Dow, Sylvester, EAU Guidelines Office Senior Associates GroupAuthorship.Statistical analysis: None.Obtaining funding: None.Administrative, technical, or material support: None.Supervision: Omar.Other: None.

Financial disclosures:Muhammad Imran Omar certifies that all conflictsof interest, including specific financial interests and relationships andaffiliations relevant to the subject matter or materials discussed in themanuscript (eg, employment/affiliation, grants or funding, consultan-cies, honoraria, stock ownership or options, expert testimony, royalties,or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

Acknowledgments: Figure 5 was created by the GRADE Working Group.We are grateful to Professor Holger Schünemann, from McMasterUniversity, for allowing us to use this figure.

Appendix A. Supplementary data

Supplementary data associated with this article can befound, in the online version, at http://dx.doi.org/10.1016/j.eururo.2017.08.016.

References

[1] Cochrane AL. Effectiveness and efficiency: random reflections onhealth services. UK: The Nuffield Provincial Hospitals Trust; 1972.

[2] Sylvester RJ, OosterlinckW, Holmang S, et al. Systematic Reviewandindividual patient data meta-analysis of randomized trials compar-ing a single immediate instillation of chemotherapy after trans-urethral resection with transurethral resection alone in patientswith Stage pTa-pT1 urothelial carcinoma of the bladder: whichpatients benefit from the instillation? Eur Urol 2016;69:231–44.

E U RO P E AN U RO L OGY 73 ( 2 018 ) 2 9 0 – 3 0 0300

[3] Marconi L, Dabestani S, Lam TB, et al. Systematic review and meta-analysis of diagnostic accuracy of percutaneous renal tumour biop-sy. Eur Urol 2016;69:660–73.

[4] Dabestani S, Marconi L, Hofmann F, et al. Local treatments formetastases of renal cell carcinoma: a systematic review. LancetOncol 2014;15:e549–61.

[5] EAU Guidelines. Presented at the EAU Annual Congress London2017. http://uroweb.org/guidelines/.

[6] Altman DG, Schulz KF, Moher D, et al. The revised CONSORTstatement for reporting randomized trials: explanation and elabo-ration. Ann Intern Med 2001;134:663–94.

[7] Higgins J, Green S. Cochrane handbook for systematic reviews ofinterventions. Version 5. 1. 0. Oxford: The Cochrane Collaboration;2011.

[8] Glass G, Smith M. Meta-analysis of research on the relationship ofclass-size and achievement. Educ Eval Policy Anal 1979;1:2–16.

[9] Blettner M, Sauerbrei W, Schlehofer B, Scheuchenpflug T, Frieden-reich C. Traditional reviews, meta-analyses and pooled analyses inepidemiology. Int J Epidemiol 1999;28:1–9.

[10] Mulrow CD. The medical review article: state of the science. AnnIntern Med 1987;106:485–8.

[11] Huang XLJ, Demner-Fushman D. Evaluation of PICO as a knowledgerepresentation for clinical questions. AMIA Annu Symp Proc 2006;359–63.

[12] PROSPERO. International prospective register of systematicreviews. https://www.crd.york.ac.uk/prospero/about.php?about=citerecord#index.php.

[13] Higgins JP, Lasserson T, Chandler J, Tovey DRC. Expectations ofCochrane Intervention Reviews (MECIR) Standards for the conductand reporting of new Cochrane Intervention Reviews, reporting ofprotocols and the planning, conduct and reporting of updates.http://methods.cochrane.org/mecir.

[14] Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferredreporting items for systematic reviews and meta-analyses: thePRISMA statement. J Clin Epidemiol 2009;62:1006–12.

[15] PRISMA. Preferred Reporting Items for Systematic Reviews andMeta-Analyses (PRISMA). http://www.prisma-statement.org/.

[16] Buscemi N, Harding L, Vandermeer B, Tjosvold L, Klassen TP. Singledata extraction generated more errors than double data extractionin systematic reviews. J Clin Epidemiol 2006;59:697–703.

[17] MacLennan S, Williamson PR, Bekema H, et al. A core outcome setfor localised prostate cancer effectiveness trials. BJU Int. http://dx.doi.org/10.1111/bju.13854.

[18] Higgins JP, Altman DG, Gotzsche PC, et al. The Cochrane Collabora-tion’s tool for assessing risk of bias in randomised trials. BMJ 2011;343:d5928.

[19] ReviewManager (RevMan) [Computer program]. Version5.3. Copen-hagen: The Nordic Cochrane Centre, The Cochrane Collaboration;2014.

[20] Higgins JPT, Sterne JAC, Savovi�c J, et al. A revised tool for assessingrisk of bias in randomized trials. In: Chandler J, McKenzie J, BoutronI, Welch V, editors. Cochrane Methods. USA: Cochrane Database ofSystematic Reviews; 2016.

[21] MacLennan S, Imamura M, Lapitan MC, et al. Systematic review ofperioperative and quality-of-life outcomes following surgical man-agement of localised renal cancer. Eur Urol 2012;62:1097–117.

[22] Sterne JA, Hernan MA, Reeves BC, et al. ROBINS-I: a tool for asses-sing risk of bias in non-randomised studies of interventions. BMJ2016;355:i4919.

[23] Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revisedtool for the quality assessment of diagnostic accuracy studies. AnnIntern Med 2011;155:529–36.

[24] Hayden JA, van der Windt DA, Cartwright JL, Cote P, Bombardier C.Assessing bias in studies of prognostic factors. Ann Intern Med2013;158:280–6.

[25] Egger M, Smith GD, Phillips AN. Meta-analysis: principles andprocedures. BMJ 1997;315:1533–7.

[26] Mulrow CD. Rationale for systematic reviews. BMJ 1994;309:597–9.[27] Higgins JPT. Commentary: Heterogeneity in meta-analysis should

be expected and appropriately quantified. Int J Epidemiol2008;37:1158–60.

[28] Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring incon-sistency in meta-analyses. Br Med J 2003;327:557–60.

[29] Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpreta-tion of treatment effects in subgroups of patients in randomizedclinical-trials. JAMA 1991;266:93–8.

[30] Oxman AD, Guyatt GH. A consumer’s guide to subgroup analyses.Ann Intern Med 1992;116:78–84.

[31] HardenA,Thomas J.Methodological issues incombiningdiverse studytypes in systematic reviews. Int J Soc Res Methodol 2005;8:257–71.

[32] Thornton A, Lee P. Publication bias in meta-analysis: its causes andconsequences. J Clin Epidemiol 2000;53:207–16.

[33] Mays N, Pope C, Popay J. Systematically reviewing qualitative andquantitative evidence to informmanagement and policy-making inthe health field. J Health Serv Res Policy 2005;10(Suppl 1):6–20.

[34] Popay J, Roberts H, Sowden A, et al. Guidance on the conduct ofnarrative synthesis in systematic reviews. ESRC Research MethodsProgramme 2006.

[35] Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduc-tion-GRADE evidence profiles and summary offindings tables. J ClinEpidemiol 2011;64:383–94.

[36] Canfield SE, Dahm P. Rating the quality of evidence and the strengthof recommendations using GRADE. World J Urol 2011;29:311–7.

[37] Balshem H, Helfand M, Schunemann HJ, et al. GRADE guidelines:3. Rating the quality of evidence. J Clin Epidemiol 2011;64:401–6.

[38] Guyatt GH, Thorlund K, Oxman AD, et al. GRADE guidelines: 13. Pre-paring Summary of Findings tables and evidence profiles-continu-ous outcomes. J Clin Epidemiol 2013;66:173–83.

[39] Plot digitizer. http://plotdigitizer.sourceforge.net/.[40] Software for systematic reviewing. http://hlwiki.slais.ubc.ca/index.

php/Software_for_systematic_reviewing.[41] Brown T. It’s time for alltrials registered and reported. Cochrane

Database Syst Rev 2013:ED000057.[42] Elliott J, Sim I, Thomas J, et al. CochraneTech: technology and the

future of systematic reviews [editorial]. CochraneDatabase Syst Rev2014;9.