DO WE NEED TO IMPROVE THE STUDY DESIGN FRAMEWORK FOR EVALUATIONS OF TEST ACCURACY?

Chris Hyde, Harriet Hunt, Zhivko Zhelev

Post on 04-Jun-2018


Overview

Issue: Naming the study designs used for evaluating test accuracy

• The problem
• Have we already solved it?
• What more can we do?
• Suggested next steps

The problem

• As investigated in the poster, “The use of study design terminology in the abstracts of diagnostic test accuracy (DTA) studies”
• 84 DTA studies in 2012
• When we look at how authors describe their studies, they usually resort to longhand descriptions (76%)
• Potentially valuable space in title and abstract is lost
• Where there is an attempt to use study design terms beyond accuracy, there seems to be no standardisation
• There is evidence of lack of understanding about key design features
• Use of words which are recognised as part of the lexicon of study design descriptors, but which are not very informative in the context of evaluating accuracy
– “Cohort”; “observational”; “retrospective study”; “prospective study”

Comparing approaches to screening for angle closure in older Chinese adults.

Aims: Primary angle-closure glaucoma is expected to account for nearly 50% of bilateral glaucoma blindness by 2020. This study was conducted to assess the performance of the scanning peripheral anterior chamber depth analyzer (SPAC) and limbal anterior chamber depth (LACD) as screening methods for angle closure.

Methods: This study assessed two clinical populations to compare SPAC, LACD, and gonioscopy: the Zhongshan Angle-closure Prevention Trial, from which 370 patients were eligible as closed-angle participants and the Liwan Eye Study, from which 72 patients were selected as open-angle controls. Eligible participants were assessed by SPAC, LACD, and gonioscopy.

Results: Angle status was defined by gonioscopy. Area under the receiver operating characteristic curve (AUROC) for SPAC was 0.92 (0.89-0.95) whereas AUROC for LACD was 0.94 (0.92-0.97). Using conventional cutoff points, sensitivity/specificity was 93.0%/70.8% for SPAC and 94.1%/87.5% for LACD.

Is this really a problem?

• Worthy of debate
• We were surprised by the scale of the problem
• Principally worried about the economy with which authors can express what they have done and with which readers might assimilate research messages
• But, does it also betray:
– Lack of clarity about the design choices in primary research
– Related lack of clarity in secondary research and…
– Is failure to clearly distinguish different study designs at the heart of difficulties with quality assessment?
• We conclude that it is worth exploring whether a coordinated collection of study design terms for DTAs can be developed

What do we already know?

• We have conducted a methodological review
• Incomplete but:
– Searching of MEDLINE, EMBASE and Cochrane Library
– Reference checking
– Discussion with key experts in the field
• Confident that there is no comprehensive study design typology
– Closest remains Rutjes et al 2005
– Not in STARD, QUADAS, Cochrane Handbook, NICE guidance
• The foundations are however visible
– Articles providing authoritative and wide ranging descriptions of design considerations eg Knottnerus and Muris 2003
– Articles raising awareness of key designs
    • Case-control designs eg Rutjes et al 2005
    • Nested case control designs eg Biesheuvel et al 2008
    • ? Randomisation

Components

• STARD Items 1-5 suggest the key components:
– Accuracy study
– Single test or comparing multiple tests
– Population source
– Sampling of the source
• Other STARD items / by analogy with interventional research designs
– Retrospective/prospective
– Blinding
• Concerns suggested by our case studies
– Study intent – inferiority; equivalence; superiority
– Reference standard – pragmatic; ideal

Putting them together

• Arguably the greater challenge
• Needs to recognise that there is a trade-off between completeness and complexity
• Choice required
– Information content – the most important things a reader wants to know
– Methodologically important
– Combination
• Currently there is an important unwritten assumption that there is a basic test accuracy study design which is widely understood
– Follow this, with supporting documentation (as in STARD) stating and being explicit about what the features of this design are
– Reject it and try a more theoretically based descriptor – cross-sectional validation against reference standard

Example

Feature             Option 1        Option 2   Option 3
Accuracy            Accuracy study
Test number         One             Two        Three etc
Source              One-gate        Two-gate   Two-gate nested
Reference standard  Ideal           Pragmatic

Viz: One test, accuracy study with one gate and an ideal reference standard
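
The way a design term is assembled from the table can be sketched in a few lines of Python. This is only an illustration, not a proposed standard: the class name `DesignTerm`, the `describe` method and the wording used for the two-gate options are assumptions, and the test number is capped at three although the table says "Three etc".

```python
from dataclasses import dataclass

# Option values copied from the example table above.
TEST_NUMBERS = ("one", "two", "three")  # the table allows "Three etc"; capped here for brevity
REFERENCE_STANDARDS = ("ideal", "pragmatic")

# Phrase used for each source option in the assembled term.
# Only the "one gate" wording appears on the slide; the other two are assumed.
SOURCE_PHRASES = {
    "one-gate": "one gate",
    "two-gate": "two gates",
    "two-gate nested": "two nested gates",
}


@dataclass(frozen=True)
class DesignTerm:
    """Hypothetical container for the three features chosen in the table."""

    test_number: str
    source: str
    reference_standard: str

    def __post_init__(self) -> None:
        # Reject any value that is not one of the table's options.
        checks = (
            ("test_number", self.test_number, TEST_NUMBERS),
            ("source", self.source, SOURCE_PHRASES),
            ("reference_standard", self.reference_standard, REFERENCE_STANDARDS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")

    def describe(self) -> str:
        # Follow the pattern of the "Viz:" line on the slide.
        article = "an" if self.reference_standard == "ideal" else "a"
        return (
            f"{self.test_number.capitalize()} test, accuracy study with "
            f"{SOURCE_PHRASES[self.source]} and {article} "
            f"{self.reference_standard} reference standard"
        )


print(DesignTerm("one", "one-gate", "ideal").describe())
# prints: One test, accuracy study with one gate and an ideal reference standard
```

Restricting each feature to a short fixed list is what keeps the resulting term brief; adding features such as study intent or blinding would lengthen every term, which is the completeness versus complexity trade-off noted earlier.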

Screening for angle closure in older Chinese adults.

Aims: Primary angle-closure glaucoma is expected to account for nearly 50% of bilateral glaucoma blindness by 2020. Early detection may reduce the impact.

Methods: We conducted a two test accuracy study with one gate and ideal reference standard. The tests for angle-closure were the scanning peripheral anterior chamber depth analyzer (SPAC) and limbal anterior chamber depth (LACD). Gonioscopy was the reference standard.

Results: Area under the receiver operating characteristic curve (AUROC) for SPAC was 0.92 (0.89-0.95) whereas AUROC for LACD was 0.94 (0.92-0.97). Using conventional cutoff points, sensitivity/specificity was 93.0%/70.8% for SPAC and 94.1%/87.5% for LACD.

Next steps

• This is achievable
• Although simple, might have quite a profound effect
– What was the effect of the term RCT on interventional research?
• Needs consensus on what are the most important features to convey in the study design term
• Create supporting documentation clarifying terms
• Supporting documentation could reinforce important features of study design which, although not captured in the study design term, need to be elaborated in the methods
• Actively disseminate
• Move on to other aspects of test evaluation!

Chris Hyde

C.J. Biesheuvel, Y. Vergouwe, R. Oudega, A.W. Hoes, D.E. Grobbee, K.G.M. Moons. Advantages of the nested case-control design in diagnostic research. BMC Medical Research Methodology 2008;8:48. doi:10.1186/1471-2288-8-48

J.A. Knottnerus, J.W. Muris. Assessment of the accuracy of diagnostic tests: the cross-sectional study. Journal of Clinical Epidemiology 2003;56:1118–1128.

A.W.S. Rutjes, J.B. Reitsma, J.P. Vandenbroucke, A.S. Glas, P.M.M. Bossuyt. Case–Control and Two-Gate Designs in Diagnostic Accuracy Studies. Clinical Chemistry 2005;51(8):1335–1341.