Discus (Denniss et al., sept07.09)
DESCRIPTION
Unpublished manuscript describing a software program for experiments with photographs of healthy and damaged optic discs.
Discus – A software program to assess judgment of glaucomatous damage in optic disc photographs.
Short title Discus
Words, Figures, Tables 3600, 3, 2
Codes & Presentations GL, Poster at ARVO meeting in May 2008 (program # 3625)
Keywords glaucoma, optic disc, sensitivity, specificity, diagnostic performance
Authors Jonathan Denniss, MCOptom1,2
Damian Echendu, OD, MSc1 David B Henson, PhD, FCOptom1,2
Paul H Artes, PhD1,2,3 (corresponding author)
Affiliations & Correspondence
1 Research Group for Eye and Vision Sciences, University of Manchester, England
2 Manchester Royal Eye Hospital, Manchester, England
3 Ophthalmology and Visual Sciences, Dalhousie University, Rm 2035, West Victoria, 1276 South Park St, Halifax, Nova Scotia B3H 2Y9, Canada
[email protected]
Commercial Relationships None
Support College of Optometrists PhD studentship (JD) Nova Scotia Health Research Foundation Grant Med-727 (PHA)
Abstract

Aim
To describe a software package (Discus) for evaluating clinicians’ assessment of optic disc damage, and to provide reference data from a group of expert observers.

Methods
Optic disc images were selected from patients with manifest or suspected glaucoma or ocular hypertension who attended the Manchester Royal Eye Hospital. Eighty images came from eyes without evidence of visual field (VF) loss in at least 4 consecutive tests (VF-negatives), and 20 images from eyes with repeatable VF loss (VF-positives). Software was written to display these images in randomized order, for up to 60 seconds. Expert observers (n=12) rated optic disc damage on a 5-point scale (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged).

Results
Optic disc damage as determined by the expert observers predicted VF loss with less than perfect accuracy (mean area under the receiver operating characteristic curve [AUROC], 0.78; range 0.72 to 0.85). When the responses were combined across the panel of experts, the AUROC reached 0.87, corresponding to a sensitivity of ~60% at 90% specificity. While the observers’ performances were similar, there were large differences between the criteria they adopted (p<0.001), even though all observers had been given identical instructions.

Conclusion
Discus provides a simple and rapid means of assessing important aspects of optic disc interpretation. The data from the panel of expert observers provide a reference against which students, trainees, and clinicians may compare themselves. The program and the analyses described in this paper are freely accessible from http://discusproject.blogspot.com/.
Introduction

The detection of early damage of the optic disc is an important yet difficult task.1, 2

In many patients with glaucoma, optic disc damage is the first clinically detectable sign of disease. In the Ocular Hypertension Treatment Study, for example, almost 60% of patients who converted to glaucoma developed optic disc changes before exhibiting reproducible visual field damage.3, 4 Broadly similar findings were obtained in the European Glaucoma Prevention Study; in approximately 40% of those participants who developed glaucoma, optic disc changes were recognised before visual field changes.5 However, the diverse range of optic disc appearances in a healthy population, combined with the many ways in which glaucomatous damage may affect the appearance of the disc, makes it difficult to detect features of early damage.6, 7

While several imaging technologies have been developed in recent decades (confocal scanning laser tomography, nerve fibre layer polarimetry, and optical coherence tomography) which provide reproducible assessment of the optic disc and retinal nerve fibre layer, the diagnostic performance of these technologies has not been consistently better than that achieved by clinicians.8-11 Subjective assessment of the optic disc, either by slitlamp biomicroscopy or by inspection of photographs, therefore still plays a pivotal role in the clinical care of patients at risk from glaucoma.8

Many papers describe the optic disc changes in glaucoma,6, 7, 12-14 and several authors have examined the agreement between clinicians in diagnosing glaucoma, in differentiating between different types of optic disc damage, or in estimating specific parameters such as cup/disc ratios.15-25 However, because there is no objective reference standard for optic disc damage, it is difficult for students, trainees, or clinicians to assess their judgments against an external reference.

In this paper, we describe a software package (“Discus”) which observers can use to view and interpret a set of selected optic disc images under controlled conditions. We further present reference data from 12 expert observers against which future observers can be evaluated, or evaluate themselves.
Methods

Selection of Images

To obtain a set of optic disc images with a wide spectrum of early glaucomatous damage, data were selected from patients who had attended the Optometrist-led Glaucoma Assessment (OLGA) clinics at the Royal Eye Hospital (Manchester, UK) between June 2003 and May 2007. This clinic sees patients who are deemed at risk of developing glaucoma, for example owing to ocular hypertension, or who have glaucoma but are thought to be at low risk of progression and are well controlled on medical therapy. Patients undergo regular examinations (normally at intervals of 6 months) by specifically trained optometrists. During each visit, visual field examinations (Humphrey Field Analyzer program 24-2, SITA-Standard) and non-stereoscopic fundus photography are performed (Topcon TRC-50EX, field-of-view 20 degrees, resolution 2000×1312 pixels, 24-bit colour).

For this study, images were considered for inclusion if the patient had undergone at least 4 visual field tests on each eye (n=665). The 4 most recent visual fields were then analysed to establish two distinct groups, visual field (VF-) positive and VF-negative (Table 1). Images from patients who did not meet the criteria of either group were excluded.
Table 1: Inclusion criteria for the VF-positive and VF-negative groups. For inclusion in the VF-negative group, the criteria had to be met in both eyes; in addition, the between-eye differences in MD and PSD had to be less than 1.0 dB.

             MD                          PSD
VF-positive  between -2.5 and -10.0 dB   between 3.0 and 15.0 dB
VF-negative  better than [>] -1.5 dB     better than [<] 2.0 dB
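The single-eye criteria of Table 1 can be expressed as a small classification routine. The sketch below is illustrative only (the function name is ours), assuming MD and PSD in dB as reported by the perimeter; the additional both-eye requirement for VF-negatives is not modelled.

```python
# Illustrative sketch of the Table 1 group criteria (function name is ours).
# MD and PSD are in dB; eyes meeting neither rule are excluded from the study.
# (The both-eye and between-eye-difference rules for VF-negatives are omitted.)

def classify_eye(md, psd):
    """Return 'VF-positive', 'VF-negative', or None (excluded) per Table 1."""
    if -10.0 <= md <= -2.5 and 3.0 <= psd <= 15.0:
        return "VF-positive"
    if md > -1.5 and psd < 2.0:
        return "VF-negative"
    return None  # meets neither set of criteria

print(classify_eye(-6.2, 5.6))  # VF-positive
print(classify_eye(0.6, 1.5))   # VF-negative
print(classify_eye(-2.0, 2.5))  # None (excluded)
```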
If both eyes of a patient met these criteria, a single eye was randomly selected. A small number of eyes (n=17) were excluded owing to clearly non-glaucomatous visual field loss (for example, hemianopia) or non-glaucomatous lesions visible on the fundus photographs (eg, chorioretinal scars). There were 155 eyes in the VF-positive and 144 eyes in the VF-negative group.

To eliminate any potential clues other than glaucomatous optic disc damage, we matched the image quality in the VF-negative and VF-positive groups. One of the authors (DE) viewed the images on a computer monitor in random order and graded each one on a five-point scale for focus and uniformity of illumination. During grading, the observer was unaware of the status of the image (VF-positive or -negative), and the area of the disc had been masked from view. A final set of 20 VF-positive images and 80 VF-negative images was then created such that the distribution of image quality was similar in both groups (Table 2). The total size of the image set (100), and the ratio of
VF-positive to VF-negative images (20:80), had been decided on beforehand to limit the duration of the experiments and to keep the emphasis on discs with early damage.
Table 2: Characteristics of the VF-positive and VF-negative groups. Image quality was scored subjectively on a scale from 1 to 5; values are mean (SD). Differences between groups were tested for statistical significance by Mann-Whitney U (MWU) tests.

                     Image Quality   Age, y        MD, dB         PSD, dB
VF-positive (n=20)   1.82 (1.20)     66.0 (13.1)   -6.20 (1.76)   5.58 (2.15)
VF-negative (n=80)   1.68 (1.33)     61.3 (9.3)    +0.60 (0.4)    1.50 (0.16)
p-value (MWU)        0.67            0.35          <0.001         <0.001
Expert Observers

For the present study, 12 expert observers (either glaucoma fellowship-trained ophthalmologists working in glaucoma sub-speciality clinics [n=10], or scientists involved in research on the optic disc in glaucoma [n=2]) were selected. Observers were approached ad hoc during scientific meetings or contacted by e-mail or letter with a request for participation.

Prior to the experiments, the observers were given written instructions detailing the selection of the image set. The instructions also stipulated that responses should be given on the basis of apparent optic disc damage rather than the perceived likelihood of visual field damage.
Experiments

In order to present images under controlled conditions, and to collect the observers’ responses, a software package, Discus (version 3.0E; Figure 1), was developed in Delphi (CodeGear, San Francisco, CA). Details on the availability and configuration of the software are provided in the Appendix.

The software displayed the images, in random order, on a computer monitor. After the observer had triggered a new presentation by hitting the “Next” button, an image was displayed until the observer responded by clicking one of 5 buttons (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged). After a time-out period of 60 seconds the image would disappear, but observers were allowed unlimited time to give a response. To guard against occasional finger errors, observers were also allowed to change their response, as long as this occurred before the “Next” button was hit.

To assess the consistency of the observers, 26 images were presented twice (2 in the VF-positive group, 24 in the VF-negative group). No feedback was provided during the sessions.
Fig 1: Screenshot of Discus software.
Images remained on display for up to 60 seconds, or until the observer clicked on one of the 5 response categories. A new presentation was triggered by hitting the “Next” button.
Analysis

The responses were transformed to a numerical scale ranging from -2 (“definitely healthy”) to +2 (“definitely damaged”). The proportion of repeated images in which the responses differed by one or more categories was calculated for each observer. For all subsequent analyses, however, only the last of the two responses was used. All analyses were carried out in the freely available open-source environment R, and the ROCR library was used to plot the ROC curves.26, 27
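The paper’s analyses were done in R with the ROCR package; as a sketch of the response coding and repeatability summary described above, a Python equivalent might look as follows (variable names are ours, data are illustrative):

```python
# Response coding (-2..+2) and the proportion of repeated images whose two
# responses differed by one or more categories, per observer.

SCALE = {
    "definitely healthy": -2,
    "probably healthy": -1,
    "not sure": 0,
    "probably damaged": 1,
    "definitely damaged": 2,
}

def repeat_discrepancy(first, second):
    """Proportion of repeated images whose two responses differ by
    one or more categories."""
    pairs = list(zip(first, second))
    differing = sum(1 for a, b in pairs if abs(SCALE[a] - SCALE[b]) >= 1)
    return differing / len(pairs)

# Example: 2 of 4 repeated images received a different category.
first = ["not sure", "probably damaged", "definitely healthy", "not sure"]
second = ["not sure", "definitely damaged", "probably healthy", "not sure"]
print(repeat_discrepancy(first, second))  # 0.5
```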
Individual observers’ ROC curves

To obtain an objective measure of individual observers’ performance at discriminating between eyes with and without visual field damage, ROC curves were derived from each set of responses. For this analysis, the visual field status was the reference standard, and responses in the “not sure” category were interpreted as lying between “probably healthy” and “probably damaged”. If an observer had used all five response categories, the ROC curve would contain 4 points (A-D). Point A, the most conservative criterion (most specific but least sensitive), gave the sensitivity and specificity to visual field damage when only the “definitely damaged” responses were treated as test positives, while all other responses (“probably damaged”, “not sure”, “probably healthy”, “definitely healthy”) were interpreted as test negatives. For point D, the least conservative criterion (most sensitive but least specific), only “definitely healthy” responses were interpreted as test negatives, and all other responses as test positives.
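The four operating points can be obtained by sweeping the test-positive cut-off over the numeric scale, with visual field status as the reference standard. The sketch below is illustrative (data and function names are ours, not the paper’s code):

```python
# The four ROC points (A-D) from ordinal responses coded -2..+2.
# A rating is counted as test-positive when it meets or exceeds the cut-off.

def roc_points(responses, vf_positive):
    """responses: numeric ratings (-2..+2); vf_positive: parallel bools.
    Returns (sensitivity, specificity) for cut-offs A (>= +2) .. D (>= -1)."""
    n_pos = sum(vf_positive)
    n_neg = len(vf_positive) - n_pos
    points = []
    for cutoff in (2, 1, 0, -1):  # A, B, C, D
        tp = sum(1 for r, p in zip(responses, vf_positive) if p and r >= cutoff)
        tn = sum(1 for r, p in zip(responses, vf_positive) if not p and r < cutoff)
        points.append((tp / n_pos, tn / n_neg))
    return points

ratings = [2, 1, 0, -1, -2, 2, -2, 1]
vf_pos = [True, True, True, True, False, False, False, False]
for label, (sens, spec) in zip("ABCD", roc_points(ratings, vf_pos)):
    print(label, sens, spec)
```

Note that the paper’s figures plot the positive rate in the VF-negative group (1 - specificity) on the x-axis; the sketch returns specificity directly.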
Individual observers’ criteria

When using a subjective scale, as in the current study, the responses depend on the observer’s interpretation of the categories and their individual inclination to respond with “probably damaged” or “definitely damaged” (response criterion). A cautious observer, for example, might regard a particular optic nerve head as “probably damaged” whilst an equally skilled but less cautious observer might respond with “not sure” or “probably healthy”. To investigate the variation in criteria within our group, we compared the observers’ mean responses across the entire image set.
Combining responses of expert observers

To estimate the performance of a panel of experts, and to obtain a reference other than visual field damage for judging current as well as future observers’ responses, the mean response of the 12 expert observers was calculated for each of the 100 images.

To estimate whether the expert group (n=12) was sufficiently large, we investigated how the performance of the combined panel changed depending on the number of included observers. Areas under the ROC curve were calculated for all possible combinations of 2, 3, 4, …, 11 observers to derive the mean performance, as well as the minimum and maximum.
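The panel analysis above can be sketched as follows: average the observers’ ratings per image, then compute the area under the ROC curve (AUROC) for every observer subset of a given size. This is a pure-Python Mann-Whitney formulation of the AUC; the toy data and names are illustrative, not the paper’s:

```python
from itertools import combinations

def auroc(scores, labels):
    """AUROC of `scores` against boolean `labels` (True = VF-positive),
    counting ties as 1/2 (Mann-Whitney form)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def panel_auroc(ratings_by_observer, labels, panel):
    """Mean rating over the observers in `panel`, scored against `labels`."""
    mean_ratings = [sum(ratings_by_observer[j][i] for j in panel) / len(panel)
                    for i in range(len(labels))]
    return auroc(mean_ratings, labels)

# Toy data: 3 observers, 6 images (first 2 VF-positive).
ratings = [[2, 1, -1, -2, 0, -2], [1, 2, 0, -1, -2, -1], [2, 0, -2, -1, -1, -2]]
labels = [True, True, False, False, False, False]
for size in (2, 3):
    aucs = [panel_auroc(ratings, labels, p) for p in combinations(range(3), size)]
    print(size, min(aucs), sum(aucs) / len(aucs), max(aucs))
```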
Relationship between responses of individual observers and expert panel

As a measure of overall agreement between the expert observers, independent of their individual response criteria, the Spearman rank correlation coefficient between the 12 sets of responses was computed. The underlying rationale of this analysis is that, by assigning each image to one of five ordinal categories, each observer had in fact ranked the 100 images. If two observers had produced identical rankings, the Spearman coefficient would be 1, regardless of the actual responses assigned.
Results

The experiments took between 13 and 46 minutes (mean, 29 min) to complete. On average, the observers responded 7 seconds after the images were first presented on the screen, and the median response latencies of individual observers ranged from 4 to 16 seconds. The reproducibility of individual observers’ responses was moderate: on average, discrepancies of one category were seen in 44% (12) of the 26 repeated images (range, 23-62%).

Individual observers’ results are shown in Fig. 2A-L. The points labelled A, B, C, and D represent the trade-off between the positive rates in the VF-positive (vertical axis) and VF-negative groups (horizontal axis) achieved with the four possible classification criteria. Point A, for example, shows the trade-off when only discs in the “definitely damaged” category are regarded as test-positives. Point B gives the trade-off when discs in both the “definitely damaged” and “probably damaged” categories are regarded as test-positives. For D, the least conservative criterion, only responses of “definitely healthy” were interpreted as negatives. To indicate the precision of these estimates, 95% confidence intervals were added to point B.

Areas under the curve (AUROC) ranged from 0.71 (95% CI, 0.58-0.85) to 0.88 (95% CI, 0.82-0.96), with a mean of 0.79. There was no relationship between observers’ overall performance and their median response latency (Spearman’s rho = 0.34, p = 0.29).
In contrast to their similar overall performance, the observers’ response criteria differed substantially (p<0.001, Friedman test). For example, the proportion of discs in the VF-positive category which were classified as “definitely damaged” ranged from 15% to 90%, while the proportion of discs in the VF-negative category classified as “definitely healthy” ranged from 8% to 68%. In Fig 2A-L, the response criterion is represented by the inclination of the red line with its origin in the bottom right corner. If the responses had been exactly balanced between the “damaged” and “healthy” categories, the inclination of the line would be 45 degrees. A more horizontal line represents a more conservative criterion (less likely to respond with “probably damaged” or “definitely damaged”), while a more vertical line represents a less conservative criterion. There was no relationship between the observers’ performance (AUROC) and their response criterion (Spearman’s rho = 0.41, p = 0.18).

To derive the “best possible” performance as a reference for future observers, the responses of the expert panel were combined by calculating the mean response obtained for each image. The ROC curve for the combined responses (grey curve in Fig. 2A-L) enclosed an area of 0.87.
Fig. 2. Receiver operating characteristic (ROC) curves for the classification of optic disc photographs by the 12 expert observers (A-L), with a reference standard of visual field damage. The x-axis (positive rate in the VF-negative group) measures specificity to visual field damage, while the y-axis (positive rate in the VF-positive group) gives the sensitivity. Point A (the most conservative criterion) shows the trade-off between sensitivity and specificity when only “definitely damaged” responses are interpreted as test positives. Point D (the least conservative criterion) shows the trade-off when all but “definitely healthy” responses are interpreted as test positives. Boxplots (right) give the distributions of response latencies, and the number of times each response was selected.
Fig. 2 (cont). To facilitate comparison, the grey ROC curve and the dotted grey line represent the performance and the criterion of the group as a whole, respectively. Results provided in numerical format are the area under the ROC curve (AUC); the percentage of the AUC as compared to that of the entire group, (individual ROC area - 0.5) / (expert panel ROC area - 0.5); the Spearman rank correlation of the individual’s responses with those of the entire group; the mean difference between repeated responses; and the average response as a measure of criterion (-2 = “definitely healthy”, -1 = “probably healthy”, 0 = “not sure”, 1 = “probably damaged”, and 2 = “definitely damaged”).
To investigate how the performance of an expert panel varies with the number of contributing observers, the area under the ROC curve was derived for all possible combinations of 2 to 11 observers (Fig. 3). The limit of the ROC area was approached with 6 or more observers, and it appeared that a further increase in the number of observers would not have had a substantial effect on the performance of the panel.

Fig. 3. Performance (area under ROC curve) of the combined expert panel as a function of the number of included observers. All possible combinations of 2 to 11 observers were evaluated. The mean area under the ROC curve approaches its limit with approximately 6 observers.
Individual observers’ Spearman rank correlation coefficients with the combined expert panel ranged from 0.62 to 0.86, with a median of 0.79. There was no relationship between the Spearman coefficient and the area under the ROC curve (r = 0.09, p = 0.78).
Discussion

The objective of this work was to establish an easy-to-use tool for clinicians, trainees, and students to assess their skill at interpreting optic discs for signs of glaucoma-related damage, and to provide data from a panel of experts as a reference for future observers. The study also showed that meaningful experiments with Discus can be performed within a relatively short time.

All observers in this study had ROC areas significantly smaller than 1, and even when the judgments of the observers were averaged, the combined responses of the panel failed to discriminate perfectly between optic discs in the VF-positive and VF-negative groups. These findings are not surprising, given the lack of a strong association between structure and function in early glaucoma that has been reported by many previous studies.28-33 However, the experiments provide a powerful illustration of how difficult it is to make diagnostic decisions in glaucoma based solely on the optic disc.
At a fixed specificity of 90%, the combined panel’s sensitivity to visual field loss was estimated at 60%. This is within the range of performances previously reported for clinical observers and objective imaging tools.9, 34-37 Unfortunately, objective imaging data are not available for the patients in the current dataset, and we are therefore unable to perform a direct comparison. However, the methodology developed in this paper may prove useful for future studies that compare diagnostic performance between clinicians and imaging tools in different clinical settings. A potential weakness of our study was the relatively small size of the expert group (n = 12). However, by averaging every possible combination of 2 to 11 observers within the group, we demonstrated that our panel was likely to have attained near-maximum performance, and that a larger group of observers would have been unlikely to change our findings substantially.
One challenging issue is how to derive complete and easily interpretable summary measures of performance in the absence of a reference standard of optic disc damage. Such summary measures would be useful for giving feedback and for establishing targets for students and trainees. We used visual field data as the criterion to separate optic disc images into VF-positive and VF-negative groups, and there was no selection based on the presence or type of optic disc damage which would have biased our sample.38-40 The ROC area therefore measures the statistical separation between an observer’s responses to optic discs in eyes with and without visual field damage.41, 42 However, owing to the lack of a strong correlation between structure and function, visual field loss is not an ideal metric for optic disc damage in early glaucoma. For example, it is likely that a substantial proportion of the VF-negative images show early structural damage, whereas some optic discs in the VF-positive group may still appear healthy.
We have attempted to address the lack of a reference standard in two complementary ways. First, a new observer’s ROC area can be compared to that of the expert panel, such that the statistic is re-scaled to cover a potential range from near zero (corresponding to chance performance, AUROC = 0.5) to around 100% (AUROC = 0.87, the performance of the expert panel).
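This re-scaling amounts to expressing an observer’s AUROC above chance as a fraction of the panel’s AUROC above chance. A minimal sketch (the function name is ours; the panel value 0.87 is taken from the paper):

```python
# Re-scaled performance: (individual ROC area - 0.5) / (panel ROC area - 0.5),
# expressed as a percentage. PANEL_AUROC = 0.87 is the value reported above.

PANEL_AUROC = 0.87

def rescaled_performance(observer_auroc, panel_auroc=PANEL_AUROC):
    """Observer AUROC above chance as a percentage of the panel's."""
    return 100.0 * (observer_auroc - 0.5) / (panel_auroc - 0.5)

print(rescaled_performance(0.5))   # 0.0   (chance performance)
print(rescaled_performance(0.87))  # 100.0 (matches the panel)
print(round(rescaled_performance(0.78), 1))  # mean expert observer: 75.7
```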
Second, we suggest that the Spearman rank correlation coefficient may be useful as a measure of agreement between a future observer’s responses and those of the expert panel.43 Because this coefficient takes into account the relative ranking of the responses, and not their overall magnitude, it is independent of the observer’s response criterion. Consider, for example, three images graded as “probably damaged”, “probably healthy”, and “definitely healthy” by the expert group. An observer responding with “definitely damaged”, “not sure”, and “probably healthy” would differ in criterion but agree on the relative ranking of damage, and their rank correlation with the expert panel would be 1.0 (perfect). Our data suggest that observers may achieve similar ROC areas with rather different responses (consider observers D and F as an example), and the lack of association between the ROC area and the rank correlation means that these statistics measure somewhat independent aspects of decision-making.
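The three-image example above can be worked through directly: a pure criterion shift leaves the Spearman coefficient at 1.0. A pure-Python sketch (no tied ranks in this example; the category coding follows the paper’s -2..+2 scale):

```python
# Spearman rank correlation for the criterion-shift example in the text.
# For untied data, rho = cov(rank_x, rank_y) / var(rank), since both rank
# vectors are permutations of 1..n and therefore have equal variance.

SCALE = {
    "definitely healthy": -2, "probably healthy": -1, "not sure": 0,
    "probably damaged": 1, "definitely damaged": 2,
}

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean = (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

experts = [SCALE[c] for c in ("probably damaged", "probably healthy", "definitely healthy")]
observer = [SCALE[c] for c in ("definitely damaged", "not sure", "probably healthy")]
print(spearman(experts, observer))  # 1.0
```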
A surprising finding was that individual observers in our study adopted very different response criteria, even though they had been provided with identical written instructions and identical information on the source of the images and the distribution of visual field damage in the sample (compare observers A and E, for example). It is possible that we might have been able to control the criteria more closely, for example by instructing observers to use the “probably damaged” category if they believed that the chances of the eye being healthy were less than, say, 10%. More importantly, however, our findings underscore the need to distinguish between differences in diagnostic performance and differences in diagnostic criterion whenever subjective ratings of optic disc damage are involved. This is the principal reason why we avoided the use of kappa statistics, which measure overall agreement but do not isolate differences in criterion.44, 45
The outpatient clinic from which our images were obtained sees a relatively high proportion of patients suspected of having glaucoma who do not have visual field loss. Because our image sample is not representative of an unselected population, the ROC curves are likely to underestimate clinicians’ true performance at detecting glaucoma by ophthalmoscopy. However, the use of a “difficult” data set may also be seen as an advantage, as it allows observers’ performance to be assessed on the type of optic disc more likely to cause diagnostic problems in clinical practice.
In addition to the source of our images, there are several other reasons why performance on Discus should not be regarded as a truly representative measure of an observer’s real-world diagnostic capability. First, we used non-stereoscopic images. Stereoscopic images would have been more representative of slitlamp biomicroscopy, the current standard of care, and there is evidence that many features of glaucomatous damage may be more clearly apparent in stereoscopic images.46 However, the gain over monoscopic images is probably not large.47-50 Second, Discus does not permit a comparison of fellow eyes, which often provides important clues in patients with early damage.51 Third, through the display of photographic images on a computer monitor we cannot assess an observer’s aptitude at obtaining an adequate view of the optic disc in real patients.
Notwithstanding these limitations, we believe that Discus provides a useful assessment of some important aspects of recognising glaucomatous optic disc damage. Further studies with Discus are now being undertaken to examine the performance of ophthalmology residents and other trainees as compared to our expert group. These studies will also provide insight into which features of glaucomatous optic disc damage are least well recognised, and how clinicians use information on prior probability in their clinical decision-making.
Conclusions

The Discus software may be useful in the assessment and training of clinicians involved in the detection of glaucoma. It is freely available from http://discusproject.blogspot.com, and interested users may analyse their results using an automated web server on this site.
Acknowledgements

Robert Harper, Amanda Harding and Jo Marcks of the OLGA clinic at the Manchester Royal Eye Hospital supported this project and contributed ideas. Jonathan Layes (Medicine) and Bijan Farhoudi (Computer Science) of Dalhousie University helped to improve the software and to implement an automated analysis on our server. We are most grateful to all 12 anonymous observers for their participation.
Appendix

At present, Discus is available only for the Windows operating system. The software can be called with different start-up parameters. These parameters (and their defaults) are:

1) Duration of image presentations, in ms (10000)
2) Rate of repetitions in the visual field positive group (0.1)
3) Rate of repetitions in the visual field negative group (0.3)
4) Save-To-Desktop status (1)

If the Save-To-Desktop status is set to 1, a tab-delimited file will be saved to the desktop. The user can then upload this file to our server and retrieve their results after a few seconds.
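For users who want to inspect the saved file themselves rather than use the web server, a tab-delimited file of this kind can be read with standard tools. The column layout below (image, group, response) is an assumption for illustration only; the paper does not specify the file format.

```python
# Sketch of reading a Discus-style tab-delimited results file.
# NOTE: the column names used here are hypothetical, not documented.

import csv
import io

def mean_response(tsv_text):
    """Mean numeric response (-2..+2) over all rows of the file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = [int(row["response"]) for row in reader]
    return sum(values) / len(values)

sample = "image\tgroup\tresponse\nimg001\tVF-negative\t-1\nimg002\tVF-positive\t2\n"
print(mean_response(sample))  # 0.5
```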
References 279
280
1. Weinreb RN, Tee Khaw P. Primary open-angle glaucoma. Lancet 2004;363:1711-1720. 281
2. Garway-Heath DF. Early diagnosis in glaucoma. In: Nucci C, Cerulli L, Osborne NN, 282
Bagetta G (eds), Progress in Brain Research; 2008:47-57. 283
3. Gordon MO, Beiser JA, Brandt JD, et al. The Ocular Hypertension Treatment Study: 284
Baseline Factors That Predict the Onset of Primary Open-Angle Glaucoma. Archives of 285
Ophthalmology 2002;120:714. 286
4. Keltner JL, Johnson CA, Anderson DR, et al. The association between glaucomatous visual 287
fields and optic nerve head features in the Ocular Hypertension Treatment Study. 288
Ophthalmology 2006;113:1603-1612. 289
5. Predictive Factors for Open-Angle Glaucoma among Patients with Ocular Hypertension in 290
the European Glaucoma Prevention Study. Ophthalmology 2007;114:3-9. 291
6. Broadway DC, Nicolela MT, Drance SM. Optic Disk Appearances in Primary Open-Angle 292
Glaucoma. Survey of Ophthalmology 1999;43:223-243. 293
7. Jonas JB, Budde WM, Panda-Jonas S. Ophthalmoscopic evaluation of the optic nerve head. 294
Survey of Ophthalmology 1999;43:293-320. 295
8. Lin SC, Singh K, Jampel HD, et al. Optic Nerve Head and Retinal Nerve Fiber Layer 296
Analysis: A Report by the American Academy of Ophthalmology. Ophthalmology 297
2007;114:1937-1949. 298
9. Sharma P, Sample PA, Zangwill LM, Schuman JS. Diagnostic Tools for Glaucoma Detection 299
and Management. Survey of Ophthalmology 2008;53. 300
10. Zangwill LM, Bowd C, Weinreb RN. Evaluating the Optic Disc and Retinal Nerve Fiber 301
Layer in Glaucoma II: Optical Image Analysis. Seminars in Ophthalmology 2000;15:206 - 302
220. 303
11. Mowatt G, Burr JM, Cook JA, et al. Screening Tests for Detecting Open-Angle Glaucoma: 304
Systematic Review and Meta-analysis. Invest Ophthalmol Vis Sci 2008;49:5373-5385. 305
12. Fingeret M, Medeiros FA, Susanna Jr R, Weinreb RN. Five rules to evaluate the optic disc 306
and retinal nerve fiber layer for glaucoma. Optometry 2005;76:661-668. 307
13. Susanna Jr R, Vessani RM. New findings in the evaluation of the optic disc in glaucoma 308
diagnosis. Current Opinion in Ophthalmology 2007;18:122-128. 309
14. Caprioli J. Clinical evaluation of the optic nerve in glaucoma. Transactions of the American 310
Ophthalmological Society 1994;92:589. 311
15. Lichter PR. Variability of expert observers in evaluating the optic disc. Transactions of the 312
American Ophthalmological Society 1976;74:532. 313
16. Tielsch JM, Katz J, Quigley HA, Miller NR, Sommer A. Intraobserver and interobserver 314
agreement in measurement of optic disc characteristics. Ophthalmology 1988;95:350-356. 315
17. Nicolela MT, Drance SM, Broadway DC, Chauhan BC, McCormick TA, LeBlanc RP. 316
Agreement among clinicians in the recognition of patterns of optic disk damage in glaucoma. 317
American journal of ophthalmology 2001;132:836-844. 318
18. Spalding JM, Litwak AB, Shufelt CL. Optic nerve evaluation among optometrists. Optom Vis 319
Sci 2000;77:446-452. 320
19. Harper R, Reeves B, Smith G. Observer variability in optic disc assessment: implications for 321
glaucoma shared care. Ophthalmic Physiol Opt 2000;20:265-273. 322
17
20. Harper R, Radi N, Reeves BC, Fenerty C, Spencer AF, Batterbury M. Agreement between ophthalmologists and optometrists in optic disc assessment: training implications for glaucoma co-management. Graefes Archive Clin Exp Ophthalmol 2001;239:342-350.
21. Spry PG, Spencer IC, Sparrow JM, et al. The Bristol Shared Care Glaucoma Study: reliability of community optometric and hospital eye service test measures. British Journal of Ophthalmology 1999;83:707-712.
22. Abrams LS, Scott IU, Spaeth GL, Quigley HA, Varma R. Agreement among optometrists, ophthalmologists, and residents in evaluating the optic disc for glaucoma. Ophthalmology 1994;101:1662-1667.
23. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology 1992;99:215-221.
24. Azuara-Blanco A, Katz LJ, Spaeth GL, Vernon SA, Spencer F, Lanzl IM. Clinical agreement among glaucoma experts in the detection of glaucomatous changes of the optic disk using simultaneous stereoscopic photographs. American Journal of Ophthalmology 2003;136:949-950.
25. Sung VCT, Bhan A, Vernon SA. Agreement in assessing optic discs with a digital stereoscopic optic disc camera (Discam) and Heidelberg retina tomograph. Br J Ophthalmol 2002:196-202.
26. Ihaka R, Gentleman R. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 1996;5:299-314.
27. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005;21:3940-3941.
28. Anderson RS. The psychophysics of glaucoma: Improving the structure/function relationship. Progress in Retinal and Eye Research 2006;25:79-97.
29. Garway-Heath DF, Holder GE, Fitzke FW, Hitchings RA. Relationship between electrophysiological, psychophysical, and anatomical measurements in glaucoma. Investigative Ophthalmology and Visual Science 2002;43:2213-2220.
30. Johnson CA, Cioffi GA, Liebmann JR, Sample PA, Zangwill LM, Weinreb RN. The relationship between structural and functional alterations in glaucoma: A review. Seminars in Ophthalmology 2000;15:221-233.
31. Harwerth RS, Quigley HA. Visual field defects and retinal ganglion cell losses in patients with glaucoma. Archives of Ophthalmology 2006;124:853-859.
32. Caprioli J. Correlation of visual function with optic nerve and nerve fiber layer structure in glaucoma. Survey of Ophthalmology 1989;33:319-330.
33. Caprioli J, Miller JM. Correlation of structure and function in glaucoma. Quantitative measurements of disc and field. Ophthalmology 1988;95:723-727.
34. Deleon-Ortega JE, Arthur SN, McGwin Jr G, Xie A, Monheit BE, Girkin CA. Discrimination between glaucomatous and nonglaucomatous eyes using quantitative imaging devices and subjective optic nerve head assessment. Invest Ophthalmol Vis Sci 2006;47:3374-3380.
35. Mardin CY, Jünemann AGM. The diagnostic value of optic nerve imaging in early glaucoma. Current Opinion in Ophthalmology 2001;12:100-104.
36. Greaney MJ, Hoffman DC, Garway-Heath DF, Nakla M, Coleman AL, Caprioli J. Comparison of optic nerve imaging methods to distinguish normal eyes from those with glaucoma. Investigative Ophthalmology and Visual Science 2002;43:140-145.
37. Harper R, Reeves B. The sensitivity and specificity of direct ophthalmoscopic optic disc assessment in screening for glaucoma: a multivariate analysis. Graefe's Archive for Clinical and Experimental Ophthalmology 2000;238:949-955.
38. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of Variation and Bias in Studies of Diagnostic Accuracy: A Systematic Review. Annals of Internal Medicine 2004;140:189-202.
39. Medeiros FA, Ng D, Zangwill LM, Sample PA, Bowd C, Weinreb RN. The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma. Investigative Ophthalmology and Visual Science 2007;48:214-222.
40. Harper R, Henson D, Reeves BC. Appraising evaluations of screening/diagnostic tests: the importance of the study populations. British Journal of Ophthalmology 2000;84:1198.
41. Hanley JA. Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging 1989;29:307-335.
42. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
43. Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 1997;39:643-657.
44. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971;76:378-382.
45. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543-549.
46. Morgan JE, Sheen NJL, North RV, Choong Y, Ansari E. Digital imaging of the optic nerve head: Monoscopic and stereoscopic analysis. British Journal of Ophthalmology 2005;89:879-884.
47. Hrynchak P, Hutchings N, Jones D, Simpson T. A comparison of cup-to-disc ratio measurement in normal subjects using optical coherence tomography image analysis of the optic nerve head and stereo fundus biomicroscopy. Ophthalmic and Physiological Optics 2004;24:543-550.
48. Parkin B, Shuttleworth G, Costen M, Davison C. A comparison of stereoscopic and monoscopic evaluation of optic disc topography using a digital optic disc stereo camera. Br J Ophthalmol 2001:1347-1351.
49. Vingrys AJ, Helfrich KA, Smith G. The role that binocular vision and stereopsis have in evaluating fundus features. Optom Vis Sci 1994;71:508-515.
50. Rumsey KE, Rumsey JM, Leach NE. Monocular vs. stereoscopic measurement of cup-to-disc ratios. Optometry and Vision Science 1990;67:546-550.
51. Harasymowycz P, Davis B, Xu G, Myers J, Bayer A, Spaeth GL. The use of RADAAR (ratio of rim area to disc area asymmetry) in detecting glaucoma and its severity. Canadian Journal of Ophthalmology 2004;39:240-244.