Discus (Denniss et al., sept07.09)
DESCRIPTION
Unpublished manuscript describing a software program for experiments with photographs of healthy and damaged optic discs.
Discus – A software program to assess judgment of glaucomatous damage in optic disc photographs.
Short title Discus
Words, Figures, Tables 3600, 3, 2
Codes & Presentations GL, Poster at ARVO meeting in May 2008 (program # 3625)
Keywords glaucoma, optic disc, sensitivity, specificity, diagnostic performance
Authors Jonathan Denniss, MCOptom1,2
Damian Echendu, OD, MSc1 David B Henson, PhD, FCOptom1,2
Paul H Artes, PhD1,2,3 (corresponding author)
Affiliations & Correspondence
1 Research Group for Eye and Vision Sciences, University of Manchester, England
2 Manchester Royal Eye Hospital, Manchester, England
3 Ophthalmology and Visual Sciences, Dalhousie University, Rm 2035, West Victoria, 1276 South Park St, Halifax, Nova Scotia B3H 2Y9, Canada
[email protected]
Commercial Relationships None
Support College of Optometrists PhD studentship (JD) Nova Scotia Health Research Foundation Grant Med-727 (PHA)
Abstract

Aim
To describe a software package (Discus) for evaluating clinicians’ assessment of optic disc damage, and to provide reference data from a group of expert observers.

Methods
Optic disc images were selected from patients with manifest or suspected glaucoma or ocular hypertension who attended the Manchester Royal Eye Hospital. Eighty images came from eyes without evidence of visual field (VF) loss in at least 4 consecutive tests (VF-negatives), and 20 images from eyes with repeatable VF loss (VF-positives). Software was written to display these images in randomized order, for up to 60 seconds. Expert observers (n=12) rated optic disc damage on a 5-point scale (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged).

Results
Optic disc damage as determined by the expert observers predicted VF loss with less than perfect accuracy (mean area under the receiver operating characteristic curve [AUROC], 0.78; range 0.72 to 0.85). When the responses were combined across the panel of experts, the AUROC reached 0.87, corresponding to a sensitivity of ~60% at 90% specificity. While the observers’ performances were similar, there were large differences between the criteria they adopted (p<0.001), even though all observers had been given identical instructions.

Conclusion
Discus provides a simple and rapid means of assessing important aspects of optic disc interpretation. The data from the panel of expert observers provide a reference against which students, trainees, and clinicians may compare themselves. The program and the analyses described in this paper are freely accessible from http://discusproject.blogspot.com/.
Introduction

The detection of early damage of the optic disc is an important yet difficult task.1, 2

In many patients with glaucoma, optic disc damage is the first clinically detectable sign of disease. In the Ocular Hypertension Treatment Study, for example, almost 60% of patients who converted to glaucoma developed optic disc changes before exhibiting reproducible visual field damage.3, 4 Broadly similar findings were obtained in the European Glaucoma Prevention Study; in approximately 40% of those participants who developed glaucoma, optic disc changes were recognised before visual field changes.5 However, the diverse range of optic disc appearances in a healthy population, combined with the many ways in which glaucomatous damage may affect the appearance of the disc, makes it difficult to detect features of early damage.6, 7

While several imaging technologies have been developed in recent decades (confocal scanning laser tomography, nerve fibre layer polarimetry, and optical coherence tomography) which provide reproducible assessment of the optic disc and retinal nerve fibre layer, the diagnostic performance of these technologies has not been consistently better than that achieved by clinicians.8-11 Subjective assessment of the optic disc, either by slitlamp biomicroscopy or by inspection of photographs, therefore still plays a pivotal role in the clinical care of patients at risk from glaucoma.8

Many papers describe the optic disc changes in glaucoma,6, 7, 12-14 and several authors have examined the agreement between clinicians in diagnosing glaucoma, in differentiating between different types of optic disc damage, or in estimating specific parameters such as cup/disc ratios.15-25 However, because there is no objective reference standard for optic disc damage, it is difficult for students, trainees, or clinicians to assess their judgments against an external reference.

In this paper, we describe a software package (“Discus”) which observers can use to view and interpret a set of selected optic disc images under controlled conditions. We further present reference data from 12 expert observers against which future observers can be evaluated, or evaluate themselves.
Methods

Selection of Images

To obtain a set of optic disc images with a wide spectrum of early glaucomatous damage, data were selected from patients who had attended the Optometrist-led Glaucoma Assessment (OLGA) clinics at the Royal Eye Hospital (Manchester, UK) between June 2003 and May 2007. This clinic sees patients who are deemed at risk of developing glaucoma, for example owing to ocular hypertension, or who have glaucoma but are thought to be at low risk of progression and are well controlled on medical therapy. Patients undergo regular examinations (normally at intervals of 6 months) by specifically trained optometrists. During each visit, visual field examinations (Humphrey Field Analyzer program 24-2, SITA-Standard) and non-stereoscopic fundus photography are performed (Topcon TRC-50EX, field-of-view 20 degrees, resolution 2000×1312 pixels, 24-bit colour).

For this study, images were considered for inclusion if the patient had undergone at least 4 visual field tests on each eye (n=665). The 4 most recent visual fields were then analysed to establish two distinct groups, visual field (VF-) positive and VF-negative (Table 1). Images from patients who did not meet the criteria of either group were excluded.
Table 1: Inclusion criteria for the VF-positive and VF-negative groups. For inclusion in the VF-negative group, the criteria had to be met in both eyes; in addition, the between-eye differences in MD and PSD had to be less than 1.0 dB.

             MD                          PSD
VF-positive  between -2.5 and -10.0 dB   between 3.0 and 15.0 dB
VF-negative  better than [>] -1.5 dB     better than [<] 2.0 dB
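The single-eye criteria of Table 1 can be expressed as a small classification routine. The sketch below is illustrative only (the function name is ours), assuming MD and PSD in dB as reported by the perimeter; the additional both-eye requirement for VF-negatives is not modelled.

```python
# Illustrative sketch of the Table 1 group criteria (function name is ours).
# MD and PSD are in dB; eyes meeting neither rule are excluded from the study.
# (The both-eye and between-eye-difference rules for VF-negatives are omitted.)

def classify_eye(md, psd):
    """Return 'VF-positive', 'VF-negative', or None (excluded) per Table 1."""
    if -10.0 <= md <= -2.5 and 3.0 <= psd <= 15.0:
        return "VF-positive"
    if md > -1.5 and psd < 2.0:
        return "VF-negative"
    return None  # meets neither set of criteria

print(classify_eye(-6.2, 5.6))  # VF-positive
print(classify_eye(0.6, 1.5))   # VF-negative
print(classify_eye(-2.0, 2.5))  # None (excluded)
```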
If both eyes of a patient met these criteria, a single eye was randomly selected. A small number of eyes (n=17) were excluded owing to clearly non-glaucomatous visual field loss (for example, hemianopia) or non-glaucomatous lesions visible on the fundus photographs (eg, chorioretinal scars). There were 155 eyes in the VF-positive and 144 eyes in the VF-negative group.

To eliminate any potential clues other than glaucomatous optic disc damage, we matched the image quality in the VF-negative and VF-positive groups. One of the authors (DE) viewed the images on a computer monitor in random order and graded each one on a five-point scale for focus and uniformity of illumination. During grading, the observer was unaware of the status of the image (VF-positive or -negative), and the area of the disc had been masked from view. A final set of 20 VF-positive images and 80 VF-negative images was then created such that the distribution of image quality was similar in both groups (Table 2). The total size of the image set (100), and the ratio of
VF-positive to VF-negative images (20:80), had been decided on beforehand to limit the duration of the experiments and to keep the emphasis on discs with early damage.
Table 2: Characteristics of the VF-positive and VF-negative groups. Image quality was scored subjectively on a scale from 1 to 5; values are mean (SD). Differences between groups were tested for statistical significance by Mann-Whitney U (MWU) tests.

                     Image Quality   Age, y        MD, dB         PSD, dB
VF-positive (n=20)   1.82 (1.20)     66.0 (13.1)   -6.20 (1.76)   5.58 (2.15)
VF-negative (n=80)   1.68 (1.33)     61.3 (9.3)    +0.60 (0.4)    1.50 (0.16)
p-value (MWU)        0.67            0.35          <0.001         <0.001
Expert Observers

For the present study, 12 expert observers (either glaucoma fellowship-trained ophthalmologists working in glaucoma sub-speciality clinics [n=10], or scientists involved in research on the optic disc in glaucoma [n=2]) were selected. Observers were approached ad hoc during scientific meetings or contacted by e-mail or letter with a request for participation.

Prior to the experiments, the observers were given written instructions detailing the selection of the image set. The instructions also stipulated that responses should be given on the basis of apparent optic disc damage rather than the perceived likelihood of visual field damage.
Experiments

In order to present images under controlled conditions, and to collect the observers’ responses, a software package, Discus (version 3.0E; Figure 1), was developed in Delphi (CodeGear, San Francisco, CA). Details on the availability and configuration of the software are provided in the Appendix.

The software displayed the images, in random order, on a computer monitor. After the observer had triggered a new presentation by hitting the “Next” button, an image was displayed until the observer responded by clicking one of 5 buttons (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged). After a time-out period of 60 seconds the image would disappear, but observers were allowed unlimited time to give a response. To guard against occasional finger errors, observers were also allowed to change their response, as long as this occurred before the “Next” button was hit.

To assess the consistency of the observers, 26 images were presented twice (2 in the VF-positive group, 24 in the VF-negative group). No feedback was provided during the sessions.
Fig 1: Screenshot of Discus software.
Images remained on display for up to 60 seconds, or until the observer clicked on one of the 5 response categories. A new presentation was triggered by hitting the “Next” button.
Analysis

The responses were transformed to a numerical scale ranging from -2 (“definitely healthy”) to +2 (“definitely damaged”). The proportion of repeated images in which the responses differed by one or more categories was calculated for each observer. For all subsequent analyses, however, only the last of the two responses was used. All analyses were carried out in the freely available open-source environment R, and the ROCR library was used to plot the ROC curves.26, 27
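The paper’s analyses were done in R with the ROCR package; as a sketch of the response coding and repeatability summary described above, a Python equivalent might look as follows (variable names are ours, data are illustrative):

```python
# Response coding (-2..+2) and the proportion of repeated images whose two
# responses differed by one or more categories, per observer.

SCALE = {
    "definitely healthy": -2,
    "probably healthy": -1,
    "not sure": 0,
    "probably damaged": 1,
    "definitely damaged": 2,
}

def repeat_discrepancy(first, second):
    """Proportion of repeated images whose two responses differ by
    one or more categories."""
    pairs = list(zip(first, second))
    differing = sum(1 for a, b in pairs if abs(SCALE[a] - SCALE[b]) >= 1)
    return differing / len(pairs)

# Example: 2 of 4 repeated images received a different category.
first = ["not sure", "probably damaged", "definitely healthy", "not sure"]
second = ["not sure", "definitely damaged", "probably healthy", "not sure"]
print(repeat_discrepancy(first, second))  # 0.5
```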
Individual observers’ ROC curves

To obtain an objective measure of individual observers’ performance at discriminating between eyes with and without visual field damage, ROC curves were derived from each set of responses. For this analysis, the visual field status was the reference standard, and responses in the “not sure” category were interpreted as lying between “probably healthy” and “probably damaged”. If an observer had used all five response categories, the ROC curve would contain 4 points (A-D). Point A, the most conservative criterion (most specific but least sensitive), gave the sensitivity and specificity to visual field damage when only the “definitely damaged” responses were treated as test positives, while all other responses (“probably damaged”, “not sure”, “probably healthy”, “definitely healthy”) were interpreted as test negatives. For point D, the least conservative criterion (most sensitive but least specific), only “definitely healthy” responses were interpreted as test negatives, and all other responses as test positives.
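The four operating points can be obtained by sweeping the test-positive cut-off over the numeric scale, with visual field status as the reference standard. The sketch below is illustrative (data and function names are ours, not the paper’s code):

```python
# The four ROC points (A-D) from ordinal responses coded -2..+2.
# A rating is counted as test-positive when it meets or exceeds the cut-off.

def roc_points(responses, vf_positive):
    """responses: numeric ratings (-2..+2); vf_positive: parallel bools.
    Returns (sensitivity, specificity) for cut-offs A (>= +2) .. D (>= -1)."""
    n_pos = sum(vf_positive)
    n_neg = len(vf_positive) - n_pos
    points = []
    for cutoff in (2, 1, 0, -1):  # A, B, C, D
        tp = sum(1 for r, p in zip(responses, vf_positive) if p and r >= cutoff)
        tn = sum(1 for r, p in zip(responses, vf_positive) if not p and r < cutoff)
        points.append((tp / n_pos, tn / n_neg))
    return points

ratings = [2, 1, 0, -1, -2, 2, -2, 1]
vf_pos = [True, True, True, True, False, False, False, False]
for label, (sens, spec) in zip("ABCD", roc_points(ratings, vf_pos)):
    print(label, sens, spec)
```

Note that the paper’s figures plot the positive rate in the VF-negative group (1 - specificity) on the x-axis; the sketch returns specificity directly.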
Individual observers’ criteria

When using a subjective scale, as in the current study, the responses depend on the observer’s interpretation of the categories and their individual inclination to respond with “probably damaged” or “definitely damaged” (response criterion). A cautious observer, for example, might regard a particular optic nerve head as “probably damaged” whilst an equally skilled but less cautious observer might respond with “not sure” or “probably healthy”. To investigate the variation in criteria within our group, we compared the observers’ mean responses across the entire image set.
Combining responses of expert observers

To estimate the performance of a panel of experts, and to obtain a reference other than visual field damage for judging current as well as future observers’ responses, the mean response of the 12 expert observers was calculated for each of the 100 images.

To estimate whether the expert group (n=12) was sufficiently large, we investigated how the performance of the combined panel changed depending on the number of included observers. Areas under the ROC curve were calculated for all possible combinations of 2, 3, 4, …, 11 observers to derive the mean performance, as well as the minimum and maximum.
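The panel analysis above can be sketched as follows: average the observers’ ratings per image, then compute the area under the ROC curve (AUROC) for every observer subset of a given size. This is a pure-Python Mann-Whitney formulation of the AUC; the toy data and names are illustrative, not the paper’s:

```python
from itertools import combinations

def auroc(scores, labels):
    """AUROC of `scores` against boolean `labels` (True = VF-positive),
    counting ties as 1/2 (Mann-Whitney form)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def panel_auroc(ratings_by_observer, labels, panel):
    """Mean rating over the observers in `panel`, scored against `labels`."""
    mean_ratings = [sum(ratings_by_observer[j][i] for j in panel) / len(panel)
                    for i in range(len(labels))]
    return auroc(mean_ratings, labels)

# Toy data: 3 observers, 6 images (first 2 VF-positive).
ratings = [[2, 1, -1, -2, 0, -2], [1, 2, 0, -1, -2, -1], [2, 0, -2, -1, -1, -2]]
labels = [True, True, False, False, False, False]
for size in (2, 3):
    aucs = [panel_auroc(ratings, labels, p) for p in combinations(range(3), size)]
    print(size, min(aucs), sum(aucs) / len(aucs), max(aucs))
```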
Relationship between responses of individual observers and expert panel

As a measure of overall agreement between the expert observers, independent of their individual response criteria, the Spearman rank correlation coefficient between the 12 sets of responses was computed. The underlying rationale of this analysis is that, by assigning each image to one of five ordinal categories, each observer had in fact ranked the 100 images. If two observers had produced identical rankings, the Spearman coefficient would be 1, regardless of the actual responses assigned.
Results

The experiments took between 13 and 46 minutes (mean, 29 min) to complete. On average, the observers responded 7 seconds after the images were first presented on the screen, and the median response latencies of individual observers ranged from 4 to 16 seconds. The reproducibility of individual observers’ responses was moderate: on average, discrepancies of one category were seen in 44% (12) of the 26 repeated images (range, 23-62%).

Individual observers’ results are shown in Fig. 2A-L. The points labelled A, B, C, and D represent the trade-off between the positive rates in the VF-positive (vertical axis) and VF-negative groups (horizontal axis) achieved with the four possible classification criteria. Point A, for example, shows the trade-off when only discs in the “definitely damaged” category are regarded as test-positives. Point B gives the trade-off when discs in both the “definitely damaged” and “probably damaged” categories are regarded as test-positives. For D, the least conservative criterion, only responses of “definitely healthy” were interpreted as negatives. To indicate the precision of these estimates, 95% confidence intervals were added to point B.

Areas under the curve (AUROC) ranged from 0.71 (95% CI, 0.58-0.85) to 0.88 (95% CI, 0.82-0.96), with a mean of 0.79. There was no relationship between observers’ overall performance and their median response latency (Spearman’s rho = 0.34, p = 0.29).
In contrast to their similar overall performance, the observers’ response criteria differed substantially (p<0.001, Friedman test). For example, the proportion of discs in the VF-positive category which were classified as “definitely damaged” ranged from 15% to 90%, while the proportion of discs in the VF-negative category classified as “definitely healthy” ranged from 8% to 68%. In Fig 2A-L, the response criterion is represented by the inclination of the red line with its origin in the bottom right corner. If the responses had been exactly balanced between the “damaged” and “healthy” categories, the inclination of the line would be 45 degrees. A more horizontal line represents a more conservative criterion (less likely to respond with “probably damaged” or “definitely damaged”), while a more vertical line represents a less conservative criterion. There was no relationship between the observers’ performance (AUROC) and their response criterion (Spearman’s rho = 0.41, p = 0.18).

To derive the “best possible” performance as a reference for future observers, the responses of the expert panel were combined by calculating the mean response obtained for each image. The ROC curve for the combined responses (grey curve in Fig. 2A-L) enclosed an area of 0.87.
Fig. 2. Receiver operating characteristic (ROC) curves for the classification of optic disc photographs by the 12 expert observers (A-L), with a reference standard of visual field damage. The x-axis (positive rate in the VF-negative group) measures specificity to visual field damage, while the y-axis (positive rate in the VF-positive group) gives the sensitivity. Point A (the most conservative criterion) shows the trade-off between sensitivity and specificity when only “definitely damaged” responses are interpreted as test positives. Point D (the least conservative criterion) shows the trade-off when all but “definitely healthy” responses are interpreted as test positives. Boxplots (right) give the distributions of response latencies, and the number of times each response was selected.
Fig. 2 (cont). To facilitate comparison, the grey ROC curve and the dotted grey line represent the performance and the criterion of the group as a whole, respectively. Results provided in numerical format are the area under the ROC curve (AUC); the percentage of the AUC as compared to that of the entire group, (individual ROC area - 0.5) / (expert panel ROC area - 0.5); the Spearman rank correlation of the individual’s responses with those of the entire group; the mean difference between repeated responses; and the average response as a measure of criterion (-2 = “definitely healthy”, -1 = “probably healthy”, 0 = “not sure”, 1 = “probably damaged”, and 2 = “definitely damaged”).
To investigate how the performance of an expert panel varies with the number of contributing observers, the area under the ROC curve was derived for all possible combinations of 2 to 11 observers (Fig. 3). The limit of the ROC area was approached with 6 or more observers, and it appeared that a further increase in the number of observers would not have had a substantial effect on the performance of the panel.

Fig. 3. Performance (area under ROC curve) of the combined expert panel as a function of the number of included observers. All possible combinations of 2 to 11 observers were evaluated. The mean area under the ROC curve approaches its limit with approximately 6 observers.
Individual observers’ Spearman rank correlation coefficients with the combined expert panel ranged from 0.62 to 0.86, with a median of 0.79. There was no relationship between the Spearman coefficient and the area under the ROC curve (r = 0.09, p = 0.78).
Discussion

The objective of this work was to establish an easy-to-use tool for clinicians, trainees, and students to assess their skill at interpreting optic discs for signs of glaucoma-related damage, and to provide data from a panel of experts as a reference for future observers. The study also showed that meaningful experiments with Discus can be performed within a relatively short time.

All observers in this study had ROC areas significantly smaller than 1, and even when the judgments of the observers were averaged, the combined responses of the panel failed to discriminate perfectly between optic discs in the VF-positive and VF-negative groups. These findings are not surprising, given the lack of a strong association between structure and function in early glaucoma that has been reported by many previous studies.28-33 However, the experiments provide a powerful illustration of how difficult it is to make diagnostic decisions in glaucoma based solely on the optic disc.
At a fixed specificity of 90%, the combined panel’s sensitivity to visual field loss was estimated at 60%. This is within the range of performances previously reported for clinical observers and objective imaging tools.9, 34-37 Unfortunately, objective imaging data are not available for the patients in the current dataset, and we are therefore unable to perform a direct comparison. However, the methodology developed in this paper may prove useful for future studies that compare diagnostic performance between clinicians and imaging tools in different clinical settings. A potential weakness of our study was the relatively small size of the expert group (n = 12). However, by averaging every possible combination of 2 to 11 observers within the group, we demonstrated that our panel was likely to have attained near-maximum performance, and that a larger group of observers would have been unlikely to change our findings substantially.
One challenging issue is how to derive complete and easily interpretable summary measures of performance in the absence of a reference standard of optic disc damage. Such summary measures would be useful for giving feedback and for establishing targets for students and trainees. We used visual field data as the criterion to separate optic disc images into VF-positive and VF-negative groups, and there was no selection based on the presence or type of optic disc damage which would have biased our sample.38-40 The ROC area therefore measures the statistical separation between an observer’s responses to optic discs in eyes with and without visual field damage.41, 42 However, owing to the lack of a strong correlation between structure and function, visual field loss is not an ideal metric for optic disc damage in early glaucoma. For example, it is likely that a substantial proportion of the VF-negative images show early structural damage, whereas some optic discs in the VF-positive group may still appear healthy.
We have attempted to address the lack of a reference standard in two complementary ways. First, a new observer’s ROC area can be compared to that of the expert panel, such that the statistic is re-scaled to cover a potential range from near zero (corresponding to chance performance, AUROC = 0.5) to around 100% (AUROC = 0.87, the performance of the expert panel).
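This re-scaling amounts to expressing an observer’s AUROC above chance as a fraction of the panel’s AUROC above chance. A minimal sketch (the function name is ours; the panel value 0.87 is taken from the paper):

```python
# Re-scaled performance: (individual ROC area - 0.5) / (panel ROC area - 0.5),
# expressed as a percentage. PANEL_AUROC = 0.87 is the value reported above.

PANEL_AUROC = 0.87

def rescaled_performance(observer_auroc, panel_auroc=PANEL_AUROC):
    """Observer AUROC above chance as a percentage of the panel's."""
    return 100.0 * (observer_auroc - 0.5) / (panel_auroc - 0.5)

print(rescaled_performance(0.5))   # 0.0   (chance performance)
print(rescaled_performance(0.87))  # 100.0 (matches the panel)
print(round(rescaled_performance(0.78), 1))  # mean expert observer: 75.7
```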
Second, we suggest that the Spearman rank correlation coefficient may be useful as a measure of agreement between a future observer’s responses and those of the expert panel.43 Because this coefficient takes into account the relative ranking of the responses, and not their overall magnitude, it is independent of the observer’s response criterion. Consider, for example, three images graded as “probably damaged”, “probably healthy”, and “definitely healthy” by the expert group. An observer responding with “definitely damaged”, “not sure”, and “probably healthy” would differ in criterion but agree on the relative ranking of damage, and their rank correlation with the expert panel would be 1.0 (perfect). Our data suggest that observers may achieve similar ROC areas with rather different responses (consider observers D and F as an example), and the lack of association between the ROC area and the rank correlation means that these statistics measure somewhat independent aspects of decision-making.
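The three-image example above can be worked through directly: a pure criterion shift leaves the Spearman coefficient at 1.0. A pure-Python sketch (no tied ranks in this example; the category coding follows the paper’s -2..+2 scale):

```python
# Spearman rank correlation for the criterion-shift example in the text.
# For untied data, rho = cov(rank_x, rank_y) / var(rank), since both rank
# vectors are permutations of 1..n and therefore have equal variance.

SCALE = {
    "definitely healthy": -2, "probably healthy": -1, "not sure": 0,
    "probably damaged": 1, "definitely damaged": 2,
}

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean = (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

experts = [SCALE[c] for c in ("probably damaged", "probably healthy", "definitely healthy")]
observer = [SCALE[c] for c in ("definitely damaged", "not sure", "probably healthy")]
print(spearman(experts, observer))  # 1.0
```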
A surprising finding was that individual observers in our study adopted very different response criteria, even though they had been provided with identical written instructions and identical information on the source of the images and the distribution of visual field damage in the sample (compare observers A and E, for example). It is possible that we might have been able to control the criteria more closely, for example by instructing observers to use the “probably damaged” category if they believed that the chances of the eye being healthy were less than, say, 10%. More importantly, however, our findings underscore the need to distinguish between differences in diagnostic performance and differences in diagnostic criterion whenever subjective ratings of optic disc damage are involved. This is the principal reason why we avoided the use of kappa statistics, which measure overall agreement but do not isolate differences in criterion.44, 45
The outpatient clinic from which our images were obtained sees a relatively high proportion of patients suspected of having glaucoma who do not have visual field loss. Because our image sample is not representative of an unselected population, the ROC curves are likely to underestimate clinicians’ true performance at detecting glaucoma by ophthalmoscopy. However, the use of a “difficult” data set may also be seen as an advantage, as it allows observers’ performance to be assessed on the type of optic disc more likely to cause diagnostic problems in clinical practice.
In addition to the source of our images, there are several other reasons why performance on Discus should not be regarded as a truly representative measure of an observer’s real-world diagnostic capability. First, we used non-stereoscopic images. Stereoscopic images would have been more representative of slitlamp biomicroscopy, the current standard of care, and there is evidence that many features of glaucomatous damage may be more clearly apparent in stereoscopic images.46 However, the gain over monoscopic images is probably not large.47-50 Second, Discus does not permit a comparison of fellow eyes, which often provides important clues in patients with early damage.51 Third, through the display of photographic images on a computer monitor we cannot assess an observer’s aptitude at obtaining an adequate view of the optic disc in real patients.
Notwithstanding these limitations, we believe that Discus provides a useful assessment of some important aspects of recognising glaucomatous optic disc damage. Further studies with Discus are now being undertaken to examine the performance of ophthalmology residents and other trainees as compared to our expert group. These studies will also provide insight into which features of glaucomatous optic disc damage are least well recognised, and how clinicians use information on prior probability in their clinical decision-making.
Conclusions

The Discus software may be useful in the assessment and training of clinicians involved in the detection of glaucoma. It is freely available from http://discusproject.blogspot.com, and interested users may analyse their results using an automated web server on this site.
Acknowledgements

Robert Harper, Amanda Harding and Jo Marcks of the OLGA clinic at the Manchester Royal Eye Hospital supported this project and contributed ideas. Jonathan Layes (Medicine) and Bijan Farhoudi (Computer Science) of Dalhousie University helped to improve the software and to implement an automated analysis on our server. We are most grateful to all 12 anonymous observers for their participation.
Appendix

At present, Discus is available only for the Windows operating system. The software can be called with different start-up parameters. These parameters (and their defaults) are:

1) Duration of image presentations, in ms (10000)
2) Rate of repetitions in the visual field positive group (0.1)
3) Rate of repetitions in the visual field negative group (0.3)
4) Save-To-Desktop status (1)

If the Save-To-Desktop status is set to 1, a tab-delimited file will be saved to the desktop. The user can then upload this file to our server and retrieve their results after a few seconds.
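For users who want to inspect the saved file themselves rather than use the web server, a tab-delimited file of this kind can be read with standard tools. The column layout below (image, group, response) is an assumption for illustration only; the paper does not specify the file format.

```python
# Sketch of reading a Discus-style tab-delimited results file.
# NOTE: the column names used here are hypothetical, not documented.

import csv
import io

def mean_response(tsv_text):
    """Mean numeric response (-2..+2) over all rows of the file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = [int(row["response"]) for row in reader]
    return sum(values) / len(values)

sample = "image\tgroup\tresponse\nimg001\tVF-negative\t-1\nimg002\tVF-positive\t2\n"
print(mean_response(sample))  # 0.5
```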
References 279
280
1. Weinreb RN, Tee Khaw P. Primary open-angle glaucoma. Lancet 2004;363:1711-1720. 281
2. Garway-Heath DF. Early diagnosis in glaucoma. In: Nucci C, Cerulli L, Osborne NN, 282
Bagetta G (eds), Progress in Brain Research; 2008:47-57. 283
3. Gordon MO, Beiser JA, Brandt JD, et al. The Ocular Hypertension Treatment Study: 284
Baseline Factors That Predict the Onset of Primary Open-Angle Glaucoma. Archives of 285
Ophthalmology 2002;120:714. 286
4. Keltner JL, Johnson CA, Anderson DR, et al. The association between glaucomatous visual 287
fields and optic nerve head features in the Ocular Hypertension Treatment Study. 288
Ophthalmology 2006;113:1603-1612. 289
5. Predictive Factors for Open-Angle Glaucoma among Patients with Ocular Hypertension in 290
the European Glaucoma Prevention Study. Ophthalmology 2007;114:3-9. 291
6. Broadway DC, Nicolela MT, Drance SM. Optic Disk Appearances in Primary Open-Angle 292
Glaucoma. Survey of Ophthalmology 1999;43:223-243. 293
7. Jonas JB, Budde WM, Panda-Jonas S. Ophthalmoscopic evaluation of the optic nerve head. 294
Survey of Ophthalmology 1999;43:293-320. 295
8. Lin SC, Singh K, Jampel HD, et al. Optic Nerve Head and Retinal Nerve Fiber Layer 296
Analysis: A Report by the American Academy of Ophthalmology. Ophthalmology 297
2007;114:1937-1949. 298
9. Sharma P, Sample PA, Zangwill LM, Schuman JS. Diagnostic Tools for Glaucoma Detection 299
and Management. Survey of Ophthalmology 2008;53. 300
10. Zangwill LM, Bowd C, Weinreb RN. Evaluating the Optic Disc and Retinal Nerve Fiber 301
Layer in Glaucoma II: Optical Image Analysis. Seminars in Ophthalmology 2000;15:206 - 302
220. 303
11. Mowatt G, Burr JM, Cook JA, et al. Screening Tests for Detecting Open-Angle Glaucoma: 304
Systematic Review and Meta-analysis. Invest Ophthalmol Vis Sci 2008;49:5373-5385. 305
12. Fingeret M, Medeiros FA, Susanna Jr R, Weinreb RN. Five rules to evaluate the optic disc 306
and retinal nerve fiber layer for glaucoma. Optometry 2005;76:661-668. 307
13. Susanna Jr R, Vessani RM. New findings in the evaluation of the optic disc in glaucoma 308
diagnosis. Current Opinion in Ophthalmology 2007;18:122-128. 309
14. Caprioli J. Clinical evaluation of the optic nerve in glaucoma. Transactions of the American 310
Ophthalmological Society 1994;92:589. 311
15. Lichter PR. Variability of expert observers in evaluating the optic disc. Transactions of the 312
American Ophthalmological Society 1976;74:532. 313
16. Tielsch JM, Katz J, Quigley HA, Miller NR, Sommer A. Intraobserver and interobserver 314
agreement in measurement of optic disc characteristics. Ophthalmology 1988;95:350-356. 315
17. Nicolela MT, Drance SM, Broadway DC, Chauhan BC, McCormick TA, LeBlanc RP. 316
Agreement among clinicians in the recognition of patterns of optic disk damage in glaucoma. 317
American journal of ophthalmology 2001;132:836-844. 318
18. Spalding JM, Litwak AB, Shufelt CL. Optic nerve evaluation among optometrists. Optom Vis 319
Sci 2000;77:446-452. 320
19. Harper R, Reeves B, Smith G. Observer variability in optic disc assessment: implications for 321
glaucoma shared care. Ophthalmic Physiol Opt 2000;20:265-273. 322
17
20. Harper R, Radi N, Reeves BC, Fenerty C, Spencer AF, Batterbury M. Agreement between ophthalmologists and optometrists in optic disc assessment: training implications for glaucoma co-management. Graefes Archive Clin Exp Ophthalmol 2001;239:342-350.
21. Spry PG, Spencer IC, Sparrow JM, et al. The Bristol Shared Care Glaucoma Study: reliability of community optometric and hospital eye service test measures. British Journal of Ophthalmology 1999;83:707-712.
22. Abrams LS, Scott IU, Spaeth GL, Quigley HA, Varma R. Agreement among optometrists, ophthalmologists, and residents in evaluating the optic disc for glaucoma. Ophthalmology 1994;101:1662-1667.
23. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology 1992;99:215-221.
24. Azuara-Blanco A, Katz LJ, Spaeth GL, Vernon SA, Spencer F, Lanzl IM. Clinical agreement among glaucoma experts in the detection of glaucomatous changes of the optic disk using simultaneous stereoscopic photographs. American Journal of Ophthalmology 2003;136:949-950.
25. Sung VCT, Bhan A, Vernon SA. Agreement in assessing optic discs with a digital stereoscopic optic disc camera (Discam) and Heidelberg retina tomograph. Br J Ophthalmol 2002:196-202.
26. Ihaka R, Gentleman R. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 1996;5:299-314.
27. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005;21:3940-3941.
28. Anderson RS. The psychophysics of glaucoma: Improving the structure/function relationship. Progress in Retinal and Eye Research 2006;25:79-97.
29. Garway-Heath DF, Holder GE, Fitzke FW, Hitchings RA. Relationship between electrophysiological, psychophysical, and anatomical measurements in glaucoma. Investigative Ophthalmology and Visual Science 2002;43:2213-2220.
30. Johnson CA, Cioffi GA, Liebmann JR, Sample PA, Zangwill LM, Weinreb RN. The relationship between structural and functional alterations in glaucoma: A review. Seminars in Ophthalmology 2000;15:221-233.
31. Harwerth RS, Quigley HA. Visual field defects and retinal ganglion cell losses in patients with glaucoma. Archives of Ophthalmology 2006;124:853-859.
32. Caprioli J. Correlation of visual function with optic nerve and nerve fiber layer structure in glaucoma. Survey of Ophthalmology 1989;33:319-330.
33. Caprioli J, Miller JM. Correlation of structure and function in glaucoma. Quantitative measurements of disc and field. Ophthalmology 1988;95:723-727.
34. Deleon-Ortega JE, Arthur SN, McGwin Jr G, Xie A, Monheit BE, Girkin CA. Discrimination between glaucomatous and nonglaucomatous eyes using quantitative imaging devices and subjective optic nerve head assessment. Invest Ophthalmol Vis Sci 2006;47:3374-3380.
35. Mardin CY, Jünemann AGM. The diagnostic value of optic nerve imaging in early glaucoma. Current Opinion in Ophthalmology 2001;12:100-104.
36. Greaney MJ, Hoffman DC, Garway-Heath DF, Nakla M, Coleman AL, Caprioli J. Comparison of optic nerve imaging methods to distinguish normal eyes from those with glaucoma. Investigative Ophthalmology and Visual Science 2002;43:140-145.
37. Harper R, Reeves B. The sensitivity and specificity of direct ophthalmoscopic optic disc assessment in screening for glaucoma: a multivariate analysis. Graefe's Archive for Clinical and Experimental Ophthalmology 2000;238:949-955.
38. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of Variation and Bias in Studies of Diagnostic Accuracy: A Systematic Review. Annals of Internal Medicine 2004;140:189-202.
39. Medeiros FA, Ng D, Zangwill LM, Sample PA, Bowd C, Weinreb RN. The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma. Investigative Ophthalmology and Visual Science 2007;48:214-222.
40. Harper R, Henson D, Reeves BC. Appraising evaluations of screening/diagnostic tests: the importance of the study populations. British Journal of Ophthalmology 2000;84:1198.
41. Hanley JA. Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging 1989;29:307-335.
42. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
43. Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 1997;39:643-657.
44. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971;76:378-382.
45. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543-549.
46. Morgan JE, Sheen NJL, North RV, Choong Y, Ansari E. Digital imaging of the optic nerve head: Monoscopic and stereoscopic analysis. British Journal of Ophthalmology 2005;89:879-884.
47. Hrynchak P, Hutchings N, Jones D, Simpson T. A comparison of cup-to-disc ratio measurement in normal subjects using optical coherence tomography image analysis of the optic nerve head and stereo fundus biomicroscopy. Ophthalmic and Physiological Optics 2004;24:543-550.
48. Parkin B, Shuttleworth G, Costen M, Davison C. A comparison of stereoscopic and monoscopic evaluation of optic disc topography using a digital optic disc stereo camera. Br J Ophthalmol 2001:1347-1351.
49. Vingrys AJ, Helfrich KA, Smith G. The role that binocular vision and stereopsis have in evaluating fundus features. Optom Vis Sci 1994;71:508-515.
50. Rumsey KE, Rumsey JM, Leach NE. Monocular vs. stereoscopic measurement of cup-to-disc ratios. Optometry and Vision Science 1990;67:546-550.
51. Harasymowycz P, Davis B, Xu G, Myers J, Bayer A, Spaeth GL. The use of RADAAR (ratio of rim area to disc area asymmetry) in detecting glaucoma and its severity. Canadian Journal of Ophthalmology 2004;39:240-244.