
INCREASED USE OF AVAILABLE IMAGE DATA

DECREASES ERRORS IN IRIS BIOMETRICS

A Dissertation

Submitted to the Graduate School

of the University of Notre Dame

in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

by

Karen P. Hollingsworth

Kevin W. Bowyer, Co-Director

Patrick J. Flynn, Co-Director

Graduate Program in Computer Science and Engineering

Notre Dame, Indiana

July 2010


© Copyright by

Karen P. Hollingsworth

2010

All Rights Reserved


INCREASED USE OF AVAILABLE IMAGE DATA

DECREASES ERRORS IN IRIS BIOMETRICS

Abstract

by

Karen P. Hollingsworth

Iris biometrics is used in a number of different applications, such as frequent flyer programs, identification of prisoners, and border control in the United Arab Emirates. However, governments interested in using iris biometrics have still found difficulties using it on large populations. Further improvements in iris recognition are required to enable this technology to be used in more settings.

In this dissertation, we describe three methods of reducing error rates for iris biometrics. We define and employ a metric called the fragile bit distance, which uses the locations of less stable bits in an iris template to improve performance. We also investigate signal fusion of multiple frames in an iris video to achieve better recognition performance than is possible using single still images. Third, we present a study of which features are useful for identification in the periocular region. Periocular biometrics is still an emerging field of research, but we anticipate that fusing periocular information with iris information will result in a more robust biometric system.

A final contribution of this work is a study of how iris biometrics performs on twins. Our experiments confirm prior claims that iris biometrics is capable of differentiating between twins. However, we additionally show that there is texture information in the iris that is not encoded by traditional iris biometrics systems. Our experiments suggest that human examination of pairs of iris images for forensic purposes may be feasible. Our results also suggest that development of different approaches to automated iris image analysis may be useful.


CONTENTS

FIGURES

TABLES

ACKNOWLEDGMENTS

CHAPTER 1: INTRODUCTION

CHAPTER 2: BACKGROUND
  2.1 Performance of Biometric Systems
    2.1.1 Verification
    2.1.2 Identification
  2.2 Eye Anatomy
  2.3 Early Research in Iris Biometrics
  2.4 Recent Research in Iris Biometrics
    2.4.1 Image Acquisition, Restoration, and Quality Assessment
      2.4.1.1 Image Acquisition
      2.4.1.2 Image Restoration
      2.4.1.3 Image Quality
    2.4.2 Image Compression
    2.4.3 Segmentation
      2.4.3.1 Active Contours
      2.4.3.2 Alternatives to Active Contours
      2.4.3.3 Eyelid and Eyelash Detection
      2.4.3.4 Segmenting Iris Images with Non-frontal Gaze
    2.4.4 Feature Extraction
    2.4.5 Improvements in Matching
    2.4.6 Searching Large Biometrics Databases
    2.4.7 Applications
      2.4.7.1 Cryptographic Applications
      2.4.7.2 Identity Cards in the U.K.
    2.4.8 Evaluation
    2.4.9 Performance under Varying Conditions
    2.4.10 Multibiometrics

CHAPTER 3: FRAGILE BIT COINCIDENCE
  3.1 Motivation
  3.2 Related Work
    3.2.1 Research on Fusing Hamming Distance with Added Information
    3.2.2 Research on Fragile Bits
  3.3 Data
  3.4 Fragile Bit Distance (FBD)
  3.5 Score Distributions for Hamming Distance and Fragile Bit Distance
  3.6 Fusing Fragile Bit Distance with Hamming Distance
  3.7 Tests of Statistical Significance
  3.8 Effect of Modifying the Fragile Bit Masking Threshold
  3.9 Discussion

CHAPTER 4: AVERAGE IMAGES
  4.1 Motivation
  4.2 Related Work
    4.2.1 Video
    4.2.2 Still Images
  4.3 Data
  4.4 Average Images and Templates
    4.4.1 Selecting Frames and Preprocessing
    4.4.2 Signal Fusion
    4.4.3 Creating an Iris Code Template
  4.5 Comparison of Median and Mean for Signal Fusion
  4.6 How Many Frames Should be Fused in an Average Image?
  4.7 How Much Masking Should be Used in an Average Image?
  4.8 Comparison to Other Methods
    4.8.1 Comparison to Previous Multi-gallery Methods
    4.8.2 Comparison to Previous Log-Likelihood Method
    4.8.3 Comparing to Large Multi-Gallery, Multi-Probe Methods
    4.8.4 Computation Time
  4.9 Discussion

CHAPTER 5: IRIS BIOMETRICS ON TWINS
  5.1 Motivation
  5.2 Related Work
  5.3 Data
    5.3.1 Frame Selection
    5.3.2 Segmentation
  5.4 Biometric Performance on Twins' Irises
  5.5 Similarities in Twins' Irises Detected by Humans
    5.5.1 Experimental Setup
    5.5.2 Results
      5.5.2.1 Can Humans Identify Twins from Iris Texture Alone?
      5.5.2.2 Can Humans Identify Twins from Periocular Information Alone?
      5.5.2.3 Did Humans Score Higher on Queries where They Felt More Certain?
      5.5.2.4 Is It Easier to Identify Twin Pairs Using Iris Data or Periocular Data?
      5.5.2.5 Did Subjects Score Better on the Second Half of the Iris Test than the First Half?
      5.5.2.6 Did Subjects Score Better on the Second Half of the Periocular Test than the First Half?
      5.5.2.7 Which Image Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?
      5.5.2.8 Is It More Difficult to Label Twins as Twins than It Is to Label Unrelated People as Unrelated?
  5.6 Discussion

CHAPTER 6: PERIOCULAR BIOMETRICS
  6.1 Motivation
  6.2 Related Work
  6.3 Data
  6.4 Experimental Method
  6.5 Results
    6.5.1 How Well Can Humans Determine whether Two Periocular Images Are from the Same Person or Not?
    6.5.2 Did Humans Score Higher when They Felt More Certain?
    6.5.3 Did Testers Do Better on the Second Half of the Test than the First Half?
    6.5.4 Which Features Are Correlated with Correct Responses?
    6.5.5 Which Features Are Correlated with Incorrect Responses?
    6.5.6 What Additional Information Did Testers Provide?
    6.5.7 Which Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?
  6.6 Discussion

CHAPTER 7: CONCLUSIONS

BIBLIOGRAPHY

FIGURES

2.1 In a biometric system, the number of false accepts and the number of false rejects are related to the chosen decision criteria (Figure modeled after [27]).

2.2 Image 05495d15 from the Notre Dame dataset. Elements seen in a typical iris image are labeled.

2.3 Commercial iris cameras use near-infrared illumination so that the illumination is unintrusive to humans, and so that the texture of heavily pigmented irises can be imaged more effectively. This graph shows the spectrum of wavelengths emitted by the LEDs on an LG 2200 iris camera, primarily between 700 and 900 nanometers. The spectral characteristics were captured using spectrophotometric equipment made available by Prof. Douglas Hall of the University of Notre Dame.

2.4 Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light (picture reprinted from [23], data from [71]).

2.5 Major steps in iris biometrics processing (picture reprinted from [16] with permission from Elsevier).

2.6 Kang and Park [68] and He et al. [48] use information about camera optics and the position of the subject to estimate a point spread function and restore blurry images to in-focus images: (a) a blurry iris image and (b) an in-focus image of the same subject.

2.7 Belcher and Du's quality measure [7] combines information about occlusion, dilation, and texture: (a) a heavily occluded iris image and (b) a less occluded image of the same subject.

2.8 As iris biometrics is used in larger and more varied applications, it will have to deal with irises in various conditions. This image shows an unusual iris (subject 05931) with filaments of tissue extending into the pupil.

2.9 MBGC data included near-infrared iris videos captured with a Sarnoff Iris on the Move portal. Video of a subject is captured as the subject walks through the portal. This type of acquisition is less constrained than traditional iris cameras; however, the quality of the acquired iris images is poorer. Both face and iris information can be acquired using this type of portal (picture reprinted from [16] with permission from Elsevier).

3.1 Example images from our data set, captured using an LG4000 iris camera.

3.2 The fragile bit patterns (imaginary part) corresponding to the images in Figure 3.1. Black pixels are bits masked for fragility. We use 4800-bit iris codes and mask 25% of the bits (1200 bits) for fragility. Some of the bits are masked for occlusion, so slightly fewer than 1200 bits are masked for fragility.

3.3 Comparisons of fragile bit patterns, each obtained by ANDing two fragile bit masks together. For example, Figure 3.3(a) is the comparison mask obtained by combining Figures 3.2(a) and 3.2(b). Black pixels show where the two masks agreed; blue pixels show where they disagreed; white pixels were unmasked in both iris codes. There is more agreement in same-subject comparisons than when comparing masks of different subjects.

3.4 Images in our data set were captured using this LG4000 iris camera [76].

3.5 The LG4000 iris camera captures images of both eyes at the same time.

3.6 Score distributions for fragile bit distance.

3.7 Score distributions for Hamming distance.

3.8 Joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue; impostor scores are shown in red.

3.9 A zoomed-in view of the joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue; impostor scores are shown in red. Each point represents at least 0.003% of the comparisons.

3.10 We fused FBD and HD using the expression α × HD + (1 − α) × FBD. An α value of 0.6 yielded the lowest equal error rate.

3.11 Fusing Hamming distance with FBD performs better than using Hamming distance alone. Fusing by multiplying and fusing by weighted averaging yield similar results.

3.12 The effect of masking only 5% or 10% of the bits in the iris code for fragility. Using these values, we compared the performance of (1) Hamming distance (HD) with (2) fusing HD and FBD with a weighted average (0.6HD + 0.4FBD). At these low levels of fragile bit masking, the difference between HD and the fusion is small; the ROC curves for the two methods overlap.

3.13 The effect of masking 15% or 20% of the bits in the iris code for fragility. Again, we compared the performance of (1) Hamming distance (HD) with (2) fusing HD and FBD with a weighted average. At these levels of fragile bit masking, the fusion clearly does better than HD alone.

3.14 The effect of masking 25% or 30% of the bits in the iris code for fragility. At these levels of fragile bit masking, the fusion shows an even greater performance benefit over HD alone than at lower levels of fragile bit masking.

4.1 The Iridian LG EOU 2200 camera used to acquire iris video sequences.

4.2 The frames shown in (a) and (c) were selected by our frame-selection algorithm because they were in focus; however, they do not include much valid iris data. In our automated experiments we kept frames like (a) and (c) to show how our software performs without any manual quality checking. In our semi-automated experiments we manually replaced frames like (a) and (c) with better frames from the same video, like (b) and (d). In the future, we may be able to develop an algorithm to detect blinks and off-angle images so that such frames can be rejected automatically.

4.3 Our automated experiments contain a few incorrect segmentations like the one shown in (a). In our semi-automated experiments we manually replaced incorrect segmentations to obtain results like that shown in (b).

4.4 Our automated software did not correctly detect the eyelid in all frames. In our semi-automated experiments we manually replaced incorrect eyelid detections to obtain results like that shown in (b).

4.5 From the ten original images on the top, we created the average image shown on the bottom.

4.6 Using a mean fusion rule for fusing iris images produces better iris recognition performance than using a median fusion rule. Graph (a) shows this result using automated segmentation; graph (b) shows the same result using the manually corrected segmentations.

4.7 Fusing ten frames together yields better recognition performance than fusing four, six, or eight frames.

4.8 Too much masking decreases the degrees of freedom in the non-match distribution, causing an increased false accept rate. (This graph shows the trend for the automatically segmented images; the manually corrected segmentation produces the same trend.)

4.9 The amount of masking used to create average images affects performance. With the manually corrected segmentation, we can use a smaller masking level (60%). With the automated segmentation, a higher masking level (80%) mitigates the impact of missed eyelid detections.

4.10 The proposed signal-fusion method performs better than a multi-gallery approach with either an "average" or a "minimum" score-fusion rule.

4.11 Signal fusion and log-likelihood score fusion perform comparably. The log-likelihood method performs better at operating points with a large false accept rate; the proposed signal-fusion method performs better at operating points with a small false accept rate.

4.12 The MGMP-minimum method achieves the best recognition performance of all the methods considered in this work. However, signal fusion performs well while requiring only 1/N of the storage and 1/N² of the matching time.

4.13 Even though a large multi-gallery, multi-probe experiment achieves better recognition performance, it comes at the cost of much slower execution. The proposed signal-fusion method is the fastest method presented in this work, and it achieves better recognition performance than previously published multi-gallery methods.

5.1 Images of the left eyes of two identical twins. Notice the similarities in overall iris texture, and also the similarities in the appearance of the periocular region.

5.2 Images of irises from identical twins. We segmented the images so that our testers would see only the iris and therefore could not use periocular features to help them decide whether two irises were from twins.

5.3 Images of irises from unrelated people.

5.4 A histogram of Hamming distance scores between twins looks similar to a histogram of Hamming distance scores between non-twins.

5.5 We wanted to know whether humans could identify twins based on periocular information, so we created images in which the iris was blacked out, forcing our testers to use periocular features to make a judgment. This example pair of images is from identical twins.

5.6 All 28 testers correctly classified this pair of images as being from identical twins.

5.7 All 28 testers correctly classified this pair of images as being from identical twins.

5.8 All 28 testers correctly classified this pair of images as being from unrelated people.

5.9 All 28 testers correctly classified this pair of images as being from unrelated people.

5.10 Twenty-five of 28 people incorrectly guessed that these images were from unrelated people; in fact these irises are from identical twins. The difference in dilation makes this pair particularly difficult to classify correctly.

5.11 Twenty-four of 28 people incorrectly guessed that these images were from twins, when in fact these irises are from unrelated people. The smoothness of the texture makes this pair difficult to classify correctly.

6.1 Eyelashes were considered the most helpful feature for making decisions about identity. The tear duct and the shape of the eye were also very helpful.

6.2 We compared the feature rankings from correct responses (Fig. 6.1) with the rankings from incorrect responses. The shape of the eye and the outer corner of the eye were both used more frequently on incorrect responses than on correct responses, suggesting that those two features are less helpful than features such as eyelashes.

6.3 All 25 testers correctly classified these two images as being from the same person.

6.4 All 25 testers correctly classified these two images as being from different people.

6.5 Eleven of 25 people incorrectly guessed that these images were from different people, when in fact these eyes are from the same person. This pair is challenging because one eye is much more open than the other.

6.6 Eleven of 25 people incorrectly guessed that these images were from the same person, when in fact they are from two different people.


TABLES

3.1 AVERAGE FBD FOR GENUINE AND IMPOSTOR COMPARISONS

3.2 FUSING FBD WITH HAMMING DISTANCE

3.3 IS 0.6HD + 0.4FBD BETTER THAN HD ALONE?

3.4 IS HD × FBD BETTER THAN HD ALONE?

3.5 IS αHD + (1 − α)FBD STATISTICALLY SIGNIFICANTLY DIFFERENT FROM 0.6HD + 0.4FBD?

4.1 SIGNAL-FUSION COMPARED TO PREVIOUS METHODS

4.2 SIGNAL-FUSION COMPARED TO LOG-LIKELIHOOD SCORE FUSION

4.3 SIGNAL-FUSION COMPARED TO MULTI-GALLERY, MULTI-PROBE SCORE FUSION

4.4 PROCESSING TIMES FOR DIFFERENT METHODS

5.1 DEMOGRAPHIC INFORMATION OF SUBJECTS

6.1 PERIOCULAR RESEARCH


ACKNOWLEDGMENTS

This dissertation would have been an impossible task without the assistance of my dedicated and supportive advisors. Dr. Bowyer and Dr. Flynn have spent countless hours teaching, guiding, proofreading my papers, and encouraging me in my research.

I also thank my husband, Nathaniel, for listening to my daily reports of my progress, for staying up late with me when I have had deadlines to meet, and for being my number one fan.

I am also grateful for the generous support of our sponsors: the National Science Foundation under grant CNS01-30839, the Federal Bureau of Investigation, the Central Intelligence Agency, the Intelligence Advanced Research Projects Activity, the Biometrics Task Force, and the Technical Support Working Group through US Army contract W91CRB-08-C-0093. The opinions, findings, and conclusions or recommendations expressed here are my own and do not necessarily reflect the views of these sponsors.

CHAPTER 1

INTRODUCTION

According to the International Organization for Standardization, biometrics is the "automated recognition of individuals based on their behavioral and biological characteristics" [61]. Examples of biometric characteristics include fingerprints, face, voice, and iris. A number of different commercial and governmental groups use biometrics. The largest current commercial user of biometrics is Walt Disney World [46]. Disney World takes fingerprints of guests as they enter the park and keeps a record of the fingerprints associated with each ticket to ensure that multiday passes are not resold [46]. One government agency that uses biometrics is the U.S. Department of Homeland Security, which employs biometrics in its US-VISIT program [116]. At ports of entry, the US-VISIT program takes photographs and digital fingerprints of international travelers holding non-U.S. passports or visas. These biometric characteristics are (1) compared to a watchlist of known or suspected terrorists and criminals, (2) compared to a database of previous US-VISIT users to ensure that a person does not enter the U.S. using two different identities, and (3) compared to the images of the person who first obtained the visa or passport to make sure that the document belongs to the person presenting it and not to an impostor [115].

The applications listed above employ fingerprint and face recognition. Iris recognition has also been successfully deployed in some settings. Between 2002 and 2005, the UNHCR (United Nations High Commissioner for Refugees) used iris recognition in a repatriation program for Afghan refugees. The UN provided cash assistance to the refugees, but each applicant was required to provide an iris image so that the UN could detect anyone who was trying to seek assistance more than once [58]. In another type of application, jails use iris recognition to identify prisoners. Repeat offenders may try to give false information about their identity to avoid detection by other law enforcement agencies. By using iris recognition, officers can determine whether a person has been in the jail before [59]. Additionally, the use of iris biometrics ensures that inmates do not impersonate other prisoners to get released early [62]. In some airports in the U.K., Germany, the Netherlands, the United States, and Canada, travelers enrolled in a frequent flyer program can have their irises scanned to bypass lines at immigration control or security checkpoints [24, 49].

Irises are purported to be as unique as fingerprints and as stable over time. However, iris biometrics systems have not been used for as many years as fingerprints, nor have they been tested in as many different settings. A 2005 test conducted by the United Kingdom Passport Service (UKPS) tried to enroll ten thousand people into its database, but only ninety percent of able-bodied users and sixty percent of disabled users succeeded in providing an iris image that passed the system's quality checks [42]. Some of these failures could be remedied with better-trained operators and iris cameras that adjust easily to wheelchair height. Other failures might require improvements in iris imaging, feature extraction, and recognition technology. Some current research in iris biometrics aims to extend the performance of iris biometrics to less constrained image acquisition environments and to broader groups of people.

Current research also aims to make iris recognition possible on larger databases. The U.S. Federal Bureau of Investigation plans to spend one billion dollars over the next ten years to create a database of biometric characteristics that includes fingerprints, palm prints, and eye scans [2]. As iris databases increase in size, iris biometrics algorithms must improve in accuracy and matching speed. Large iris biometrics applications require smaller error rates in order for the technology to be used on such large scales.

This dissertation presents my research in decreasing error rates in iris recognition algorithms and improving the applicability of iris biometrics to broader applications. In Chapter 2, I give a survey of iris biometrics research. In Chapter 3, I present a method of improving performance by looking at the coincidence of fragile bits in two iris codes. Chapter 4 discusses how to get improved performance by using videos instead of still images of irises; I extract multiple frames from a video and then average intensity values from different frames to get an improved iris image.

In Chapter 5, I present an experiment showing that iris biometrics can distinguish between identical twins. I also show that traditional biometrics systems encode only part of the texture information apparent in iris images, and that this additional information could possibly be used in forensic applications to show genetic relationships between different eye images. Chapter 5 does not focus on the error rates in the system, but instead focuses on applying iris recognition to broader applications.

Iris biometrics could be used in even more applications if we could capture images of the iris from a greater distance or from less-cooperative subjects. One possible strategy could be to capture an image that includes portions of the face in addition to the iris. Potentially, information from the periocular region could be combined with iris information to create a system more robust and more accurate than iris biometrics alone. In Chapter 6, I investigate which features in the periocular region could be most helpful for identification by asking human subjects to identify people based on periocular information. Chapter 7 provides concluding remarks and suggestions for future research.

CHAPTER 2

BACKGROUND

This chapter provides background information on iris biometrics and a review of related research. First, it introduces basic terminology used in evaluating the performance of biometric systems. Second, it provides an explanation of some parts of the eye. Third, a typical iris recognition algorithm is explained. Finally, some recent research in iris biometrics is highlighted. This chapter includes content that has been published previously in my master's thesis [50]. In addition, some content from this chapter is reprinted, with permission, from one of my prior papers published in Computer Vision and Image Understanding [16] (©2008, Elsevier).

2.1 Performance of Biometric Systems

Biometrics can be used in at least two different types of applications: verification scenarios and identification scenarios. The next two subsections describe these scenarios.

2.1.1 Verification

In a verification scenario, a person claims a particular identity and the biometric system is used to verify or reject the claim. Verification is done by matching a biometric sample acquired at the time of the claim against the sample previously enrolled for the claimed identity. If the two samples match well enough, the identity claim is verified; if they do not match well enough, the claim is rejected. Thus there are four possible outcomes. A true accept (TA) occurs when the system accepts, or verifies, an identity claim, and the claim is true. A false accept (FA) occurs when the system accepts an identity claim, but the claim is not true. A true reject (TR) occurs when the system rejects an identity claim and the claim is false. A false reject (FR) occurs when the system rejects an identity claim, but the claim is true. The two types of errors that can be made are a false accept and a false reject.

The number of false accepts and the number of false rejects depend on the decision criteria for the system. In a biometrics system, comparisons between two samples are assigned a score related to the difference between the two samples. Figure 2.1 depicts notional genuine and impostor score distributions and related quantities. The distribution of scores for genuine comparisons is imperfectly separated from the distribution of scores for impostor comparisons. The system must choose a decision threshold such that all scores below the threshold are deemed genuine. Impostor comparisons with scores below this threshold are false accepts; genuine comparisons with scores above the threshold are false rejects.

Figure 2.1: In a biometric system, the number of false accepts and the number of false rejects are related to the chosen decision criteria (Figure modeled after [27]). (The plot shows notional genuine and impostor probability densities over Hamming distance, with the false rejects and false accepts lying on either side of the decision threshold.)

Performance for the system across a range of decision thresholds can be summarized in a receiver operating characteristic (ROC) curve. Each point on the ROC curve represents one possible decision threshold. The curve plots the true accept rate on the Y axis and the false accept rate on the X axis, or, alternatively, the false reject rate on the Y axis and the false accept rate on the X axis. The true accept rate is the number of true accepts divided by the total number of true claims:

$$\mathrm{TAR} = \frac{\mathrm{TA}}{\mathrm{TA} + \mathrm{FR}}. \tag{2.1}$$

The false accept rate is the number of false accepts divided by the total number of false claims:

$$\mathrm{FAR} = \frac{\mathrm{FA}}{\mathrm{FA} + \mathrm{TR}}. \tag{2.2}$$

The false reject rate is

$$\mathrm{FRR} = 1 - \mathrm{TAR} = \frac{\mathrm{FR}}{\mathrm{TA} + \mathrm{FR}}. \tag{2.3}$$

The equal error rate (EER) is a single number often quoted from the ROC curve. The EER is the operating point at which the false accept rate equals the false reject rate.
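
To make Equations 2.1-2.3 concrete, the short sketch below computes an ROC curve and an approximate equal error rate from lists of genuine and impostor scores. This example is illustrative rather than taken from the dissertation; the score distributions are synthetic, and, as with Hamming distance, a lower score means a better match.

import numpy as np

def roc_points(genuine, impostor, thresholds):
    # Scores below the threshold are "accepts" (see Figure 2.1).
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    tar = np.array([(genuine < t).mean() for t in thresholds])   # TA / (TA + FR)
    far = np.array([(impostor < t).mean() for t in thresholds])  # FA / (FA + TR)
    return far, tar

def equal_error_rate(genuine, impostor, num_thresholds=1000):
    # The EER is the point where FAR equals FRR = 1 - TAR.
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    far, tar = roc_points(genuine, impostor, thresholds)
    i = int(np.argmin(np.abs(far - (1.0 - tar))))
    return 0.5 * (far[i] + (1.0 - tar[i])), thresholds[i]

# Synthetic distributions shaped roughly like Figure 2.1:
rng = np.random.default_rng(0)
genuine = rng.normal(0.11, 0.04, 10000).clip(0.0, 1.0)
impostor = rng.normal(0.45, 0.02, 10000).clip(0.0, 1.0)
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.4f} at threshold {threshold:.3f}")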

2.1.2 Identification

In an identification scenario, a biometric sample is acquired without any associated identity claim. The closed-set identification task is to identify the unknown sample as matching one of a set of previously enrolled known samples. The open-set identification task is to either identify the unknown sample or determine that the unknown sample does not match any of the known samples. The set of enrolled samples is often called a gallery, and the unknown sample is often called a probe. The probe is matched against all of the entries in the gallery, and the closest match, assuming it is close enough, is used to identify the unknown sample. Similar to the verification scenario, there are four possible outcomes. A true positive occurs when the system says that an unknown sample matches a particular person in the gallery and the match is correct. A false positive occurs when the system says that an unknown sample matches a particular person in the gallery and the match is not correct. A true negative occurs when the system says that the sample does not match any of the entries in the gallery, and the sample in fact does not. A false negative occurs when the system says that the sample does not match any of the entries in the gallery, but the sample in fact does belong to someone in the gallery. Performance in an identification scenario is often summarized in a cumulative match characteristic (CMC) curve. The CMC curve plots the percent of probes correctly recognized on the Y axis and the cumulative rank considered a correct match on the X axis. For a cumulative rank of 2, if the correct match occurs at either the first-ranked or the second-ranked entry in the gallery, it is counted as a correct recognition, and so on. The rank-one recognition rate is a single number often quoted from the CMC curve.
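
The sketch below shows how a CMC curve can be computed for the closed-set case, where every probe has a true match in the gallery. It is a minimal illustration under assumed inputs (a distance matrix plus identity labels), not code from the dissertation.

import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids):
    # dist[i, j] is the distance between probe i and gallery entry j;
    # lower distance means a better match. Returns, for each rank k,
    # the fraction of probes whose true match lies within the top k.
    gallery_ids = np.asarray(gallery_ids)
    n_probes, n_gallery = dist.shape
    hits = np.zeros(n_gallery)
    for i in range(n_probes):
        ranked = gallery_ids[np.argsort(dist[i])]           # best match first
        rank = int(np.where(ranked == probe_ids[i])[0][0])  # 0-based true rank
        hits[rank:] += 1    # a hit at rank r counts for all ranks >= r
    return hits / n_probes

# The rank-one recognition rate is the first point on the curve:
# cmc = cmc_curve(dist, probe_ids, gallery_ids); rank_one = cmc[0]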

2.2 Eye Anatomy

Many different physical characteristics can be used in a biometrics system. This work focuses on iris biometrics. The iris is the "colored ring of tissue around the pupil through which light... enters the interior of the eye" [89]. The iris's function is to control the amount of light entering the eye. Two muscles in the iris, the dilator and the sphincter muscles, control the size of the pupil and, therefore, the amount of light passing through the pupil. Figure 2.2 shows an example image acquired by an LG 2200 commercial iris biometrics system at the University of Notre Dame. The sclera, a white region of connective tissue and blood vessels, surrounds the iris. A clear covering called the cornea covers the iris and the pupil. The pupil region generally appears darker than the iris. However, the pupil may have specular highlights, and cataracts can lighten the pupil. The iris typically has a rich pattern of furrows, ridges, and pigment spots. The surface of the iris is composed of two regions, the central pupillary zone and the outer ciliary zone. The collarette is the border between these two regions.

Figure 2.2: Image 05495d15 from the Notre Dame dataset. Elements seen in a typical iris image are labeled: the eyelashes, eyelid, sclera, pupil, pupillary boundary, limbic boundary, collarette, ciliary zone, pupillary zone, and a specular highlight.

The minute details of the iris texture are believed to be determined randomly in utero. They are also believed to differ between persons and between the left and right eyes of the same person [31]. The color of the iris can change as the amount of pigment in the iris increases during childhood. Some research asserts that the texture is relatively constant [28], but other research has detected lower match scores between images taken multiple years apart [3].

2.3 Early Research in Iris Biometrics

The idea of using the iris as a biometric is over 100 years old [10]. However, the idea of automating iris recognition is more recent. In 1987, Flom and Safir obtained a patent for an unimplemented conceptual design of an automated iris biometrics system [41]. Johnston [64] published a report in 1992 on an investigation of the feasibility of iris biometrics conducted at Los Alamos National Laboratory after the issuance of Flom and Safir's patent. Iris images were acquired for 650 persons, and acquired again after a 15-month interval. The pattern of an individual iris was observed to be unchanged over the 15 months. The complexity of an iris image, including specular highlights and reflections, was noted. The report concluded that iris biometrics held potential for both verification and identification scenarios, but no experimental results were presented.

The most important work to date in iris biometrics is that of Daugman. Daugman's 1994 patent [26] and early publications (e.g., [25]) described an operational iris recognition system in some detail. Iris biometrics as a field has developed with the concepts in Daugman's approach becoming a standard reference model. Also, because the Flom and Safir patent and the Daugman patent were held for some time by the same company, nearly all existing commercial iris biometric technology is based on Daugman's work.

Daugman's patent stated that "the system acquires through a video camera a digitized image of an eye of the human to be identified." A 2004 paper [28] said that image acquisition should use near-infrared illumination so that the illumination could be controlled, yet remain unintrusive to humans (Figure 2.3). Near-infrared illumination also helps reveal the detailed structure of heavily pigmented (dark) irises. Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light [23] (Figure 2.4).

Systems built on Daugman's concepts require subjects to cooperatively position their eye within the camera's field of view. The system assesses the focus of the image in real time by looking at the power in the middle and upper frequency bands of the 2-D Fourier spectrum. The algorithm seeks to maximize this spectral power by adjusting the focus of the system, or by giving the subject audio feedback to adjust their position in front of the camera. More detail on the focusing procedure is given in the appendix of [28].

Given an image of the eye, the next step is to find the part of the image that corresponds to the iris. Daugman's early work approximated the pupillary and limbic boundaries of the eye as circles. Thus, a boundary could be described with three parameters: the radius r and the coordinates of the center of the circle, x0 and y0. He proposed an integro-differential operator for detecting the iris boundary by searching this parameter space. His operator is

$$\max_{(r,\, x_0,\, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r,\, x_0,\, y_0} \frac{I(x, y)}{2\pi r}\, ds \right| \tag{2.4}$$

where $G_\sigma(r)$ is a smoothing function and $I(x, y)$ is the image of the eye.
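
A brute-force discrete version of Equation 2.4 can be written in a few lines: for each candidate center, compute the average intensity around circles of increasing radius, take the radial derivative, smooth it, and keep the (r, x0, y0) with the largest response. The sketch below is a simplified illustration (grid search, bilinear sampling), not Daugman's implementation.

import numpy as np
from scipy.ndimage import gaussian_filter1d, map_coordinates

def integro_differential(image, radii, centers, sigma=2.0, n_samples=128):
    # image: 2-D grayscale array; radii: increasing candidate radii;
    # centers: iterable of candidate (x0, y0) circle centers.
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    best_response, best_params = -np.inf, None
    for (x0, y0) in centers:
        # Mean intensity on each candidate circle: the contour integral
        # of I(x, y) normalized by the circumference 2*pi*r.
        means = []
        for r in radii:
            xs = x0 + r * np.cos(angles)
            ys = y0 + r * np.sin(angles)
            means.append(map_coordinates(image, [ys, xs], order=1).mean())
        # Partial derivative in r, blurred by the smoothing kernel G_sigma.
        response = np.abs(gaussian_filter1d(np.gradient(np.array(means)), sigma))
        i = int(np.argmax(response))
        if response[i] > best_response:
            best_response, best_params = response[i], (radii[i], x0, y0)
    return best_params  # (r, x0, y0) of the strongest circular boundary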

All early research in iris segmentation assumed that the iris had a circular boundary. However, the pupillary and limbic boundaries are often not perfectly circular. More recently, Daugman has studied alternative segmentation techniques to better model the iris boundaries [29]. Even when the inner and outer boundaries of the iris are found, some of the iris may still be occluded by eyelids or eyelashes.

Figure 2.3: Commercial iris cameras use near-infrared illumination so that the illumination is unintrusive to humans, and so that the texture of heavily pigmented irises can be imaged more effectively. This graph shows the spectrum of wavelengths emitted by the LEDs on an LG 2200 iris camera. This camera uses wavelengths primarily between 700 and 900 nanometers. The spectral characteristics were captured using spectrophotometric equipment made available by Prof. Douglas Hall of the University of Notre Dame. (The plot, titled "Spectral response of LG2200 illuminant," shows spectrophotometer response versus wavelength in nanometers, from 650 to 1050 nm.)

Figure 2.4: Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light (picture reprinted from [23], data from [71]).

After isolating the iris region, the next step is to describe the features of the iris in a way that facilitates comparison of irises. The first difficulty lies in the fact that not all images of an iris are the same size. The distance from the camera affects the size of the iris in the image. Also, changes in illumination can cause the iris to dilate or contract. These problems were addressed by mapping the extracted iris region into a normalized coordinate system. To accomplish this normalization, every location on the iris image was defined by two coordinates: (i) an angle θ between 0 and 360 degrees, and (ii) a radial coordinate r that ranges between 0 and 1 regardless of the overall size of the image. This normalization assumes that the iris compresses or stretches linearly in the radial direction when the pupil dilates or contracts, respectively. A paper by Wyatt [122] explained that this assumption is a good approximation, but it does not perfectly match the actual deformation of an iris under dilation or constriction.

The normalized iris image can be displayed as a rectangular image, with the radial coordinate on the vertical axis and the angular coordinate on the horizontal axis. The left side of the normalized image marks 0 degrees on the iris image, and the right side marks 360 degrees. The division between 0 and 360 degrees is somewhat arbitrary, because a simple tilt of the head can affect the angular coordinate. Daugman accounts for this rotation later, in the matching technique.
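
A minimal sketch of this normalization is shown below, assuming the segmentation step has already produced pupillary and limbic circles as (x, y, radius) triples; the function name and grid sizes are illustrative choices, not part of the original system.

import numpy as np
from scipy.ndimage import map_coordinates

def normalize_iris(image, pupil, limbus, n_radii=64, n_angles=360):
    # Map the iris annulus to a rectangle: theta on the horizontal axis,
    # normalized radius r in [0, 1] on the vertical axis.
    px, py, pr = pupil
    lx, ly, lr = limbus
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rs = np.linspace(0.0, 1.0, n_radii)
    # Boundary points at each angle on the pupillary and limbic circles.
    inner_x, inner_y = px + pr * np.cos(thetas), py + pr * np.sin(thetas)
    outer_x, outer_y = lx + lr * np.cos(thetas), ly + lr * np.sin(thetas)
    # Linear interpolation between the two boundaries: the linear-stretch
    # assumption discussed above.
    xs = (1 - rs)[:, None] * inner_x[None, :] + rs[:, None] * outer_x[None, :]
    ys = (1 - rs)[:, None] * inner_y[None, :] + rs[:, None] * outer_y[None, :]
    samples = map_coordinates(image, [ys.ravel(), xs.ravel()], order=1)
    return samples.reshape(n_radii, n_angles)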

Directly comparing the pixel intensities of two different iris images could be prone to error because of differences in lighting between the two images. Daugman uses convolution with two-dimensional Gabor filters to extract the texture from the normalized iris image. In his system, the filters are "multiplied by the raw image pixel data and integrated over their domain of support to generate coefficients which describe, extract, and encode image texture information" [26].

After the texture in the image is analyzed and represented, it is matched against the stored representations of other irises. If iris recognition were to be implemented on a large scale, the comparison between two images would have to be very fast. Thus, Daugman chose to quantize each filter's phase response into a pair of bits in the texture representation. Each complex coefficient was transformed into a two-bit code: the first bit was equal to 1 if the real part of the coefficient was positive, and the second bit was equal to 1 if the imaginary part of the coefficient was positive. Thus, after analyzing the texture of the image using the Gabor filters, the information from the iris image was summarized in a 256-byte (2048-bit) binary code. The resulting binary "iris codes" can be compared efficiently using bitwise operations.¹
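
The quantization step itself is simple enough to show directly. The sketch below turns an array of complex Gabor filter responses into a binary code using the two sign bits described above; the layout (real bit followed by imaginary bit for each coefficient) is an illustrative choice, not necessarily Daugman's exact bit ordering.

import numpy as np

def quantize_phase(responses):
    # responses: array of complex Gabor coefficients for one iris.
    responses = np.asarray(responses).ravel()
    code = np.empty(responses.size * 2, dtype=np.uint8)
    code[0::2] = responses.real > 0   # bit 1: sign of the real part
    code[1::2] = responses.imag > 0   # bit 2: sign of the imaginary part
    return code   # e.g., 1024 coefficients -> a 2048-bit iris code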

Daugman uses a metric called the fractional Hamming distance, which measures the fraction of bits for which two iris codes disagree.² A low fractional Hamming distance implies strong similarity of the iris codes. If parts of the irises are occluded, the fractional Hamming distance is the fraction of bits that disagree in the areas that are not occluded in either image. To account for rotation, comparison between a pair of images involves computing the fractional Hamming distance for several different orientations that correspond to circular permutations of the code in the angular coordinate. The minimum computed fractional Hamming distance is assumed to correspond to the correct alignment of the two images.

¹The term "iris code" was used by Daugman in his 1993 paper. I use this term to refer to any binary representation of iris texture that is similar to Daugman's representation.

²The Hamming distance is the number of bits that disagree; the fractional Hamming distance is the fraction of bits that disagree. Since fractional Hamming distance is used so frequently, many papers simply say "Hamming distance" when referring to the fractional Hamming distance. I also follow this convention in subsequent sections of this work.
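
A sketch of this masked, rotation-compensated comparison follows. It assumes iris codes and occlusion masks stored as flat 0/1 arrays laid out angle-major with two bits per angular position, so one angular step corresponds to a shift of two bits; the parameter values are illustrative.

import numpy as np

def fractional_hamming_distance(code_a, mask_a, code_b, mask_b,
                                max_shift=8, bits_per_angle=2):
    # Minimize the masked fractional Hamming distance over circular
    # permutations of the angular coordinate (head-tilt compensation).
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        shifted_code = np.roll(code_b, s * bits_per_angle)
        shifted_mask = np.roll(mask_b, s * bits_per_angle)
        valid = mask_a & shifted_mask          # unoccluded in both images
        n_valid = int(valid.sum())
        if n_valid == 0:
            continue
        disagreements = ((code_a ^ shifted_code) & valid).sum()
        best = min(best, disagreements / n_valid)
    return best   # low values imply strong similarity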

An iris biometrics system following Daugman's general approach can be described in four basic steps: (1) acquisition, (2) segmentation, (3) texture analysis, and (4) matching. These basic modules are depicted in Figure 2.5. The goal of image acquisition is to acquire an image that has sufficient quality to support reliable biometrics processing. The goal of segmentation is to isolate the region that represents the iris. The goal of texture analysis is to derive a representation of the iris texture that can be used to match two irises. The goal of matching is to evaluate the similarity of two iris representations. The distinctive essence of Daugman's approach lies in conceiving the representation of the iris texture as a binary code obtained by quantizing the phase response of a texture filter. This representation has several inherent advantages, among them the speed of matching through the fractional Hamming distance, easy handling of rotation of the iris, and an interpretation of the matching as the result of a statistical test of independence [25].

Figure 2.5: Major steps in iris biometrics processing. (Picture reprinted from [16] with permission from Elsevier.)

Wildes [120] described an iris biometrics system developed at Sarnoff Labs that uses a very different technical approach from that of Daugman. Whereas Daugman's system acquired the image using an LED-based point light source, the Wildes system used a diffuse light source. When localizing the iris boundary, Daugman's approach looked for a maximum of an integro-differential operator that responds to a circular boundary. By contrast, Wildes' approach involved computing a binary edge map followed by a Hough transform to detect circles. The Hough transform considers a set of edge points and finds the circle that best fits the most edge points. When matching two irises, Daugman's approach computed the fractional Hamming distance between iris codes, whereas Wildes' method applied a Laplacian of Gaussian filter at multiple scales to produce a template and computed correlation as a similarity measure. Wildes' work [120] demonstrated that multiple distinct technical approaches exist for each of the main modules of an iris biometrics system.
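
For comparison with the operator in Equation 2.4, a Wildes-style boundary detector (a binary edge map followed by a circular Hough transform) can be assembled from standard OpenCV calls, as sketched below. The parameter values are illustrative and are not taken from Wildes' paper.

import cv2

def detect_iris_circles(gray):
    # cv2.HOUGH_GRADIENT computes a Canny edge map internally and then
    # votes for circle centers and radii in a Hough accumulator.
    blurred = cv2.GaussianBlur(gray, (7, 7), 1.5)
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
        param1=120,    # upper threshold for the internal Canny edge detector
        param2=30,     # accumulator threshold: lower values find more circles
        minRadius=20, maxRadius=150)
    return circles    # None, or an array of candidate (x, y, r) circles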

2.4 Recent Research in Iris Biometrics

A comprehensive review of iris biometrics research up to the year 2007 is given in [16]. However, a large amount of research has been published in the field since that time. A search for IEEE papers with the word "iris" in the title during the period 2008-2010 yields 290 papers³, and a similar search in Compendex yields about 700. In order to keep up with the latest state of the art in iris biometrics, I summarize here many of the recent iris biometrics studies. I focused first on searching for IEEE journal articles on iris biometrics before looking at conference papers. In reading this somewhat lengthy review, readers may read the sections most pertinent to their field of study, or read the start of each section and then as much or as little of the detail as they desire. This section describes papers in the general field of iris biometrics. In later chapters, I mention some additional papers more directly related to my research (see Sections 3.2, 4.2, 5.2, and 6.2).

³Search run May 26, 2010.

2.4.1 Image Acquisition, Restoration, and Quality Assessment

2.4.1.1 Image Acquisition

Two recently published papers discussed image acquisition. He et al. [47]

talked about how to make an iris camera at a cheaper cost than the commercially

available cameras. Boyce et al. [17] discussed how iris recognition performed at

different wavelengths of light. The next two paragraphs give details of these

papers.

Iris biometrics research requires high-quality imaging. Commercial iris cam-

eras are expensive. He et al. [47] designed their own iris camera that would be

cheaper than commercial alternatives while still acquiring high-quality images.

They decided to use a CCD camera because CCD cameras produce images of

superior quality to those produced by CMOS cameras. They bought a CCD

sensor with resolution of 0.48 million pixels, and added an optical glass lens,

custom-designed by an optical manufacturer. The lens had a fixed focus at 250

mm. They added NIR-pass filters that transmit wavelengths between 700 and 900

nm. The illumination unit consisted of NIR LEDs of 800nm wavelength, which

they arranged in such a way to try to minimize specular reflections on the iris.

An LCD screen provided feedback for users. The screen displayed an image of

the captured scene inside a view square, and users were asked to position their


iris inside the square. Additionally, the screen reported information on position

and focus of each frame. Finally, the camera could be manually angled to capture

images from people between 1.5 and 1.8 meters tall.

Most iris biometrics is done using near infrared light, but there has been some

research into the performance of iris biometrics using images taken with visible

light. Boyce et al. [17] conducted experiments with multiple wavelengths of light.

To acquire multispectral information, they used a Redlake MS3100 multispectral

camera which contains three CCDs and three band-pass prisms behind the lens

to simultaneously capture four different wavelength bands. In this way, they

acquired data from blue, green, red, and near-IR wavelengths. They captured 5

samples each from 24 subjects with varying colors of irises. They tested the effect

of adaptive histogram equalization on the iris images, and reported recognition

performance on an ROC curve. They reported performance from 8 different trials:

the original IR, red, green, and blue channels, and the histogram-equalized IR,

red, green, and blue channels. The blue channel showed improved performance

after histogram equalization. The histogram equalization did not substantially

affect the other channels. Next they tried matching iris images across multiple

channels. They compared IR vs R, IR vs G, IR vs B, R vs G, R vs B, and G vs B.

They found that cross-channel iris recognition worked better when the difference in

wavelengths was smaller (an unsurprising, but previously unresearched idea). For

example, recognition performance when comparing IR probe images to a gallery

of blue wavelength images was poor, but performance when comparing IR to red

was good. Next, they clustered pixels by their RGB value and their L*a*b* value.

They showed example images where the skin and eyelash pixels were assigned to

different clusters than the iris pixels because of the different colors between the


iris and surrounding areas. They concluded that clustering pixels by color could

potentially help in segmentation. Their final experiment involved score-level fusion

of match scores from multiple spectral channels. The performance for the blue

channel was improved by fusion with other channels. A fusion of IR, R, and G

gave the highest Genuine Accept Rate at a FAR of 0.1%.

2.4.1.2 Image Restoration

Two papers discussed restoring a blurry iris image by estimating the proper

point spread function (PSF). The ability to restore a blurry iris image to focus

by estimating the PSF could increase the useable depth of field of an iris camera

without requiring extra hardware. A third paper evaluates the depths of field

possible when using wavefront coded imaging.

Kang and Park [68] aimed to restore blurry probe images in real time. In an

initial offline step, they used information about the camera optics to determine

an equation for estimating parameters of the PSF. Those parameters would be a

function of the blurriness of the captured image. During online operation, they

first had to estimate the actual focus of the captured image. They deinterlaced

the image, found an approximate segmentation using a circular edge detector,

and removed eyelashes by detecting windows of the image with high standard

deviation. Then they could run a focus assessment on the iris region only. Once

they knew the image focus, they used their equation to get the PSF and restore

the image to its proper focus. Using focus-restored probe images instead of blurry

probe images decreased their equal error rate (EER) from 0.49% to 0.37%. They

consequently increased their camera’s operable depth of field from 22 mm to 50

mm.
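The papers’ exact constrained least squares restorations are not reproduced here, but the following generic frequency-domain (Wiener-style) sketch shows the role an estimated PSF plays in deblurring; the uniform 9x9 PSF and the regularization constant are assumptions of this sketch.

    import numpy as np

    def wiener_deconvolve(blurred, psf, k=0.01):
        # Divide out the PSF in the frequency domain; the constant k
        # regularizes the division so noise is not amplified.
        H = np.fft.fft2(psf, s=blurred.shape)
        G = np.fft.fft2(blurred)
        W = np.conj(H) / (np.abs(H) ** 2 + k)
        return np.real(np.fft.ifft2(W * G))

    # Toy demonstration: blur a random "sharp" image with a known PSF,
    # then restore it.
    psf = np.ones((9, 9)) / 81.0
    sharp = np.random.default_rng(0).random((64, 64))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) *
                                   np.fft.fft2(psf, s=sharp.shape)))
    restored = wiener_deconvolve(blurred, psf)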


He et al. [48] estimated the user distance from the camera in order to get the

proper PSF for image restoration. First, they measured the distance between two

specular highlights on the iris. Using this information, plus knowledge about the

positions of the two infrared LEDs, they could get the user’s distance from the

camera without using special hardware like a distance sensor. The knowledge of

the distance from the camera was used in estimating the PSF. Like Kang and

Park [68], they used a constrained least squares fit for restoration.
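A crude sketch of the geometric idea follows, assuming a pinhole-style model in which the separation of the two highlights shrinks roughly in proportion to one over the distance; the effective baseline would be calibrated for a specific camera and LED rig, and all numbers below are placeholders.

    def distance_from_highlights(separation_px, focal_px, baseline_eff_mm):
        # separation_px: measured distance between the two specular
        # highlights in the image. baseline_eff_mm: calibrated effective
        # baseline folding in the LED spacing and corneal reflection
        # geometry (an assumption of this sketch).
        return focal_px * baseline_eff_mm / separation_px

    # e.g. 40 px separation, 2800 px focal length, 30 mm effective baseline
    print(distance_from_highlights(40, 2800, 30.0), "mm")  # 2100.0 mm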

Boddeti and Kumar [12] investigated the use of wavefront-coded imagery on iris

recognition. This topic had been discussed in the literature before, but Boddeti

and Kumar used a larger data set and presented experiments evaluating how

different parts of the recognition pipeline (e.g. segmentation, feature extraction)

are affected by wavefront coding. They proposed using unrestored image outputs

from the wavefront-coded camera directly, and tested this idea using two different

recognition algorithms. The authors concluded that wavefront coding could help

increase the depth of field of an iris recognition system by a factor of four, and

that the recognition performance on unrestored images was only slightly worse

than the performance on restored images. Figure 2.6 shows examples of blurry

and in-focus iris images.

2.4.1.3 Image Quality

A recent trend in iris image quality research is to combine a number of individ-

ual quality factors to create an overall quality score. Belcher and Du [7, 8] combine

percent occlusion, percent dilation, and “feature information”. An example of an

occluded image is shown in Figure 2.7. To compute “feature information”, they

calculate the relative entropy of the iris texture when compared with a uniform


(a) Image 04336d692 (b) Image 04336d695

Figure 2.6: Kang and Park [68] and He et al. [48] use information about camera optics and position of the subject to estimate a point spread function and restore blurry images to in-focus images. Above is an example of (a) a blurry iris image and (b) an in-focus image of the same subject.

distribution [8]. To fuse the three types of information into a single quality score,

Belcher and Du first computed an exponential function of occlusion and an expo-

nential function of dilation. The final quality score was the product of the three

measures.
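A hedged sketch of this style of fusion follows; the exponential forms and the constants below are placeholders, not the exact functions of Belcher and Du.

    import numpy as np

    def quality_score(occlusion, dilation, feature_info, k_occ=5.0, k_dil=5.0):
        # occlusion and dilation are fractions in [0, 1]; feature_info is
        # the relative-entropy-based texture measure. Higher occlusion or
        # dilation drives its exponential factor, and hence the product,
        # toward zero.
        return (np.exp(-k_occ * occlusion) *
                np.exp(-k_dil * dilation) *
                feature_info)

    print(quality_score(occlusion=0.10, dilation=0.30, feature_info=0.8))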

Kalka et al. [67] also use percent occlusion in their quality metric, but the

other quality factors that they consider are different. In addition to occlusion, they

consider defocus, motion blur, gaze deviation, amount of specular reflection on the

iris, lighting variation on the iris, and total pixel count on the iris. They measure

defocus using the 8x8 convolution kernel proposed by Daugman to assess high-

frequency content in the image; however, they test for focus only in the bottom

half of the iris region. To estimate motion blur, they first find the dominant

angle of blur using a directional filter in the Fourier space; then they estimate the

magnitude of the blur in that direction. They use the circularity of the pupil as a

measure of gaze direction; for a range of pitch and yaw angles, they use a projective

transformation to rotate the off-angle image into a frontal view image, then test


(a) Image 04202d1064 (b) Image 04202d1069

Figure 2.7: Belcher and Du’s quality measure [7] combines information about occlusion, dilation, and texture. Above is an example of (a) a heavily occluded iris image, and (b) a less occluded image of the same subject.

the circularity of the pupil in the transformed image using Daugman’s integro-

differential operator. They keep the pitch and yaw angles that maximize the

operator. To combine the individual quality factors, Kalka et al. use Dempster-

Shafer theory [108] with Murphy’s Rule [87]. In evaluating various data sets,

Kalka et al. found that the ICE data had more defocused images, the WVU data

had more lighting variation, and the CASIA data had more occlusion than the

other sets.

Schmid and Nicolo [106] suggest a method of analyzing the quality of an entire

database. The authors compare the capacity of a recognition system to the ca-

pacity of a communication channel. Recognition channel capacity can be thought

of as the maximum number of classes that can be successfully recognized. This

capacity can also be used as a measure of overall quality of data in a database.

The authors evaluate the empirical recognition capacity of biometrics systems that

use PCA and ICA. They apply their method to four iris databases and two face

databases. They find that the BATH iris database has a relatively high sample

signal-to-noise ratio, followed by CASIA-III, then ICE 2005. WVU had the lowest


signal-to-noise ratio.

Another way of considering quality is to evaluate the quality of typical im-

ages given by individual users. In many biometrics systems, a few users tend to

be responsible for a disproportionate amount of errors in the system. This phe-

nomenon was first noted by Doddington et al. [34]. Users for whom the system

performed well were labeled sheep. Goats were users who were difficult

to recognize, and thus responsible for a large number of false rejects in the system.

Lambs were users who were particularly easy to imitate, and thus responsible for a

large number of false accepts. Wolves were users who were particularly successful

at imitating others, and therefore they, like lambs, were responsible for a large

number of false accepts.

A drawback to this original classification system is that it does not describe

relationships between a user’s genuine and impostor scores. Wolf-like and lamb-

like behavior is evaluated by looking at impostor scores only. Goat-like behavior

is evaluated by looking at genuine comparisons only. Another attribute of this

original classification system is that the animals are not distinct. Users who ex-

hibit lamb-like behavior often exhibit wolf-like behavior as well. Yager and Dun-

stone [123] define four new user types based on both the impostor and genuine

scores for a user. Doves are the best users in a biometric system, matching well

against themselves, and poorly against others. Chameleons match well against

themselves, and against others. They rarely cause false rejects, but are likely

to cause false accepts. Phantoms match poorly against themselves, and against

others. They are likely to cause false rejects. Worms match poorly against them-

selves and well against others. Therefore, they cause false rejects when they try

to authenticate as themselves, and they cause false accepts when they try to au-


thenticate as others.

Yager and Dunstone [123] tested for the existence or absence of these four

animals in a number of different biometric databases, using a number of biometric

algorithms. Each of the animal types was present in some of the experiments and

absent in others. The authors note that “The reasons that a particular animal

group exist are complex and varied. They depend on a number of factors, including

enrollment procedures, feature extraction and matching algorithms, data quality,

and intrinsic properties of the user population” [123]. Their analysis also leads

the authors to assert that people are rarely “inherently hard to match”. Instead,

they suggest that matching errors are more likely due to enrollment issues and

algorithmic weaknesses rather than intrinsic properties of the users.

2.4.2 Image Compression

There are advantages to storing iris images instead of iris templates: raw

images would improve interoperability between systems and also provide a record

for any investigations of failures in the system. Unfortunately, images can occupy

a significant amount of storage. Therefore, a number of papers have investigated

the impact of compression on iris recognition, and they seem to concur that some

JPEG2000 compression can be applied without significant impact on performance.

Rakshit and Monro [99] proposed to store unwrapped iris images rather than

the original iris images. They recommended using an unwrapped image size of

80 by 512, although they found that they could subsample down to 32 by 342

and still maintain “acceptable system performance”. They found that JPEG2000

compression at 0.5 bpp (bits per pixel) actually improved performance because

the compression removed noise. Error curves were “acceptable” at rates down to


0.3 bpp, but performance degraded rapidly at lower rates.
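As an illustration of compressing an unwrapped iris image to a target bitrate, the sketch below uses Pillow’s JPEG 2000 encoder (which requires the OpenJPEG library); the file names are hypothetical, and a compression rate of 16 on an 8-bit grayscale image corresponds to 8/16 = 0.5 bits per pixel.

    from PIL import Image

    # Hypothetical 80 x 512 unwrapped iris image.
    img = Image.open("unwrapped_iris.png").convert("L")

    # quality_mode="rates" interprets each quality layer as a compression
    # ratio; 16:1 on an 8-bit image is 0.5 bpp.
    img.save("unwrapped_iris.jp2",
             quality_mode="rates", quality_layers=[16], irreversible=True)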

Daugman and Downing [32] chose to compress and store the original images

rather than the unwrapped images. Their argument stated that “polar mappings

depend strongly upon the choice of origin of coordinates, which may be prone

to error, uncertainty, or inconsistency” [32]. They used the NIST ICE 1 data set

for their experiments. To reduce the size of the image, they first automatically

detected the iris location and cropped the 640 by 480 image down to 320 by

320. Next they detected the eyelashes, and replaced the eyelashes and eyelids

with a uniform gray region. This region-of-interest isolation typically resulted in

a two-fold reduction in file size. Finally they used JPEG2000 compression with

a compression factor of 50. These methods reduced file size by a factor of 150,

while only changing 2 to 3% of the bits in the iris code. ROC curves in the paper

showed trade-offs between compression factor and recognition performance.

2.4.3 Segmentation

Segmentation continues to be an active area of research. Correct segmentation

is a prerequisite to high biometric performance. A recent trend is to use active

contours to find the iris and/or eyelids, and to use ellipses rather than circles to

model the pupillary and limbic boundaries. However, the active contour approach

is not the only recently proposed method.

2.4.3.1 Active Contours

A paper by Daugman in 2007 [29] explained his use of active contours for

fitting the iris boundaries. First, he calculated the image gradient in the radial

direction. He detected occlusions by eyelids and modeled those with separate


splines. Then a discrete Fourier series approximation was fit to the image gradient

data. In any active contour method, there is a trade-off between how closely

the contour fits the data versus the desired constraints on the final shape of the

contour. Daugman modeled the pupil boundary with weaker constraints than the

iris boundary, because he found that the pupil boundary tended to have stronger

gradient data.
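The sketch below shows the core of such a fit under simple assumptions: given boundary radius samples (e.g., from radial gradient peaks), a truncated Fourier series is fit by least squares, and the number of harmonics controls how strongly the contour shape is constrained. How Daugman balances data fit against shape constraints is not reproduced here.

    import numpy as np

    def fit_fourier_boundary(thetas, radii, n_harmonics=4):
        # Build the design matrix [1, cos(k*t), sin(k*t), ...] and solve
        # for the series coefficients by least squares.
        cols = [np.ones_like(thetas)]
        for k in range(1, n_harmonics + 1):
            cols.append(np.cos(k * thetas))
            cols.append(np.sin(k * thetas))
        A = np.stack(cols, axis=1)
        coeffs, *_ = np.linalg.lstsq(A, radii, rcond=None)

        def r_of(t):
            out = np.full_like(t, coeffs[0], dtype=float)
            for k in range(1, n_harmonics + 1):
                out += (coeffs[2 * k - 1] * np.cos(k * t) +
                        coeffs[2 * k] * np.sin(k * t))
            return out

        return r_of

    # Noisy, slightly elliptical boundary samples.
    t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    r = 100 + 3 * np.cos(2 * t) + np.random.default_rng(1).normal(0, 1, t.size)
    smooth_r = fit_fourier_boundary(t, r)(t)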

Vatsa et al. [118] improved the speed of active contour segmentation by using

a two-level hierarchical approach. First, they found an approximate initial pupil

boundary. The boundary was modeled as an ellipse with five parameters. The

parameters were varied in a search for a boundary with maximum intensity change.

For each possible parameter combination, the algorithm randomly selected 40

points on the elliptical boundary and calculated total intensity change across the

boundary. Once the pupil boundary was found, the algorithm searched for the iris

boundary in a similar manner, this time selecting 120 points on the boundary for

computing intensity change. The approximate iris boundaries were refined using

an active contour approach. The active contour was initialized to the approximate

pupil boundary and allowed to vary in a narrow band of +/- 5 pixels. In refining

the limbic boundary, the contour was allowed to vary in a band of +/- 10 pixels.

2.4.3.2 Alternatives to Active Contours

Ryan et al. [104] presented an alternative fitting algorithm, called the Starburst

method, for segmenting the iris. They preprocessed the image using a smoothing

filter and a gradient detection filter. Then, they needed to find a pupil location as

a starting point for the algorithm. To do so, they set the darkest 5% of the image

to black, and all other pixels to white. Then they created a Chamfer image: the


darkest pixel in the Chamfer image is the pixel farthest from any white pixel in a

thresholded image. They used the darkest point of the Chamfer image as a starting

point. Next, they computed the gradient of the image along rays pointing radially

away from the start point. The two highest gradient locations were assumed to

be points on the pupillary and limbic boundaries. The detected points were used

to fit several ellipses using randomly selected subsets of points. An average of the

best ellipses was reported as the final boundary. The eyelids were detected using

active contours.
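A minimal sketch of this initialization step follows, assuming a grayscale image array; scipy’s Euclidean distance transform plays the role of the Chamfer image, and the point deepest inside the dark region (largest distance to any white pixel) is taken as the pupil seed.

    import numpy as np
    from scipy import ndimage

    def pupil_start_point(gray):
        # Set the darkest 5% of pixels to "dark" (True), everything else
        # to "white" (False).
        dark = gray <= np.percentile(gray, 5)
        # Distance from each dark pixel to the nearest white pixel.
        dist = ndimage.distance_transform_edt(dark)
        # The farthest-from-white point is the pupil seed.
        return np.unravel_index(np.argmax(dist), dist.shape)  # (row, col)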

Pundlik et al. [97] presented another alternative segmentation algorithm that

used graphs. Their algorithm was a labeling routine instead of a fitting routine

like active contours or the Starburst method. Their first goal was to assign a

label - either “eyelash” or “non-eyelash” - to each pixel. After removing specular

reflections, they used the gradient covariance matrix to find intensity variation in

different directions for each pixel. Then they created a probability map, P, that

assigned the probability of each pixel having high texture in its neighborhood.

The “energy” corresponding to a particular labeling of the images was written as

a function of a smoothness term and a data term. The data term was based on

a texture probability map. They treated the image as a graph where pixels were

nodes and neighboring pixels were joined with edges, then they used a minimum

graph cuts algorithm to find a labeling that minimized the energy function. The

second goal was to assign each pixel one of four labels: eyelash, pupil, iris, or

background. They used a method similar to the initial eyelash segmentation;

however, this time they used an alpha-beta swap graph-cut algorithm. Finally,

they refined their labels using a geometric algorithm to approximate the iris with

an ellipse.


2.4.3.3 Eyelid and Eyelash Detection

Some papers discussed eyelid detection without making it the primary focus

of the paper. For instance, the main focus of the paper by Ryan et al. [104] was the Star-

burst segmentation routine, but they additionally used active contours to find

the eyelids. In Kang and Park’s image restoration paper [68] described earlier

in section 2.4.1.2, they identified eyelids by finding windows of the image with

high standard deviation. Pundlik et al. [97] (described in section 2.4.3.2) used

a graph cuts algorithm to label pixels as eyelash or non-eyelash. Daugman [29]

performed a statistical test to see whether the distribution of the pixels in an iris

region was multimodal; for multimodal distributions, he statistically selected an

appropriate threshold, and masked all pixels darker than the chosen threshold.

Some of these methods ([104]) were fitting methods, and others ([68], [97], [29])

were labeling methods.

A 2009 paper by Li and Savvides [77] was focused entirely on occlusion detec-

tion. They performed occlusion detection using a probabilistic method: Figueiredo-

Jain Gaussian Mixture Models. All occlusion-detection was performed on un-

wrapped iris images. They used a single image for training, then assigned each

pixel in each test image a label of “occluded” or “unoccluded”. They compared

their method to a rule-based segmentation method and a Fisher-Linear Discrim-

inant Analysis based method. Additionally, they tried the proposed Gaussian

Mixture Models (GMM) using a number of different combinations of feature sets.

They measured occlusion-detection accuracy by comparing their masks to man-

ually created masks. They also evaluated their method by creating ROC curves

of the recognition results using each segmentation algorithm. They found that

no matter which feature set they used, the proposed GMM method outperformed


the FLDA and rule-based segmentation methods. The feature set resulting in

the most accurate eyelid masks was a set using intensity of each pixel, and the

mean and standard deviation of the pixel intensities in a 3x3 window. The fea-

ture set resulting in the best recognition performance used response intensity after

the image was filtered by a Gaussian filter, response intensity after filtering by a

Gabor filter, and the response intensity after filtering by first-order and second-

order Haar wavelets. All feature sets also included the (x, y) coordinate of pixel

location.

Traditionally, occluded regions are masked. However, features near the edges of

the occluded regions are also affected because the tail of the Gabor filter overflows

onto the occluded regions. Munemoto et al. [86] stated that “it is important to not

only exclude the noise region, but also estimate the true texture patterns behind

these occlusions. Even though masks are used for comparison of iris features, the

features around masks are still affected by noise. This is because the response of

filters near the boundary of the mask is affected by the noisy pixels.” Munemoto

et al. used the Criminisi image-filling algorithm to estimate the texture behind

the occlusions. This algorithm iteratively filled 9x9 patches of the occluded region

with 9x9 patches from unoccluded regions. It estimated textures at the boundary

of the region first, selecting 9x9 source patches from the unoccluded iris that

closely matched the iris texture near the boundary of the area to be filled.
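OpenCV does not ship the Criminisi algorithm, but its Telea inpainting can serve as a stand-in to illustrate this pipeline step: synthesize plausible texture under the occlusion mask before filtering. The file names below are hypothetical.

    import cv2

    iris = cv2.imread("unwrapped_iris.png", cv2.IMREAD_GRAYSCALE)
    mask = cv2.imread("occlusion_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 = occluded
    # Fill occluded pixels from surrounding texture (Telea's method, used
    # here only as a stand-in for Criminisi's exemplar-based filling).
    filled = cv2.inpaint(iris, mask, 9, cv2.INPAINT_TELEA)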

2.4.3.4 Segmenting Iris Images with Non-frontal Gaze

Schuckers et al. [107] tried two different approaches to handle “off-angle” irises.

In both approaches, they sought to transform an off-angle image into an equivalent

frontal image. The first method sought to determine how far an image deviated


from frontal by trying multiple values of pitch and yaw. For each (pitch, yaw) pair,

they used bilinear interpolation to transform the image. They found the values

of pitch and yaw that resulted in the maximum circularity of the detected pupil.

For encoding and matching irises, they use independent component analysis. The

second method modeled the relationship between actual 3-D iris points with 2-D

projected points. Once that relationship was obtained, the 2-D off-angle image

could be transformed into a frontal view image. Biorthogonal wavelets are used

for encoding and matching. Schuckers et al. found that results using their two

methods were “significantly improved over the iris recognition techniques which

do not perform any correction for angle.” The first method showed “good perfor-

mance for small angle deviations from training to testing, for example, training

with 15 degrees and testing with 0 or 30 degrees. However, there was relatively

poor performance when training using 0 degrees and testing using 30 degree im-

ages. The probable cause of this is the use of the simple projective transform

for large angle deviations.” They concluded that the second method was better.

However, it was unclear why they did not use the same encoding and matching

step for both methods.

Daugman’s 2007 paper [29] also discussed transforming off-angle iris images to

frontal view. He described the shape of the pupil in the image using parametric

equations. Using trigonometry and Fourier series expansions of these equations,

he estimated the direction of gaze. Then he applied an affine transformation to the

off-angle image to obtain an image of the eye apparently looking at the camera.
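A crude sketch of the common underlying idea follows: once gaze angles are estimated, warp the off-angle image toward a frontal view. The simple cosine foreshortening model below is an assumption of this sketch, not the projective or affine transforms of either paper.

    import cv2
    import numpy as np

    def frontalize(img, pitch_deg, yaw_deg):
        h, w = img.shape[:2]
        # Undo the horizontal/vertical foreshortening implied by the
        # estimated gaze angles, keeping the image center fixed.
        sx = 1.0 / np.cos(np.radians(yaw_deg))
        sy = 1.0 / np.cos(np.radians(pitch_deg))
        M = np.float32([[sx, 0, (1 - sx) * w / 2],
                        [0, sy, (1 - sy) * h / 2]])
        return cv2.warpAffine(img, M, (w, h))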


2.4.4 Feature Extraction

The survey paper by Bowyer et al. [16] listed an enormous number of papers

which tried alternative methods of feature extraction. Fewer recent papers have

focused on this topic. Miyazawa et al. [85] suggested a correlation-based technique,

Bodade and Talbar [11] recommended a Complex Wavelet Transform, and Belcher

and Du [9] demonstrated a region-based SIFT approach.

The motivation behind Miyazawa’s proposed method [85] was that Daugman-

like, feature-based iris recognition algorithms required many parameters. They

claimed that their proposed algorithm was easier to train. For each comparison

using the proposed method, they took two images and selected a region that was

valid (unoccluded) in both images. They took the discrete Fourier Transform

of both valid regions, then applied a Phase Only Correlation function (POC).

The POC function involved a difference between the phase components from both

images. They used band-limited POC to avoid information from high-frequency

noise. The proposed algorithm required only two parameters: one parameter

represents the effective horizontal bandwidth for recognition, and the other pa-

rameter represents the effective vertical bandwidth. They achieved better results

using Phase Only Correlation than using Masek’s 1D log-Gabor algorithm.
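The following is a minimal sketch of phase-only correlation between two equally sized valid regions; the band-limiting of Miyazawa et al. (their two bandwidth parameters) is omitted here for brevity.

    import numpy as np

    def poc_score(region_a, region_b):
        Fa, Fb = np.fft.fft2(region_a), np.fft.fft2(region_b)
        cross = Fa * np.conj(Fb)
        cross /= np.abs(cross) + 1e-12   # keep phase only, discard magnitude
        poc = np.real(np.fft.ifft2(cross))
        # Same iris: a sharp correlation peak; different irises: no peak.
        return poc.max()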

Bodade and Talbar [11] suggested using a 2D Dual Tree Complex Wavelet

Transform (CWT) and a 2D Dual Tree Rotated CWT because these trans-

forms (1) provided features in more directions than a Discrete Wavelet Trans-

form (DWT), (2) provided shift invariance, and (3) were more computationally

economical than Gabor filters.

The scale-invariant feature transform method (SIFT) is a method that has

been used for object recognition in computer vision. A typical SIFT algorithm


has not worked well for iris recognition because many iris structures look similar

between different eyes. To counter this difficulty, Belcher and Du [9] proposed a

region-based SIFT approach. Belcher and Du cited multiple advantages of using

SIFT; the method “does not require highly accurate segmentation, transformation

to polar coordinates, or affine transformation”. Their method divided the iris

area into three regions: left, right, and bottom. Each region was subdivided into

subregions, each containing a potential feature point. After eliminating unstable

points, the dominant orientation and feature point description was found using

the SIFT approach. When comparing two images, they only compared a feature

from a given subregion in the first image with the corresponding subregion in the

second image, or with the eight nearest subregions in the second image.

2.4.5 Improvements in Matching

One recent trend in iris biometrics is that of selecting the most reliable fea-

tures for matching, and masking less reliable features. One example of this idea

is proposed by Ring and Bowyer [101]; they suggest removing local texture dis-

tortions from a comparison by disregarding local windows of a match comparison

with high fractional Hamming distance. Another example of this idea is fragile

bit masking [53] which disregards parts of the iris code that would be less reliable

due to the coarse quantization of a complex filter response.

In Daugman’s traditional algorithm, a texture filter is applied to an iris image,

and the complex filter responses are quantized to two bits. The first bit is a 1 if the

real part of the number is positive, and 0 otherwise; similarly, the second bit is a 1

if the imaginary part of the number is positive, and 0 otherwise. Algorithms that

follow this pattern produce templates in which not all bits have equal value [53].


Specifically, complex filter responses near the axes of the complex plane produce

unstable bits in the iris code: a small amount of noise in the iris image can

shift that filter response from one quadrant to the adjacent quadrant, causing the

corresponding bit in the iris code to flip. This type of bit is defined as “fragile”;

that is, there is a substantial probability of it ending up a 0 for some images of

the iris and a 1 for other images of the same iris.

Hollingsworth et al. [53] suggested identifying and masking fragile bits using

the following strategy. If the complex coefficient had a real part very close to

0, they masked the corresponding real bit in the iris code. If the complex co-

efficient had an imaginary part very close to 0, they masked the corresponding

imaginary bit. Hollingsworth et al. masked bits corresponding to the 25% of com-

plex numbers closest to the axes of the complex plane. Barzegar et al. [6] applied

this approach to the CASIAv3 data set and found that using a threshold of 35%

worked better than 20% or 30% on that data set.
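A minimal sketch of the two-bit quantization and an axis-based fragility test follows; thresholding each component at a per-code quantile is one plausible way to flag roughly 25% of bits, and is an assumption of this sketch rather than the exact procedure of [53].

    import numpy as np

    def encode_with_fragile_mask(responses, fragile_fraction=0.25):
        # responses: complex array of texture filter outputs.
        real_bits = responses.real > 0        # first bit of each pair
        imag_bits = responses.imag > 0        # second bit of each pair

        # Components close to zero put the response near an axis of the
        # complex plane, so the corresponding bit is fragile.
        re_cut = np.quantile(np.abs(responses.real), fragile_fraction)
        im_cut = np.quantile(np.abs(responses.imag), fragile_fraction)
        real_fragile = np.abs(responses.real) <= re_cut
        imag_fragile = np.abs(responses.imag) <= im_cut
        return real_bits, imag_bits, real_fragile, imag_fragile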

Hollingsworth’s approach [53] predicts which bits in an iris code are frag-

ile by looking at the complex filter response, and masking responses with small

real parts or small imaginary parts. In contrast, Dozier et al. [37, 38] create a

fragile bit mask for a subject from training sets of ten images from that subject.

Therefore, Hollingsworth’s method detects axis-fragile bits, while Dozier’s method

detects trained fragile bits. These research papers are summarized in more detail

in Chapter 3.

2.4.6 Searching Large Biometrics Databases

One challenge with implementing large-scale biometrics applications is that

searching large databases can be prohibitively time consuming. Therefore, some


researchers are interested in methods of indexing, to reduce the amount of the

database that must be searched to add a new entry or search for a match. Parti-

tioning methods work well in 2-D and 3-D, but do not work well with iris codes

because iris codes are traditionally binary vectors thousands of bits long (e.g.

2048 bits). Partitioning methods suffer from the curse of dimensionality in a

2048-D binary lattice. Clustering methods cannot be applied to this problem

because iris codes are almost uniformly distributed on the lattice. A couple of

algorithms [22, 43] assume that if two records are similar, then there is a high

probability that a segment within the records will match exactly. Unfortunately,

these methods often must still search a large portion of the database to be effective.

One possible solution is proposed by Hao et al. [45]. They propose a “multiple

collision principle”. They require three segments to match exactly before taking

the time to retrieve from disk and compare the entire iris code or record. They

call their algorithm a “beacon guided search” (BGS). They report a 300-times

speedup over an exhaustive search of 632,500 iris codes, with only a slight drop

in performance. The FRR for the exhaustive search was 0.32%; the FRR for the

BGS was 0.64%.
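The sketch below illustrates the general segment-collision flavor of these indexing schemes, requiring several exact segment matches before a full comparison; it is not Hao et al.’s beacon construction, and the segment length and collision threshold are placeholders.

    from collections import defaultdict

    def build_index(enrolled, seg_len=32):
        # enrolled: dict mapping identity -> iris code as a uint8 bit array.
        index = defaultdict(set)
        for ident, code in enrolled.items():
            for pos in range(0, len(code), seg_len):
                index[(pos, code[pos:pos + seg_len].tobytes())].add(ident)
        return index

    def candidates(index, probe, seg_len=32, min_collisions=3):
        hits = defaultdict(int)
        for pos in range(0, len(probe), seg_len):
            key = (pos, probe[pos:pos + seg_len].tobytes())
            for ident in index.get(key, ()):
                hits[ident] += 1
        # Only identities with several exact segment collisions are worth
        # the full iris code comparison.
        return [i for i, n in hits.items() if n >= min_collisions]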

2.4.7 Applications

2.4.7.1 Cryptographic Applications

Private cryptographic keys are typically protected with passwords. However

passwords can be lost or guessed. To eliminate the possibility of people losing

their passwords, a system could protect private keys with biometric data, or use

biometric data to generate the private key. This strategy could also eliminate

the possibility of an impostor guessing the password to get unauthorized access.


However, it does not eliminate the possibility of an impostor stealing biometric

information and spoofing the system.

An extensive review of the literature in cryptography and biometrics is beyond

the scope of this work, but we mention a few of the well-known papers here. In

1998, Davida et al. [33] evaluated a number of secure biometric identification

scenarios. For one scenario, they proposed that a user could authenticate in the

following manner. A user’s biometric template would be captured multiple times,

and the multiple vectors would be put through a majority decoding algorithm.

The biometric would be corrected further using check digits and error correction.

The signature, Sig(Hash(name, attributes, T ‖ C)), is then verified, where Sig(x)

denotes the authorization officer’s signature of x, Hash() is a partial information

hiding hash function, T is the corrected biometric, and C are the check digits for

the biometric.

In 1999, Juels and Wattenberg [66] proposed a technique which they called a

fuzzy commitment scheme. This technique aimed to recover the original biometric

template. If b is a biometric template, and c is a randomly chosen codeword,

enrollment consists of storing z = c ⊕ b. During verification, the system obtains

a new biometric template b′. The system computes z ⊕ b′ = c ⊕ (b ⊕ b′), and

then tries to use error correcting codes to correct [c ⊕ (b ⊕ b′)] thus recovering c.

If the Hamming distance between b and b′ is small, the system recovers c, and

consequently can recover b as well.
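A toy sketch of the scheme follows, with a 5-fold repetition code standing in for a real error correcting code; all sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def ecc_encode(key_bits, r=5):
        return np.repeat(key_bits, r)                 # toy ECC: repetition

    def ecc_decode(noisy_bits, r=5):
        # Majority vote within each block of r repeated bits.
        return (noisy_bits.reshape(-1, r).sum(axis=1) > r // 2).astype(np.uint8)

    key = rng.integers(0, 2, 16, dtype=np.uint8)      # secret to protect
    c = ecc_encode(key)                               # codeword c
    b = rng.integers(0, 2, c.size, dtype=np.uint8)    # enrollment template b
    z = c ^ b                                         # stored commitment

    b_new = b.copy()                                  # verification template b'
    b_new[rng.choice(b.size, 3, replace=False)] ^= 1  # a few flipped bits
    recovered = ecc_decode(z ^ b_new)                 # decode c ^ (b ^ b')
    print(np.array_equal(recovered, key))             # True if errors correctable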

The fuzzy vault biometric cryptosystem was proposed by Juels and Sudan [65]

in 2002. In a fuzzy vault, the private key is used to generate a polynomial, pos-

sibly by using the key as coefficients of a polynomial. The components of the

biometric template are used as x-axis coordinates to generate coordinate pairs of


genuine points. Additional false points, called chaff points, are randomly gener-

ated. Genuine points and chaff points are stored in a vault. During decryption,

the valid user presents his biometric data to determine which points in the vault

are genuine points. The private key can be retrieved by fitting a polynomial to

the genuine points.
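A toy sketch of vault construction follows (decoding by polynomial reconstruction, e.g. Lagrange interpolation over candidate point subsets, is omitted); the small prime field, coefficients, and feature values are all illustrative.

    import random

    random.seed(1)
    P = 257                                     # toy prime field

    def polyval_mod(coeffs, x):
        # Evaluate the key polynomial at x, mod P (Horner's method).
        y = 0
        for c in coeffs:
            y = (y * x + c) % P
        return y

    def make_vault(secret_coeffs, template_vals, n_chaff=40):
        # Genuine points lie on the polynomial; chaff points do not.
        genuine = [(x, polyval_mod(secret_coeffs, x)) for x in template_vals]
        chaff = []
        while len(chaff) < n_chaff:
            x, y = random.randrange(P), random.randrange(P)
            if x not in template_vals and y != polyval_mod(secret_coeffs, x):
                chaff.append((x, y))
        vault = genuine + chaff
        random.shuffle(vault)
        return vault

    vault = make_vault(secret_coeffs=[11, 42, 7],
                       template_vals=[5, 29, 77, 130, 201])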

Dodis et al. [35, 36] formalized the notions of a secure sketch and a fuzzy

extractor. A fuzzy extractor “extracts a uniformly random string R from its input

w in a noise-tolerant way” [35]. Noise-tolerance means that if the input changes

slightly, but is still close to w, the string R can still be reproduced exactly. A

secure sketch also allows for precise reproduction of the original input, but does

not address uniformity.

The above papers discuss cryptography combined with any biometric template.

Some papers apply cryptographic ideas specifically for iris biometric applications.

Hao et al. [44] developed a fuzzy commitment scheme for iris templates. They

used Hadamard and Reed-Solomon error correcting codes to produce a 140-bit

cryptographic key from iris biometric data and a tamper-resistant token, such as

a smart card. Bringer et al. [18, 19] explained how to estimate the theoretical

performance limit of a secure sketch for binary biometric data. They proposed a

practical fuzzy commitment scheme for iris biometric templates and tested their

technique on two publicly available iris data sets. Lee et al. [75] described a way

to build a fuzzy vault using iris biometrics.

Despite the above mentioned research in fuzzy cryptography, the combination

of biometrics and cryptography is not yet a solved problem. In a recent confer-

ence article, Ballard et al. [5] discuss security requirements for biometric

key generation, and demonstrate how three published application schemes fail to


meet important requirements. Another paper by Simoens et al. [109] revealed

weaknesses in the theoretical constructions themselves. They studied two main

properties of biometric template protection – indistinguishability and irreversibil-

ity – and found that “some sketches based on linear codes, such as the fuzzy com-

mitment scheme of Juels and Wattenberg [66], cannot be securely reused when

considering biometric privacy” [109]. Thus, there is still room for more research

in this area, and need for more rigorous security analysis.

2.4.7.2 Identity Cards in the U.K.

The United Kingdom has considered using iris biometrics in a national identity

card system [119]. In November 2004, the government introduced the Identity

Cards Bill. The Identity Cards Bill scheme had a number of elements: a centralized

database called the National Identity Register (NIR), a number assigned to each

U.K. citizen and resident over the age of 16, individual biometrics stored in both

the NIR and the card, and the legal obligation for citizens to produce the card in

order to obtain some public services.

In January 2005, a group of people at the London School of Economics and

Political Science (LSE) started a project, the LSE Identity Project to examine

potential impacts and benefits of the Identity Cards Bill [119]. These people were

concerned that the politicians had not considered all the challenges and risks asso-

ciated with such a system. The LSE Identity Project main report, released in June

2005, highlighted some of their concerns. One concern was the proposal for the

centralized database (the NIR). Other European nations with identity card sys-

tems used federated schemes rather than a centralized database (e.g. Germany)

or avoided using a single national identification number to obtain government ser-


vices (e.g. France, Hungary, Germany, Czech Republic, and Austria). The LSE

also noted that the UK government showed incredible faith in biometrics, despite

the fact that most existing biometric studies had very controlled experimental se-

tups (e.g. controlled lighting in frequent traveller programs) and limited database

sizes.

Despite these objections, the Identity Card Bill passed in March 2006. How-

ever, the implementation of the bill was delayed, and there was continued consul-

tation about the collection of biometrics. By 2008, iris biometrics were dropped

from the plans [119], and in 2010 the U.K. Identity Card project was suspended.

2.4.8 Evaluation

Newton and Phillips [88] presented a summary of three independent state-

of-the-art iris biometric evaluations: the Independent Testing of Iris Recognition

Technology (ITIRT) conducted by the International Biometric Group (IBG), the

Iris Recognition Study 2006 (IRIS06) conducted by Authenti-Corp (AC), and

the Iris Challenge Evaluation (ICE 2006) conducted by the National Institute of

Standards and Technology (NIST).

ICE2006 compared three algorithms. ITIRT and IRIS06 compared sensors.

The evaluations used between 240 (ICE2006) and 458 (ITIRT) subjects. To sum-

marize the three evaluations, Newton and Phillips compared the FNMRs at a

FMR of 1 in 1000 (.1%). All three evaluations got similar magnitudes of error

rates. The similarity in results may be partly due to the fact that all but one of

the algorithms used in the evaluations were based on work by Daugman.


2.4.9 Performance under Varying Conditions

As iris biometrics is used for larger and more varied applications, it is essential

to test the limits of the technology under a variety of conditions (Figure 2.8).

Rakshit and Monro [100] tested the performance of iris biometrics for three pa-

tients who underwent cataract surgery. A cataract is a clouding of the lens in the

eye. More than 200,000 cataract procedures are performed every year in the U.K.

In cataract surgery, the cloudy lens is replaced with a thinner implant, causing

the iris plane to shift away from the cornea. The result is increased magnification

of the iris by the cornea. Rakshit and Monro took pictures of the iris before and

after three patients had cataract surgery. They noticed an increased number of

specular reflections in the pupil after surgery but they found no visible change in

the iris structure, and they obtained an equal error rate of zero when comparing

pre- and post-operative images. They concluded that cataract surgery was not a

degrading factor. This result is the opposite of the result obtained by Roizenblatt

et al. [102], which could possibly be attributed to Rakshit and Monro having such a small

data set.

Rakshit and Monro [100] also examined eleven patients whose eyes were di-

lated with eyedrops. They found that after instilling eyedrops, they had six failures

out of 45 match attempts, and many of the dilated eyes had non-circular pupils.

Hollingsworth et al. [55] also investigated the effect of dilation. They achieved

images with a range of dilation by darkening the lights in the room. In their

experiments, they found that comparisons between two dilated eyes followed a

distribution with a mean fractional Hamming distance of 0.06 higher than the

mean of the distribution for non-dilated eyes. The means of both the match and

the non-match distributions are expected to fall between 0 and 0.5. Therefore,


Figure 2.8: As iris biometrics is used for larger and more varied applications, it will have to deal with irises in a variety of conditions. This image shows an unusual iris (Subject 05931) with filaments of tissue extending into the pupil.

a shift of 0.06 is nontrivial, amounting to twelve percent of this range. Further-

more, the difference in dilation between an enrollment image and an image to be

recognized had a marked effect on the comparison. Comparisons between images

with widely different degrees of dilation followed a distribution with a mean about

0.08 higher than the mean of the distribution of images with similar degrees of

dilation.

The Multiple Biometric Grand Challenge (MBGC) is designed to test biometric

performance under less controlled conditions than what has previously been used

for biometrics. More information about the data released with this challenge is

given in Chapter 4.


2.4.10 Multibiometrics

According to Kittler and Poh, “the term Multi Biometrics refers to the design

of personal identity verification or recognition systems that base their decision

on the opinions of more than one biometric expert” [70]. Recent research has

highlighted the benefits of using multiple biometric modalities. Benefits include

increased population coverage, more user choice, improved reliability, increased

resilience to spoofing, and improved authentication performance.

Kittler and Poh [70] showed that multi-modal biometrics can provide improved

performance compared to individual component experts. They used five off-the-

shelf conventional technologies and measured the FRR and FAR on each. They

then used weighted averaging to fuse the scores from the five experts. A weighted

fusion of all five experts had an order of magnitude lower error rates than any

single expert. Adding a quality measure to multibiometrics is also beneficial. In

an experiment involving the fusion of six face systems and one speech system,

“using quality measures [reduced] the verification error ... over the baseline fusion

classifier (without quality measure), by as much as 40%.” In creating a multi-

biometric system, there is a trade-off between improved accuracy and increased

computation. This is an optimization problem: find the subset of candidate bio-

metric experts that maximizes performance and minimizes cost. Furthermore,

the solution to the optimization problem should be robust to the population mis-

match between the development and target data sets. Kittler and Poh found that

the cross-validation method of evaluation is particularly sensitive to the mismatch in

data sets, and that the Chernoff bound is a better alternative.

A book chapter by Jain et al. [63] also gives an in-depth discussion of multi-

biometrics. Jain et al. divided multibiometric systems into six categories: multi-


sensor, multi-algorithm, multi-instance, multi-sample, multimodal, and hybrid.

Information can be fused at sensor-level, feature-level, score-level, rank-level, and

decision-level. Figure 2.9 shows an example acquisition setup that could be used

for multibiometrics. This “Iris on the Move” portal captures video of a person’s

face as the subject walks through the portal. The video frames have sufficient res-

olution to enable iris recognition. Using a combination of face recognition and iris

recognition on these videos would be an example of multimodal biometrics.


Figure 2.9: MBGC data included near infrared iris videos captured with a Sarnoff Iris on the Move portal, shown above. Video of a subject is captured as the user walks through the portal. This type of acquisition is less constrained than traditional iris cameras; however, the quality of the iris images acquired is poorer. It is possible to acquire both face and iris information using this type of portal. (Picture reprinted from [16] with permission from Elsevier.)


CHAPTER 3

FRAGILE BIT COINCIDENCE

As mentioned in section 2.4.5, not all bits in an iris code have equal value.

The observation that some bits in the iris code are less consistent than others was

first made by Bolle et al. [13]. We define an iris code bit as “fragile” when there

is a substantial probability of it ending up a 0 for some images of the iris and a

1 for other images of the same iris. My previous research [53] has shown that iris

recognition performance can be improved by masking these fragile bits. Rather

than ignoring fragile bits completely, we considered what beneficial information

could be obtained from fragile bits. In this chapter, we present evidence that

the locations of fragile bits tend to be consistent across different iris codes of

the same eye, and that this information can be used to improve iris biometrics

performance. Portions of this chapter have been reprinted, with permission, from

the Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems [56]

(© 2009, IEEE).

3.1 Motivation

When using fragile bit masking [53], we mask a significant amount of infor-

mation because it is not “stable”. Rather than completely ignoring all of that

fragile bit information, we would like to find a way to make some beneficial use


of those bits. We know that the values (zero/one) of those bits are not stable.

However, the physical locations of those bits should be stable and might be used

to improve our iris recognition performance.

We call the physical locations of fragile bits a fragile bit pattern. Figure 3.1

shows some iris images and Figure 3.2 shows the corresponding fragile bit patterns.

Figure 3.2(a) and Figure 3.2(b) both show subject number 2463, and Figure 3.2(c)

and Figure 3.2(d) both show subject 4261. The fragile bit patterns in Figure 3.2(a)

and Figure 3.2(b) are more similar to each other than the fragile bit patterns in

Figure 3.2(a) and Figure 3.2(c).

To compute the Hamming distance between two iris codes, we must first com-

bine (AND) the masks of the two iris codes. Figure 3.3 shows the fragility masks

obtained by ANDing pairs of fragility masks together. For example, Figure 3.3(a)

is the comparison mask obtained by combining Figure 3.2(a) and 3.2(b). Fig-

ure 3.3(a) and 3.3(b) both show masks obtained when computing the Hamming

distance for a match comparison (same subject). Figure 3.3(c) and 3.3(d) show

masks for nonmatch comparisons. The fragile bit patterns for the match com-

parisons coincide more closely than the fragile bit patterns for the nonmatch

comparisons. By looking at how well two fragile bit patterns align, we can make a

prediction on whether those two irises are from the same subject or from different

subjects. We can fuse that information with the Hamming distance information

and get an improved prediction over using the Hamming distance alone.

The rest of this chapter is organized as follows. In section 3.2 we talk about

related research. Section 3.3 describes the data sets used for our experiments in

this chapter. Section 3.4 defines a new metric, the fragile bit distance (FBD),

which quantifies the difference between two fragile bit patterns. In section 3.5, we


(a) 02463d1910 (b) 02463d1912

(c) 04261d1032 (d) 04261d1034

Figure 3.1: Example images from our data set. These images were captured using an LG4000 iris camera.


(a) 02463d1910 fragility mask: 1116 masked bits

Fragility Masks

(b) 02463d1912 fragility mask: 1128 masked bits

(c) 04261d1032 fragility mask: 1098 masked bits

(d) 04261d1034 fragility mask: 1118 masked bits

Figure 3.2: These are the fragile bit patterns (imaginary part) corresponding to the images in Figure 3.1. Black pixels are bits masked for fragility. We use 4800-bit iris codes and mask 25% of the bits (or 1200 bits) for fragility. Some of the bits are masked for occlusion, and so slightly fewer than 1200 bits are masked for fragility.

(a) 2463 match comparison: 1706 masked bits

Comparisons Between Pairs of Masks

(b) 4261 match comparison: 1738 masked bits

(c) Nonmatch comparison: 1957 masked bits

(d) Nonmatch comparison: 1978 masked bits

Figure 3.3: These are comparisons of fragile bit patterns, each obtained by ANDing two fragile bit masks together. For example, Figure 3.3(a) is the comparison mask obtained by combining Figures 3.2(a) and 3.2(b). Black pixels show where the two masks agreed. Blue pixels show where they disagreed. White pixels were unmasked for both iris codes. There is more agreement in same-subject comparisons than there is when comparing masks of different subjects.


present graphs of the distributions of FBD and Hamming distance. Section 3.6

discusses methods of fusing Hamming distance with FBD. In section 3.7 we show

that the proposed method results in a statistically significant improvement over

using Hamming distance alone. Finally, section 3.8 presents experiments showing

the effect of changing the amount of fragile bit masking used.

3.2 Related Work

In the previous chapter, we surveyed some of the many papers presenting

research in iris biometrics. Here, we focus on the small subset of research that

investigates fusing Hamming distance with other information.

3.2.1 Research on Fusing Hamming Distance with Added Information

A small subset of iris biometrics research investigates combining Hamming

distance with other information. A work by Sun et al. [112] aims to characterize

global iris features using the following feature extraction method. First, they

introduce a local binary pattern operator (LBP) to characterize the iris texture in

each block of the iris image. The image block information is combined to construct

a global graph. Finally, the similarity between two iris images is measured using a

graph matching scheme. They fuse the LBP score with Hamming distance using

the sum rule. They report that using Hamming distance alone yields an equal

error rate (EER) of 0.70%, but the score-fusion of Hamming distance with their

LBP method yields an EER of 0.37%.

As an alternative to the sum rule, Sun et al. [112] state that the LBP score

could be combined with Hamming distance using cascaded classifiers. Since their

LBP method is slower than computing the Hamming distance, they suggest cal-


culating the Hamming distance first. If the Hamming distance is below some low

threshold, the comparison is classified as a match. If the Hamming distance is

above some high threshold, the comparison is classified as a nonmatch. If the

Hamming distance is between those two thresholds, the second classifier (using

LBP) should make the decision.

Vatsa et al. [117] characterize iris texture using Euler numbers. They use

a Vector Difference Matching algorithm to compare Euler codes from two irises.

Vatsa et al. combine Hamming distance and Euler score using a cascaded classifier.

Zhang et al. [124] use log Gabor filters to extract 32 global features character-

izing iris texture. To compare the global features from two iris images, they use

a weighted Euclidean distance (WED) between feature vectors. Zhang et al. use

cascaded classifiers to combine the global WED with a Hamming distance score.

However, unlike Sun et al. [112] and Vatsa et al. [117], they propose using their

global classifier first, and then using Hamming distance. In their experiments,

using Hamming distance alone gave a false accept rate (FAR) of 8.1% when the

false reject rate (FRR) was 6.1%. The fusion of WED and Hamming distance

gave FAR = 0.3%, FRR = 1.9%.

Park and Lee [90] generate one feature vector using the binarized directional

subband outputs at various scales. To compare two binary feature vectors, they

use Hamming distance. A second feature vector is computed as the blockwise

normalized directional energy values. Energy feature vectors are compared using

Euclidean distance. To combine scores from these two feature vectors, Park and

Lee use a weighted average of the two scores. Using the binary feature vectors

alone gives an EER of 5.45%; the energy vectors yield an EER of 3.80%; when

the two scores are combined, the EER drops to 2.60%.


All of the above mentioned papers combine Hamming distance scores with

some other scores at the matching score level to improve iris recognition. Sun

et al. [112] combine scores by summing. Three of the papers [90, 112, 117] use

cascaded classifiers. Park and Lee [90] use a weighted average. Our work is similar

to these papers in that we also consider combining two match scores to improve

performance. We differ from these other works in that we are the first to use a

score based on the location of fragile bits in two iris codes.

3.2.2 Research on Fragile Bits

Research on fragile bits is a more recent trend in the iris biometrics literature. One

of our previous papers [53] presented evidence that not all bits in the iris code are

of equal consistency. We investigated the effect of different filters on bit fragility.

We used 1D log-Gabor filters and multiple sizes of a 2D Gabor filter, and found

that the fragile bit phenomenon was apparent with each filter tested. The largest

filter tended to yield fewer fragile bits than the smaller filters. We investigated

possible causes of inconsistencies and concluded that the inconsistencies are due

largely to the coarse quantization of the filter response. We performed an exper-

iment comparing (1) no masking of fragile bits (baseline) with (2) masking bits

corresponding to complex filter responses near the axes of the complex plane. We

masked fragile bits corresponding to the 25% of filter responses closest to the axes.

Using a data set of 1226 images from 24 subjects, we found that fragile bit masking

improved the separation between the match and nonmatch score distributions.

Other researchers have also begun to investigate the effects of masking fragile

bits. Barzegar et al. [6] investigated fragile bit masking using different thresh-

olds. They compared (1) no fragile bit masking to (2) fragile bit masking with


thresholds of 20%, 30% and 35%. They found that using a threshold of 35% for

masking produced the lowest error rates on the CASIA-IrisV3 data set. Our own

initial investigations have shown that the optimal fragility threshold may depend

partly on the quality of the iris images being used; therefore, we feel that further

investigation into the proper fragility threshold would be worthwhile.

Dozier et al. [37] also tried masking inconsistent bits and found an improvement

in performance. However, they used a different method than Hollingsworth et

al. [53] and Barzegar et al. [6]. Hollingsworth et al. [53] and Barzegar et al. [6]

approximated fragile bit masking by masking filter responses near the axes of the

complex plane. In contrast, Dozier et al. used a training set of ten images per

subject to find consistency values for each bit in the iris code. Then for that

subject, they only kept bits that were 90% or 100% consistent in their training

set, and masked all other bits. In addition, they also considered only those bits

that had at least 70% coverage in their training set; that is, if a bit was occluded

by eyelids or eyelashes in four or more of the training images, they masked that

bit. Dozier et al. tested their method on six subjects from the ICE data set.

In a similar paper, Dozier et al. [38] again showed the benefit of masking

inconsistent bits. In this work, they used a genetic algorithm to create an iris

code and corresponding mask for each subject. Once again, they used ten training

images per subject in generating their fragile bit masks.

Each of the above mentioned papers showed the benefit of masking fragile bits,

but in every case, they simply discarded all information from the fragile bits. None

of them considered employing the locations of fragile bits as an extra feature to

fuse with Hamming distance.

The only paper that showed a benefit from using the locations of fragile bits


was our conference paper [56]. That paper introduced the idea of comparing

fragile-bit-locations between two irises, and tested our idea on a data set of 9784

images. Here we present experiments on a data set more than twice the size of

our prior set. We have further analyzed the distribution of fragile bit distance,

added statistical tests evaluating our proposed method, and investigated the effect

of varying the amount of fragile bit masking used.

3.3 Data

We acquired a data set of 19,891 iris images taken with an LG4000 iris cam-

era [76] at the University of Notre Dame. Some example images are shown in

Figure 3.1 and the camera is shown in Figures 3.4 and 3.5. The images are 640

pixels by 480 pixels. All images in this set were acquired between January 2008

and May 2009. A total of 686 different people attended acquisition sessions, so

there are 1372 different eyes in the data set. Each subject attended between one

and eighteen acquisition sessions. At each session, we usually acquired three left

eye images and three right eye images. The minimum number of images for any

one subject is four (two of each iris), and the maximum is 108 (54 of each iris).

For our experiments, we used the most current version of our in-house iris bio-

metric software. This software is based on NIST’s IrisBEE software. It uses one-

dimensional log-Gabor filters for extracting the iris texture from the segmented

and unwrapped iris image. One modification that we made to the IrisBEE soft-

ware is that our software now uses active contours for segmentation. Additionally,

we added fragile bit masking to the software; we use a default fragile bit masking

threshold of 25% [53]. In section 3.8 of this chapter, we investigate the effects of

changing this threshold. A third modification involves the size of the iris code.


Figure 3.4: Images in our data set were captured using this LG4000 iris camera [76].


Figure 3.5: The LG4000 iris camera captures images of both eyes at the same time.


We took the default 240 by 20 normalized iris image, and averaged neighboring

rows to create a smaller image to use when generating the iris code. We averaged

pixel values from rows one and two to produce the first output row, from rows

three and four to produce the second output row, and so forth, so that the final

normalized iris image was reduced to a 240 by 10 image. Let L(x, y) be the pixel

intensity at position (x, y) in the 240 by 20 normalized image. Let S(x, y) be the

pixel intensity at position (x, y) in the smaller image. The computation used to create the smaller image was

S(x, y) = (L(2x − 1, y) + L(2x, y)) / 2.    (3.1)

This row-averaging resulted in a smaller iris code, with no loss in performance [92].

From each pixel in the normalized image, we get two bits in the iris code, so the

final iris code size is 240 by 10 by 2, or 4800 bits.
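For concreteness, the row averaging of equation (3.1) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not our actual implementation; it assumes the normalized image is stored as a 20 × 240 array (radial rows by angular columns):

```python
import numpy as np

def shrink_normalized_iris(L: np.ndarray) -> np.ndarray:
    """Average neighboring rows of a normalized iris image, halving the
    radial resolution as in equation (3.1): output row i is the mean of
    input rows 2i and 2i+1 (0-based indexing)."""
    assert L.shape[0] % 2 == 0, "expects an even number of rows"
    L = L.astype(np.float64)
    S = (L[0::2, :] + L[1::2, :]) / 2.0
    return np.rint(S).astype(np.uint8)
```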

3.4 Fragile Bit Distance (FBD)

Figure 3.3 provides some indication of what we should expect when comparing

two fragile bit patterns. In a genuine comparison, the locations of the fragile bits

coincide. In an impostor comparison, the locations of the fragile bits do not. When

we compare two iris codes, we mask any bit that is fragile in either of the two

fragile bit patterns. Therefore, we expect more bits to be masked for fragility in

impostor comparisons than in genuine comparisons.

We can theoretically predict how many bits will be unmasked in an impostor

comparison. In this analysis, we make the assumption that the fragility of bits is

independent of position and that each position is independent of all other posi-

tions. Consider the iris code for a single, unoccluded image. We mask 25% of bits


for fragility, and leave 75% of bits unmasked. Now consider a comparison of two

unoccluded images from different subjects. We expect (75%)(75%) = 56.25% of

the bits to be unmasked, and 43.75% of the bits to be masked.

Another way to analyze how many bits will be masked is to consider the

number of coincident bits. If we mask 25% of the bits in each of the two irises in an

impostor comparison, we expect (25%)(25%) = 6.25% of the bits to be coincident

fragile bits. About 25% − 6.25% = 18.75% of the bits in the first iris code will be

marked as fragile and not line up with any fragile bits from the second iris code.

The total number of masked bits for the comparison will be the coincident fragile

bits, plus the bits masked in the first iris code only, plus the bits masked in the second

iris code only. Therefore we expect 6.25% + 18.75% + 18.75% = 43.75% of the

unoccluded bits will be masked in an impostor comparison.
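This arithmetic generalizes to any fragility threshold p: a bit survives an impostor comparison only if it is unmasked in both codes, which under the independence assumption happens with probability (1 − p)². The one-line check below is only an illustration of that assumption, not part of any experimental pipeline:

```python
def expected_impostor_masked_fraction(p: float) -> float:
    """Expected fraction of unoccluded bits masked for fragility in an
    impostor comparison, assuming fragile-bit locations are independent
    between the two iris codes."""
    return 1.0 - (1.0 - p) ** 2

print(expected_impostor_masked_fraction(0.25))  # 0.4375, matching the text
```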

In contrast, a genuine comparison will have fewer masked bits. In two identical

images, the fragile bits will line up exactly and the comparison will have 75%

unmasked bits and 25% masked bits. However, two different images of the same

iris are not identical because of differences in lighting, dilation, distance to the

camera, focus, or occlusion. Therefore, on average, more than 25% of the bits will

be masked in a genuine comparison.

We define a metric called the fragile bit distance (FBD) to measure how well

two fragile bit patterns align. In order to compute fragile bit distance, we need

to store occluded bits and fragile bits separately. Therefore, each iris template

will consist of three matrices: an iris code i, an occlusion mask m, and a fragility

mask f. Unmasked bits are represented with ones and masked bits are represented

with zeros. Specifically, unoccluded bits and consistent bits are marked as ones,

while occluded and fragile bits are zeros. We do not want FBD to be affected by


occlusion, so we consider only unoccluded bits when computing the FBD.

Take two iris templates, template A and template B. The FBD is computed

as follows:

FBD = ‖mA ∩ mB ∩ ¬(fA ∩ fB)‖ / ‖mA ∩ mB‖    (3.2)

where ∩ represents the AND operator and ¬ represents the NOT operator. The norm (‖·‖) of a matrix tallies the number of ones in the

matrix.

In the above equation, ¬(fA ∩ fB) is a matrix marking all bits masked for fragility, and mA ∩ mB is a matrix marking all bits unoccluded by eyelashes and eyelids. The FBD

expresses the fraction of unoccluded bits masked for fragility in the comparison.

This metric is large for impostor comparisons, and small for genuine comparisons.
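A minimal sketch of the FBD computation, assuming the occlusion and fragility masks are stored as NumPy boolean arrays (True for unoccluded or consistent bits, matching the convention above); this illustrates equation (3.2) and is not our production matcher:

```python
import numpy as np

def fragile_bit_distance(mA, mB, fA, fB):
    """FBD of equation (3.2): the fraction of mutually unoccluded bits
    that are fragile in at least one of the two templates.

    mA, mB : boolean occlusion masks (True = unoccluded)
    fA, fB : boolean fragility masks (True = consistent)"""
    unoccluded = mA & mB             # bits unoccluded in both templates
    fragile_any = ~(fA & fB)         # fragile in at least one template
    return np.count_nonzero(unoccluded & fragile_any) / np.count_nonzero(unoccluded)
```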

Our theory predicts that we will have an average FBD of 0.4375 for impostor

comparisons, and an average FBD of somewhere between 0.25 and 0.4375 for

genuine comparisons. We tested these predictions on our data set of 19,891 images.

The average FBDs for genuine and impostor comparisons are reported in Table 3.1,

with standard deviations reported in parentheses. The average impostor FBD

was within one standard deviation of the theoretical prediction. Also, the average

genuine FBD was less than the average impostor FBD.

3.5 Score Distributions for Hamming Distance and Fragile Bit Distance

We graphed the genuine and impostor score distributions for fragile bit distance

(FBD) from all possible comparisons in our 19,891-image data set. Figure 3.6

shows the result. In comparison, Figure 3.7 shows the genuine and impostor

score distributions for Hamming distance. There is more separation between the


TABLE 3.1

AVERAGE FBD FOR GENUINE AND IMPOSTOR COMPARISONS

                    Avg. Genuine FBD             Avg. Impostor FBD
Theoretical value   between 0.25 and 0.4375      0.4375
LG4000 images       0.4047 (std dev = 0.0149)    0.4397 (std dev = 0.0097)

genuine and the impostor score distributions for Hamming distance than there is

for FBD. The FBD genuine score distribution looks more bell-shaped than the

Hamming distance genuine score distribution.

Figure 3.8 shows the joint distribution of FBD and Hamming distance. Fig-

ure 3.9 shows the same joint distribution, zoomed-in on the area of the graph

where the genuine and impostor score distributions meet. Each blue point in

these figures represents at least 0.003% of the 247,872 match comparisons in our

experiment, and each red point represents at least 0.003% of the 197,229,390 non-

match comparisons. Selecting a single threshold of Hamming distance (e.g. HD =

0.35) would separate genuine and impostor comparisons better than any threshold

we might choose for FBD. Using FBD, we achieve an equal error rate of 6.34 × 10⁻² on this data set. Using Hamming distance, the equal error rate is 8.70 × 10⁻³.

3.6 Fusing Fragile Bit Distance with Hamming Distance

Even though the FBD is not as powerful a metric as the Hamming distance,

we can combine the features to create a better classifier than Hamming distance

alone. To combine Hamming distance and FBD, we first tried a weighted average


Figure 3.6: Score distributions for fragile bit distance (genuine and impostor; x-axis: fragile bit distance, y-axis: percent of comparisons).


Figure 3.7: Score distributions for Hamming distance (genuine and impostor; x-axis: Hamming distance, y-axis: percent of comparisons).


Figure 3.8: Joint score distributions for Hamming distance (x-axis) and fragile bit distance (y-axis). Genuine scores are shown in blue; impostor scores are shown in red.


Figure 3.9: A zoomed-in view of the joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue; impostor scores are shown in red. Each point represents at least 0.003% of the comparisons: each blue triangle represents at least 7 match comparisons, and each red dot represents at least 5917 nonmatch comparisons. Two decision boundaries are overlaid: HD = constant, and 0.6HD + 0.4FBD = constant.


technique, using the same approach as [90]. We combined the two scores using

the equation,

ScoreW = α × HD + (1 − α) × FBD. (3.3)

We varied the parameter α in steps of 0.1 from 0 to 1, and calculated the equal

error rate for each run. Figure 3.10 shows how the equal error rate changes as α

varies. The lowest equal error rate was 8.02 × 10⁻³, which was obtained using an

α value of 0.6.
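The sweep itself is straightforward. The outline below is illustrative rather than our experimental code: equal_error_rate is a hypothetical helper that approximates the EER from genuine and impostor score arrays, and the hd_*/fbd_* arrays are assumed inputs holding one score per comparison:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over the pooled
    scores (smaller score = better match); the EER is where the false
    reject and false accept rates cross."""
    best = 1.0
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)    # genuine pairs rejected
        far = np.mean(impostor <= t)  # impostor pairs accepted
        best = min(best, max(frr, far))
    return best

def sweep_alpha(hd_gen, fbd_gen, hd_imp, fbd_imp):
    """Evaluate ScoreW = alpha*HD + (1-alpha)*FBD for alpha in steps of 0.1."""
    for alpha in np.arange(0.0, 1.01, 0.1):
        g = alpha * hd_gen + (1 - alpha) * fbd_gen
        i = alpha * hd_imp + (1 - alpha) * fbd_imp
        print(f"alpha={alpha:.1f}  EER={equal_error_rate(g, i):.4f}")
```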

The benefit of using a weighted average can be seen visually in Figure 3.9. This

figure shows the joint distribution of Hamming distance and FBD scores. The

vertical line marked “HD=constant” shows how using Hamming distance would

separate the genuine and impostor scores. The diagonal line marked “0.6HD +

0.4FBD=constant” shows that a better separation between genuine and impostor

scores is achieved using the weighted average.

Multiplication can be used as an alternative method of score fusion:

ScoreM = HD × FBD. (3.4)

When using multiplication, the equal error rate was 7.99 × 10⁻³. Fusing by mul-

tiplication and fusing by weighted average yielded similar results. An ROC curve

showing the results of these tests is shown in Figure 3.11, and Table 3.2 shows

summary statistics of these experiments including the equal error rate (EER) and

the false reject rate at an operating point of FAR=0.001 (FRR at FAR=0.001).

Based on the values in Table 3.2, we see that both methods of fusing Hamming

distance with FBD performed better than using Hamming distance alone. By

incorporating FBD, we improved the accuracy of our iris matcher.


Figure 3.10: We fused FBD and HD using the expression α × HD + (1 − α) × FBD and varied α from 0 to 1 (x-axis: α; y-axis: equal error rate). An α value of 0.6 yielded the lowest equal error rate.


Figure 3.11: ROC curves (true accept rate versus false accept rate) for HD alone, for 0.6HD + 0.4FBD, and for HD × FBD. Fusing Hamming distance with FBD performs better than using Hamming distance alone. Fusing by multiplying and fusing by weighted averaging yield similar results.


TABLE 3.2

FUSING FBD WITH HAMMING DISTANCE

Method            EER           FRR at FAR=0.001
HD (baseline)     8.70 × 10⁻³   1.40 × 10⁻²
0.6HD + 0.4FBD    8.02 × 10⁻³   1.25 × 10⁻²
HD × FBD          7.99 × 10⁻³   1.23 × 10⁻²

One caveat with using FBD is that in order to compute FBD, we have to

store the fragility mask separately from the occlusion mask. Therefore, our iris

template is 50% larger than it would be if we did not use FBD.

3.7 Tests of Statistical Significance

The proposed fusion between Hamming distance and FBD works better than

the baseline test of Hamming distance alone. We performed a statistical test to

determine whether this difference was statistically significant. The null hypothesis

for this test is that there is no difference between the baseline Hamming distance

method and the proposed fusion of Hamming distance and FBD. The alterna-

tive is that there is a significant difference. To test for statistical significance, we

randomly divided the subjects into ten different test sets. For each test set, we

measured the performance of using Hamming distance alone, and of using fusion

of Hamming distance and FBD. Then we used a paired t-test to see whether the

proposed method obtained a statistically significant improvement. The results are

given in Table 3.3 for weighted average fusion. Table 3.4 shows the results for fu-


TABLE 3.3

IS 0.6HD + 0.4FBD BETTER THAN HD ALONE?

Method                       Avg. EER      Avg. FRR at FAR=0.001
HD (baseline)                8.68 × 10⁻³   1.51 × 10⁻²
0.6HD + 0.4FBD (proposed)    8.08 × 10⁻³   1.33 × 10⁻²
p-value                      3.68 × 10⁻³   1.45 × 10⁻³

TABLE 3.4

IS HD × FBD BETTER THAN HD ALONE?

Method                 Avg. EER      Avg. FRR at FAR=0.001
HD (baseline)          8.68 × 10⁻³   1.51 × 10⁻²
HD × FBD (proposed)    8.00 × 10⁻³   1.33 × 10⁻²
p-value                6.43 × 10⁻³   1.90 × 10⁻³

sion using multiplication. The t-test showed statistically significant improvement

of the proposed method over the baseline for both EER and false reject rate at a

false accept rate of 0.1% (FRR at FAR=0.001). Rerunning the same experiment

using different random test sets gave similar results.
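A sketch of the significance test itself, assuming SciPy is available and that the per-split error rates have already been computed (the argument names are hypothetical):

```python
from scipy.stats import ttest_rel

def compare_methods(err_baseline, err_proposed):
    """Paired t-test over per-split error rates.

    err_baseline, err_proposed: equal-length sequences, one error rate
    per subject-disjoint test split, paired by split. The null
    hypothesis is that the two methods have the same mean error."""
    t_stat, p_value = ttest_rel(err_baseline, err_proposed)
    return t_stat, p_value
```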

Recall that when we performed fusion using a weighted average, we used the


equation,

ScoreW = α × HD + (1 − α) × FBD (3.5)

and we found that using a weight value of α = 0.6 worked best. We performed a

statistical test to determine whether this value of α was statistically significant.

For this test, we again divided the subjects randomly into ten different test sets.

We varied the parameter α in steps of 0.1 from 0 to 1. For a given value of α,

we computed the equal error rate for each of the test sets, then found the average

equal error rate for this value of α across all test sets. The results are shown

in Table 3.5. Next, we performed a paired t-test to determine whether the given

value of α produced significantly different results than using α = 0.6. The p-values

for these tests are also shown in Table 3.5. We found that at a significance level

of p=0.05, values of α between 0.4 and 0.7 were not significantly different from

α = 0.6. However, other values of α were significantly different.

3.8 Effect of Modifying the Fragile Bit Masking Threshold

Recall that fragile bit masking ignores the bits corresponding to complex filter

responses close to the axes of the complex plane (see section 2.4.5). In the ex-

periments presented up to this point, we masked 25% of bits in each iris code for

fragility. We chose this threshold because that was the threshold used in previous

work [53]. In this chapter, we wanted to study how changing the threshold would

affect our results.

We ran experiments varying the threshold used for fragile bit masking. First

we ran one test with 0% fragile bit masking. We ran an all-vs-all test (comparing

all images to all other images in the data set) and computed the performance


TABLE 3.5

IS αHD + (1 − α)FBD STATISTICALLY SIGNIFICANTLY DIFFERENT FROM 0.6HD + 0.4FBD?

α      Avg. EER      p-value       Significantly different?
0      6.28 × 10⁻²   2.81 × 10⁻⁸   Yes
0.1    2.09 × 10⁻²   1.75 × 10⁻⁶   Yes
0.2    1.16 × 10⁻²   1.79 × 10⁻⁴   Yes
0.3    8.98 × 10⁻³   1.24 × 10⁻²   Yes
0.4    8.21 × 10⁻³   4.29 × 10⁻¹   No
0.5    8.08 × 10⁻³   9.62 × 10⁻¹   No
0.6    8.08 × 10⁻³   -             -
0.7    8.19 × 10⁻³   1.18 × 10⁻¹   No
0.8    8.37 × 10⁻³   2.56 × 10⁻²   Yes
0.9    8.56 × 10⁻³   5.25 × 10⁻³   Yes
1.0    8.68 × 10⁻³   3.68 × 10⁻³   Yes


using Hamming distance alone. The equal error rate for that test was 8.26 × 10⁻³.

Next, we varied the threshold from 5% to 30% in increments of 5%. At each

threshold, we ran three all-vs-all tests. The first test was using Hamming distance

alone. The second test used a weighted average of Hamming distance and fragile bit distance: 0.6HD + 0.4FBD. The third test used the multiplication of

Hamming distance and fragile bit distance: HD × FBD. At low levels of fragile

bit masking, the Hamming distance test and the weighted average test gave very

similar results. The ROC curves for those tests are shown in Figure 3.12. At

thresholds of 15%, 20%, 25%, and 30%, the difference between Hamming distance

and weighted average was larger (see Figures 3.13 and 3.14).

The best performance using Hamming distance alone was achieved using 5%

fragile bit masking; at this threshold, the equal error rate was 8.15 × 10⁻³. The

best performance using the weighted average of Hamming distance and FBD was

achieved using a 25% fragile bit threshold; the equal error rate on this test was

8.02 × 10⁻³. The best performance for the multiplication of Hamming distance and fragile bit distance was 7.99 × 10⁻³, and this was achieved using a 25% fragile

bit masking threshold.

We observe that the fusion of Hamming distance and fragile bit distance has

greater benefit when a higher level of fragile bit masking is used. We only tested

fragile bit masking thresholds up to 30% on our data set because for our data and

software, our experiments indicate that increasing the fragile bit masking further

would not improve performance. On the other hand, other researchers have found

that fragile bit masking of 35% worked best on the CASIA version 3 data set [6].

We postulate that any system that uses a fragile bit masking level of 15% or higher

could benefit from using fragile bit distance in addition to Hamming distance.


Figure 3.12: We considered the effect of masking only 5% or 10% of the bits in the iris code for fragility. Using these values, we compared the performance of (1) Hamming distance (HD) with the performance of (2) fusing HD and FBD with a weighted average (0.6HD + 0.4FBD). At these low levels of fragile bit masking, the difference between HD and the fusion is small. The ROC curves for the two methods overlap.


Figure 3.13: We considered the effect of masking 15% or 20% of the bits in the iris code for fragility. Again, we compared the performance of (1) Hamming distance (HD) with the performance of (2) fusing HD and FBD with a weighted average. At these levels of fragile bit masking, the fusion clearly does better than HD alone.


Figure 3.14: We considered the effect of masking 25% or 30% of the bits in the iris code for fragility. At these levels of fragile bit masking, the fusion shows an even greater performance benefit over HD alone than there was at lower levels of fragile bit masking.


3.9 Discussion

In this chapter, we defined a new metric, the fragile bit distance (FBD), which measures how two fragile bit masks differ. Low FBDs are associated with genuine comparisons between two iris codes; high FBDs are associated with impostor comparisons.

Fusion of FBD and Hamming distance is a better classifier than using Hamming

distance alone. Fusion can be done either by using a weighted average of FBD and

Hamming distance, or by multiplying. The multiplication of FBD and Hamming

distance reduces the EER of our iris recognition system by eight percent (from 8.70 × 10⁻³ to 7.99 × 10⁻³), a statistically significant improvement.

Fusing FBD and Hamming distance has a greater benefit when higher levels

of fragile bit masking are used. At low levels of fragile bit masking, fusion had

similar results to using Hamming distance alone on our data. When using fragile

bit masking thresholds of 15% or greater, fusion had superior performance.


CHAPTER 4

AVERAGE IMAGES

The previous chapter focused on reducing error rates in experiments involving

still images. In this chapter, we consider how to reduce error rates when we have

an entire video clip available for both probe and gallery. Portions of this chapter

have been reprinted, with permission, from the Proc. Int. Conf. on Biometrics [54] (© 2009, Springer Berlin/Heidelberg) and from IEEE Transactions on Information Forensics and Security [57] (© 2009, IEEE).

4.1 Motivation

Zhou and Chellappa [126] reported that using video can improve face recog-

nition performance. We postulated that employing similar techniques for iris

recognition could also yield improved performance. There is some prior research

in iris recognition that uses multiple still images; for example, [39, 40, 72, 80, 105],

but the research using video for iris recognition is still in its infancy.

There are drawbacks to using single still images. One problem with single still

images is that they usually have a moderate amount of noise. Specular highlights

and eyelash occlusion reduce the amount of iris texture information present in a

single still image. With a video clip of an iris, however, a specular highlight in

one frame may not be present in the next. Additionally, the amount of eyelash


occlusion is not constant throughout all frames. It is possible to obtain a better

image by using multiple frames from a video to create a single, clean iris image.

A second difficulty with still images is that lighting differences can cause an in-

creased Hamming distance score in a comparison between two stills. By combining

information from multiple frames of a video, we can reduce variations caused by

changes in lighting.

Zhou and Chellappa suggested averaging to integrate texture information across

multiple video frames to improve face recognition performance. By combining

multiple images, noise is smoothed away, and relevant texture is maintained. In

this chapter, we present a method of averaging frames from an iris video. Our

experiments demonstrate that our signal-level fusion of multiple frames in an

iris video can improve iris recognition performance.

We perform image fusion of iris images at the pixel level. Our experiments

show that the traditional segmentation and unwrapping of the iris can be used as

a satisfactory method of image registration. We compare two methods of pixel

fusion: using the mean and using the median.

There have been a number of papers discussing score-level fusion for iris recog-

nition, but there has not been any work done with signal-level fusion for iris

recognition. Since we are the first to propose the use of signal-level fusion for iris

recognition, we show that this type of fusion can perform comparably to score-

level fusion. We focus on reimplementing multiple score-level fusion techniques

to show that signal-level fusion can achieve at least as good recognition rates

as score-level fusion. Our experiments show that our method achieves superior

recognition rates to some score-level fusion techniques suggested in the literature.

Additionally, our signal-fusion method has a faster computation time for matching


than the score-level fusion methods.

4.2 Related Work

4.2.1 Video

Video has been used effectively to improve face recognition. A recent book

chapter by Zhou and Chellappa [126] surveyed a number of methods to employ

video in face biometrics. In contrast, there is very little research using video in

iris biometrics. In an effort to encourage research in iris biometrics using un-

constrained video, the U.S. government organized the Multiple Biometric Grand

Challenge [95]. The data provided with this challenge included two types of near

infrared iris videos: (1) iris videos captured using an LG 2200 camera, and (2)

videos containing iris and face information captured using a Sarnoff Iris on the

Move portal [82].

There has been a small amount of work published using the MBGC data.

First, some preliminary results were presented at a workshop [93]. In addition,

two conference papers using MBGC iris videos were published in the most recent

International Conference in Biometrics. The first paper was our initial version

of this research [54]. The second paper by Lee et al. [74] presented methods to

detect eyes in the MBGC portal videos and measure the quality of the extracted

eye images. They compared portal iris videos to still images. At a false accept

rate of 0.80%, they achieved a false reject rate of 43.90%.

A recent journal paper by Zhou et al. [127] also presented some results on

the MBGC iris video data. Zhou et al. suggested making some additions to the

traditional iris system in order to select the best frames from video. First they

checked each frame for interlacing, blink, and blur. They used interpolation to


correct interlacing, and they discarded blurry frames and frames without an

eye. Selected frames were segmented in a traditional manner and then assigned a

confidence score relating to the quality of the segmentation. They further evalu-

ated quality by looking at the variation in iris texture, the amount of occlusion,

and the amount of dilation. They divided the iris videos into five groups based on

quality score, and showed that a higher quality score correlated with lower equal

error rate.¹

Our work differs from Lee’s [74] and Zhou’s [127] in that we use videos for

both gallery and probe sets. Also, we compare the use of stills and the use of

videos directly, while they do not. In addition, their papers focus on selecting the

best frame from a video to use for subsequent processing. In contrast, the main

focus of this work is how to combine information from multiple frames using

signal-level fusion.

4.2.2 Still Images

Some iris biometric research has used multiple still images, but all such research

uses score-level fusion, not signal-level fusion. The information from multiple im-

ages has not been combined to produce a better image. Instead, these experiments

typically employ multiple enrollment images of a subject, and combine matching

results across multiple comparisons.

Du [39] showed that using three enrollment images instead of one increased

their rank-one recognition rate from 98.5% to 99.8%. The paper reported, “We

randomly choose three images [of] each eye from the database to enroll and used

the rest [of the] images to test. We did [this] multiple times and the average identification [accuracy] rate is 99.8%. If two images are randomly selected to enroll, ... the average identification accuracy rate is 99.5%. If one image is randomly selected to enroll ... the average identification accuracy is 98.5%.” In another paper [40], Du et al. used four enrollment images instead of three.

¹ Lee et al. [74] and Zhou et al. [127] both investigate quality of video frames. A number of papers have investigated quality of still images, including Vatsa et al. [118], Belcher and Du [8], and Proenca and Alexandre [96].

Ma et al. [80] also used three templates of a given iris in their enrollment

database, and took the average of three scores as the final matching score. Krichen

et al. [72] performed a similar experiment, but used the minimum match score

instead of the average. Schmid et al. [105] presented two methods for fusing

Hamming distance scores. They computed average Hamming distance and also

a log-likelihood ratio. They found that in many cases, the log-likelihood ratio

outperformed the average Hamming distance. In all of these cases, information

from multiple images was not combined until after two stills were compared and a

score for the comparison obtained. Thus, these researchers used score-level fusion.

Another method of using multiple iris images is to use them to train a classifier.

Liu et al. [78] used multiple iris images for a linear discriminant analysis algorithm.

Roy and Bhattacharya [103] used six images of each iris class to train a support

vector machine. Even in training these classifiers, each still image was treated

as an individual entity, rather than being combined with other still images to

produce an improved image.

4.3 Data

We used the Multiple Biometric Grand Challenge (MBGC) version 2 iris video

data [95] in our experiments. The videos in this data set were acquired using

an Iridian LG EOU 2200 camera (Figure 4.1). To collect iris videos using the

LG2200 camera, the analog NTSC video signal from the camera was digitized


using a Daystar XLR8 USB digitizer and the resulting videos were stored in a

high bit rate (nearly lossless) compressed MP4 format.

The MBGCv2 data contains 986 iris videos collected during the spring of

2008. However, three of the videos in the data set contain fewer than ten frames.

We dropped those three videos from our experiments and used the remaining 983

videos. The data includes videos of both left and right eyes for each subject; we

treated each individual eye as a separate “subject” in our experiments. There

are a total of 268 different eyes in these videos. We selected the first video from

each subject to include in the gallery set and put the remaining 715 videos in our

probe set. For each subject, there were between one and seven iris videos in the

data set. Any two videos from the same subject were acquired between one week

and three months apart. The MBGC data is the only set of iris videos publicly

available.

4.4 Average Images and Templates

4.4.1 Selecting Frames and Preprocessing

Once each iris video was acquired, we wanted to create a single average im-

age that combined iris texture from multiple frames. The first challenge was to

select focused frames from the iris video. The auto-focus on the LG 2200 camera

continually adjusts the focus in attempts to find the best view of the iris. Some

frames have good focus, while others suffer from severe blurring due to subject

motion or illumination change.

We used a technique described by Daugman with a filter proposed by Kang to

select in-focus images. As described by Daugman in [28], a filter can be applied to

an image as a fast focus measure, typically in the Fourier domain. By exploiting


Figure 4.1: The Iridian LG EOU 2200 camera used in acquiring iris video sequences.


Parseval’s Theorem, we were instead able to apply the filter within the image

domain, squaring the response at each pixel. We summed the responses over the

entire image, applying the filter to non-overlapping pixels within the image and

then averaged the response over the number of pixels the kernel was applied to.

The kernel described by Kang and Park [68] was applied to each frame, and the

ten with the highest scores were extracted from the video for use in the image

averaging experiments.
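A rough sketch of this style of focus measure is given below. The 8 × 8 zero-sum high-pass kernel is patterned after the kernel Daugman describes, not the exact Kang and Park kernel, and it convolves densely rather than over non-overlapping blocks, so this is only an approximation of the procedure above:

```python
import numpy as np
from scipy.signal import convolve2d

# Zero-sum high-pass kernel: -1 surround, +3 in the central 4x4 block
# (16 * 3 - 48 * 1 = 0, so perfectly flat regions score zero).
KERNEL = -np.ones((8, 8))
KERNEL[2:6, 2:6] = 3.0

def focus_score(frame: np.ndarray) -> float:
    """High-frequency energy of a frame: convolve with the high-pass
    kernel and average the squared responses. By Parseval's theorem this
    spatial-domain sum stands in for high-frequency spectral power."""
    resp = convolve2d(frame.astype(np.float64), KERNEL, mode="valid")
    return float(np.mean(resp ** 2))

def best_frames(frames, n=10):
    """Return the n frames with the highest focus scores."""
    return sorted(frames, key=focus_score, reverse=True)[:n]
```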

The raw video frames were not pre-processed like the still images that the

Iridian software saved. We do not know what preprocessing is done by the Iridian

system, although it appears that the system does contrast enhancement and possi-

bly some deblurring. Differences between the stills and the video frames are likely

due to differences in the digitizers used to save the signals. We used the Matlab

imadjust function [83] to enhance the contrast in each frame. This function scales

intensities linearly such that 1% of pixel values saturate at black (0), and 1% of

pixel values saturate at white (255).
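An equivalent stretch can be sketched in NumPy. This mimics imadjust's default behavior of saturating 1% of pixel values at each end, but it is not the exact Matlab implementation:

```python
import numpy as np

def imadjust_like(img: np.ndarray) -> np.ndarray:
    """Linear contrast stretch: map the 1st percentile to 0 and the
    99th percentile to 255, clipping values outside that range."""
    lo, hi = np.percentile(img, [1, 99])
    out = (img.astype(np.float64) - lo) / max(hi - lo, 1e-9) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)
```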

Our next step was to segment each frame. Our segmentation software uses

a Canny edge detector and a Hough transform to find the iris boundaries. The

boundaries are modeled as two non-concentric circles. A description of the seg-

mentation algorithm is given in [79]. Our segmentation algorithm is designed to

work for frontal iris images acquired from cooperative subjects. A possible area

of future work would be to obtain a segmentation algorithm that could work on

off-angle irises and test our image-averaging technique on that type of iris image.

Our segmentation and eyelid detection algorithms are not as finely tuned as

commercial iris recognition software. To make up for this limitation, we ran

two types of experiments for this chapter. The first type of experiment uses the


data obtained from the completely automated frame selection, segmentation, and

eyelid detection algorithms. We also ran a second set of experiments that included

manual steps in the preprocessing. We manually checked all 9830 frames selected

by our frame-selection algorithm. A few of the frames did not contain valid iris

information; for example, some frames showed blinks. We also found some off-

angle iris frames. We replaced these frames with other frames from the same

video (Figure 4.2). In total, we replaced 86 (0.9%) of the 9830 frames. Next we

manually checked all of the segmentation results and replaced 153 (1.6%) incorrect

segmentations (Figure 4.3). We corrected the eyelid detection in an additional

1765 (18%) frames (Figure 4.4).

4.4.2 Signal Fusion

For each video, we now had ten frames selected and segmented. We wanted

to create an average image consisting only of iris texture. In order to align the

irises in the ten frames, we transformed the raw pixel coordinates of the iris area

in each frame into normalized polar coordinates. In polar coordinates, the radius

r ranged from zero (adjacent to the pupillary boundary) to one (adjacent to the

limbic boundary). The angle θ ranged from 0 to 2π. This yielded an “unwrapped”

iris image for each video frame selected.
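The unwrapping step can be sketched as follows, assuming the segmenter supplies each boundary as a circle (cx, cy, radius) and using nearest-neighbor sampling; a production implementation would interpolate pixel values and guard against sampling outside the image:

```python
import numpy as np

def unwrap_iris(img, pupil, limbus, n_r=20, n_theta=240):
    """Rubber-sheet unwrapping: sample along rays interpolated between
    the (possibly non-concentric) pupillary and limbic circles, giving
    an n_r x n_theta image with r = 0 at the pupil boundary."""
    out = np.zeros((n_r, n_theta), dtype=img.dtype)
    for i, r in enumerate(np.linspace(0.0, 1.0, n_r)):
        for j, th in enumerate(np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)):
            xp = pupil[0] + pupil[2] * np.cos(th)    # point on pupil circle
            yp = pupil[1] + pupil[2] * np.sin(th)
            xl = limbus[0] + limbus[2] * np.cos(th)  # point on limbic circle
            yl = limbus[1] + limbus[2] * np.sin(th)
            x = (1 - r) * xp + r * xl
            y = (1 - r) * yp + r * yl
            out[i, j] = img[int(round(y)), int(round(x))]
    return out
```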

In order to combine the ten unwrapped iris images, we wanted to make sure

they were aligned correctly with each other. Rotation around the optical axis

induces a horizontal shift in the unwrapped iris texture. We tried three methods

of alignment. First, we identified the shift value that maximized the correlation

between the pixel values. Second, we tried computing the iris codes and selecting

the alignment that produced the smallest Hamming distance. Third, we tried the


(a) (b)

(c) (d)

Figure 4.2: The frames shown in (a) and (c) were selected by our frame-selection algorithm because the frames were in focus; however, these frames do not include much valid iris data. In our automated experiments presented in this chapter we kept frames like (a) and (c) so that we could show how our software performed without any manual quality checking. In our semi-automated experiments we manually replaced frames like (a) and (c) with better frames from the same video, like (b) and (d). We expect that in the future, we may be able to develop an algorithm to detect blinks and off-angle images so that such frames could be automatically rejected.


(a) (b)

Figure 4.3: Our automated experiments contain a few incorrect segmentations like the one shown in (a). In our semi-automated experiments we manually replaced incorrect segmentations to obtain results like that shown in (b).

(a) (b)

Figure 4.4: Our automated software did not correctly detect the eyelid in all frames. In our semi-automated experiments we manually replaced incorrect eyelid detections to obtain results like that shown in (b).


naive assumption that people would not actively tilt their head while the iris video

was being captured and thus assumed that no shifts were needed. The first two

approaches did not produce any better recognition results than the naive approach.

This is because the images used in our experiments are frontal iris images from

cooperative users. A different method of alignment would be necessary for iris

videos with more eye movement. Since the naive approach worked well for our

data, we used it in our subsequent experiments.
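The correlation-based search (the first method above) can be sketched like this; max_shift is an assumed search range in angular samples:

```python
import numpy as np

def best_rotation_shift(unwrapped_a, unwrapped_b, max_shift=8):
    """Find the horizontal (angular) shift of one unwrapped iris that
    best aligns it with another by maximizing pixel correlation.
    Rotation about the optical axis appears as a circular column shift
    in the unwrapped image."""
    a = unwrapped_a.astype(np.float64).ravel()
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        b = np.roll(unwrapped_b, s, axis=1).astype(np.float64).ravel()
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_corr, best_shift = corr, s
    return best_shift
```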

Parts of the unwrapped images contained occlusion by eyelids and eyelashes.

We masked eyelid regions in our image. Then we computed an average unwrapped

image from unmasked iris data in the ten original images, using the following

algorithm. For each (r, θ) position, we find how many of the corresponding pixels

in the ten unwrapped images are unmasked. If a pixel is occluded in nine or ten of

the images, we mask it in the average image. Otherwise, an average pixel value is

based on unmasked pixel values of the corresponding frames. Therefore, the new

pixel value could be an average of between two and ten pixel intensities, depending

on mask values. Section 4.5 will give more details on averaging the pixel values.
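A sketch of this fusion rule, assuming the unwrapped frames and their validity masks have been stacked into NumPy arrays:

```python
import numpy as np

def fuse_frames(unwrapped, masks, min_unmasked=2):
    """Signal-level fusion of unwrapped iris frames.

    unwrapped : (n, H, W) array of unwrapped frames (here n = 10)
    masks     : (n, H, W) boolean array, True where a pixel is valid
                (not occluded by eyelids or eyelashes)

    Each output pixel is the mean of the unmasked pixels at that (r, θ)
    position; positions valid in fewer than min_unmasked frames are
    masked in the average image (for ten frames, min_unmasked = 2
    matches the "occluded in nine or ten frames" rule above)."""
    counts = masks.sum(axis=0)
    total = np.where(masks, unwrapped.astype(np.float64), 0.0).sum(axis=0)
    avg = np.rint(total / np.maximum(counts, 1)).astype(np.uint8)
    return avg, counts >= min_unmasked
```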

Using this method, we obtained 268 average images from the gallery videos.

We similarly obtained 715 average images from the probe videos. An example

average image is shown in Figure 4.5. On the top of the figure are the ten original

images, and on the bottom is the average image fused from the original signals.

4.4.3 Creating an Iris Code Template

Our software uses one-dimensional log-Gabor filters to create the iris code

template. The log-Gabor filter is convolved with rows of the image, and the

corresponding complex coefficients are quantized to create a binary code. Each


Figure 4.5: From the ten original images on the top, we created the average image shown on the bottom.


complex coefficient corresponds to two bits of the binary iris code – either “11”,

“01”, “00”, or “10” – depending on whether the complex coefficient is in quadrant

I, II, III, or IV of the complex plane.

Complex coefficients near the axes of the complex plane do not produce stable

bits in the iris code, because a small amount of noise can shift a coefficient from

one quadrant to the next. We use fragile-bit masking [52, 53] to mask out complex

coefficients near the axes, and therefore improve recognition performance.
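A compact sketch of the quantization and fragile-bit marking, assuming a complex-valued array of filter responses; thresholding |Re| and |Im| at the 25th percentile approximates masking the responses nearest the axes of the complex plane:

```python
import numpy as np

def quantize_and_mask(coeffs, fragility_threshold=0.25):
    """Phase-quantize complex filter responses into two bits each
    (quadrant signs) and flag the least stable bits as fragile: a real
    (resp. imaginary) bit is fragile when its real (resp. imaginary)
    part is among the smallest in magnitude."""
    re_bits = coeffs.real >= 0
    im_bits = coeffs.imag >= 0
    re_cut = np.quantile(np.abs(coeffs.real), fragility_threshold)
    im_cut = np.quantile(np.abs(coeffs.imag), fragility_threshold)
    re_fragile = np.abs(coeffs.real) < re_cut
    im_fragile = np.abs(coeffs.imag) < im_cut
    return (re_bits, im_bits), (re_fragile, im_fragile)
```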

4.5 Comparison of Median and Mean for Signal Fusion

Using the basic strategy described in subsections 4.4.2 and 4.4.3, we needed to determine

the best method of averaging pixels. Recall that each (r, θ) position in the new

average image is the average of corresponding, unoccluded pixels in the ten original

unwrapped iris images. We considered two ideas: using the median to combine

the pixel values, or using the mean.²

To determine which of these two methods was most appropriate for iris recog-

nition, we compared all images in our probe set to all images in our gallery and

graphed a detection error tradeoff (DET) curve [81]. Figure 4.6 shows the result.

It is clear from the graphs that using the mean for creating the average images

produces better recognition performance than using the median.

The median is a useful statistic for removing outliers. However, it is possible

that many of the extreme outliers in these iris images have already been removed

by eyelid detection. Furthermore, since we are averaging only a small number of

pixels (ten or fewer), the median statistic may be less useful than if we had more available data. While the median statistic uses information from only one or two


Figure 4.6: Using a mean fusion rule for fusing iris images produces better iris recognition performance than using a median fusion rule (DET curves: false accept rate versus false reject rate, both on log scales). Graph (a) shows this result using automated segmentation. Graph (b) shows the same result using the manually corrected segmentations.


pixels, the mean statistic involves information from all available pixels. Therefore, in this context, the mean is a better averaging rule than the median.

² To compute the mean, we first summed the original pixel values, then divided by the number of pixels, then rounded to the nearest unsigned 8-bit integer.

4.6 How Many Frames Should be Fused in an Average Image?

As described in subsection 4.4.2, we fuse ten frames together to create an

average image. However, ten frames may not be the optimal number of frames to

use. Fusing more frames can give a better average. On the other hand, we add the

best focused frames first, so as we increase the number of frames, we are fusing

poorer quality data. To investigate this trade-off, we ran an experiment varying

the number of frames to use in the fusion.

Recall that from each video, we had frames selected, segmented, and un-

wrapped into normalized polar coordinates. For this experiment, rather than

using all ten selected frames to create an average image, we selected the four

frames having the highest focus scores and we created an average image. In this

manner, we collected a gallery set of four-frame average images, and a probe set

of four-frame average images. We compared all gallery images to all probe images

and graphed the corresponding DET curve (red dash-dot line, Figure 4.7).

We repeated this procedure, this time using six of our selected frames to create

each average image. The set of six frames from each video was a superset of the

set of four frames. We created a gallery set of six-frame average images, and a

probe set of six-frame average images, tried all comparisons, and graphed the DET

curve on the same axes as the four-frame curve (green solid line, Figure 4.7). We

repeated the same procedure three more times, using eight, nine, and ten frames.

All DET curves are shown together in Figure 4.7.

With the automated segmentation, each increase in the number of frames fused


[Figure 4.7 plots: DET curves, False Reject Rate vs. False Accept Rate, for signal fusion of 4, 6, 8, 9, and 10 frames, for (a) automated segmentation and (b) manually corrected segmentation.]

Figure 4.7: Fusing ten frames together yields better recognition performance than fusing four, six, or eight frames.


yielded an increase in performance. With the manually corrected segmentation,

this trend holds for four, six, and eight frames. However, the DET curves for

eight, nine, and ten frames all overlap, suggesting that we have approached the

limit of the benefit that can be gained by adding frames.

In a previous paper [54], we used six frames instead of ten, but in that paper,

we had a different data set and different frame selection algorithm. The data set

in our previous paper was a pre-release version of the MBGCv2 videos. 617 of

those videos were included in MBGCv2 and we also had an additional 444 iris

videos captured during the same semester that were not included in MBGCv2.

In our previous paper [54], we chose to use the same frames as were selected

by the Iridian-driven software that came with the camera. The software saved

frames in sets of three, where one of the three frames was captured while the top

camera LED was lit, one frame was captured while the right LED was lit, and

one frame was captured while the left LED was lit. Therefore, that technique

guaranteed some lighting differences between the frames selected. Our current

frame selection technique does not enforce such a requirement, so the ten frames

selected using our current method may have fewer variations between them. With

fewer variations between the frames, it makes sense that we could average more

frames before losing any important texture in the iris.

We imagine that the optimal number of frames to fuse in creating an average

image depends both on the data set and on the frame selection algorithm. For

this paper, we decided to use ten frames in creating our average images. Using ten

frames gave the best performance using the automated segmentation. The choice

between using eight, nine, or ten frames for the manually corrected segmentation

was not as clear, but ten frames still gave the best equal error rate, and gave


reasonable performance across the whole DET curve.

4.7 How Much Masking Should be Used in an Average Image?

We initially allowed a pixel to be unmasked in the average image if at least two

corresponding pixels from the ten frames were unmasked. However, we suspected

that a different masking rule could improve performance. We could require that

all unmasked pixels in an average image be an average of ten unmasked pixel

values from the ten frames (instead of an average of at least two pixels). This

requirement could result in average images with not much available unmasked

data. If any one frame had a large amount of occlusion, the average image would

be heavily masked. On the other hand, we could use any unmasked pixel values

from the frames in creating the average image, so that an average pixel value could

be an average of between one and ten pixel intensities from the frames, depending

on mask values in the frames.

We defined a parameter, the masking level, to specify how much masking is

done in an average image. A masking level of 100% means that we only have

unmasked pixels in the average image if all ten of the corresponding pixels from

our ten frames were unmasked. A masking level of 10% means that the new pixel

value could be an average of between one and ten pixel intensities, depending on

mask values. A masking level of 50% means that we require at least half of the

corresponding pixels to be unmasked before we compute an average and create an

unmasked pixel in the average image. At this level, the new pixel value could be

an average of between five and ten pixel intensities, depending on mask values.
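
As a small sketch of this rule (an illustration under the same NumPy conventions as before, not the exact code used), the masking level simply sets a minimum count of unmasked contributors per pixel:

```python
import numpy as np

def average_mask(masks, masking_level):
    """Return the mask of the average image for a given masking level.

    masks: (N, R, T) bool array, True where a frame's pixel is unoccluded.
    masking_level: fraction in (0, 1]; a pixel in the average image stays
        unmasked only if at least ceil(masking_level * N) of the N
        corresponding frame pixels are unmasked.
    """
    n_frames = masks.shape[0]
    required = int(np.ceil(masking_level * n_frames))  # e.g. 0.8 * 10 -> 8
    return masks.sum(axis=0) >= required
```

With N = 10 frames, masking levels of 10%, 50%, 80%, and 100% then require at least 1, 5, 8, and 10 unmasked contributors, respectively, matching the definitions above.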

When we mask too much, we do not have as much iris data in our images

from which to make appropriate decisions. With less iris data, and consequently


fewer unmasked bits in a comparison, we get fewer degrees of freedom in the

nonmatch distribution. To illustrate this phenomenon, we graphed the nonmatch

distribution for a range of masking levels (Figure 4.8). As the masking level

increased, the histogram of nonmatch scores got wider, causing an increased false

accept rate. In contrast, when we mask too little, we lose the power gained from

combining data from a number of different images. The result would be like using

too few gallery images in a multi-gallery biometrics experiment.

The optimal masking level depends partly on the quality of the segmentation.

We created DET curves showing the verification performance as we varied the

masking level used in creating the average images (Figure 4.9). With our auto-

mated segmentation, a higher masking parameter is better to mitigate the impact

of segmentation errors. With the manually corrected segmentations, the quality

of the segmentation is good enough for us to use a smaller masking parameter

and thus avoid as large an increase in false accept rate. For our current data

set and segmentation, we chose to use a masking level of 80% for the automated

segmentation experiments, and a masking level of 60% when using the manually

corrected segmentation.

4.8 Comparison to Other Methods

We now present experiments comparing our method to previous methods. We

compare our signal-fusion method to the multi-gallery score-fusion methods de-

scribed by Ma [80] and Krichen [72]. Then we compare signal-fusion to Schmid’s

log-likelihood method [105]. Our last experiment compares signal-fusion to a new

multi-gallery, multi-probe score-fusion method.


[Figure 4.8 plot: histogram of nonmatch Hamming distance scores (percent of comparisons vs. Hamming distance) at masking levels of 100%, 80%, 60%, 40%, and 20%, with a zoomed-in view of the right tail; automated segmentation.]

Figure 4.8: Too much masking decreases the degrees of freedom in the nonmatch distribution, causing an increased false accept rate. (This graph shows the trend from the automatically segmented images. The manually corrected segmentation produces the same trend.)

4.8.1 Comparison to Previous Multi-gallery Methods

In biometrics, it has been found that enrolling multiple images improves perfor-

mance [14, 21, 94]. Iris recognition is no exception. Many researchers [72, 80, 105]

enroll multiple images, obtain multiple Hamming distance scores, and then fuse

the scores together to make a decision. However, the different researchers have

chosen different ways to combine the information from multiple Hamming distance

scores.

Let N be the number of gallery images for a particular subject. Comparing a

single probe image to the N gallery images gives N different Hamming distance

scores. To combine all of the N scores into a single score, Ma et al. [80] took

the average Hamming distance. We will call this type of experiment an N-to-

1-average comparison. Krichen et al. [72] also enrolled N gallery images of a


[Figure 4.9 plots: DET curves, False Reject Rate vs. False Accept Rate, at masking levels of 100%, 80%, 60%, 40%, and 20%, for (a) automated segmentation and (b) manually corrected segmentation.]

Figure 4.9: The amount of masking used to create average images affects performance. When using the manually corrected segmentation, we can use a smaller masking level (masking level = 60%). With the automated segmentation, a higher masking level (masking level = 80%) mitigates the impact of missed eyelid detections.


particular subject. However, they took the minimum of all N different Hamming

distance scores. We call this type of experiment an N-to-1-minimum comparison.
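
Both score-fusion rules are straightforward to express. The following minimal sketch assumes a hamming_distance() helper (hypothetical here) that returns the fractional Hamming distance between two iris codes:

```python
import numpy as np

def n_to_1_score(probe_code, gallery_codes, rule="average"):
    """Fuse the N Hamming distances from one probe vs. N gallery codes.

    hamming_distance() is a hypothetical helper returning the fractional
    Hamming distance between two iris codes.
    """
    scores = np.array([hamming_distance(probe_code, g) for g in gallery_codes])
    if rule == "average":      # Ma et al. [80]: N-to-1-average
        return scores.mean()
    return scores.min()        # Krichen et al. [72]: N-to-1-minimum
```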

In our signal-fusion method, we take N frames from a gallery video and do

signal-level fusion, averaging the images together to create one single average

image. We then take N frames from a probe video and average them together to

create a single average image. Thus, we can call our proposed method a signal-fusion 1-to-1 comparison.

One inherent advantage of the signal fusion method is that storing a single,

average-image iris code takes only a fraction of the space of the score-fusion meth-

ods. Instead of storing N gallery templates per subject, the proposed method only

requires storing one gallery template per subject.

In order to compare our method to previous methods, we have implemented

the N-to-1-average and N-to-1-minimum methods. For our experiments, we let N

= 10. For each of these methods, we used the same data sets. Figure 4.10 shows

the detection error tradeoff curves for these experiments and Table 4.1 shows the

corresponding statistics for the manually corrected segmentation. As an additional

baseline, we also show results for a single-gallery, single-probe experiment (No

Fusion). The DET curve shows that the proposed signal fusion method has the

lowest false accept and false reject rates of all methods shown here.

We conclude that on our data set, the signal-fusion method generally performs

better than the previously proposed N-to-1-average or N-to-1-minimum methods.

In addition, the signal fusion takes 1/Nth of the storage and 1/Nth of the matching time.


[Figure 4.10 plots: DET curves, False Reject Rate vs. False Accept Rate, for No Fusion, Score Fusion N-to-1 average, Score Fusion N-to-1 minimum, and Signal Fusion, for (a) automated segmentation and (b) manually corrected segmentation.]

Figure 4.10: The proposed signal-fusion method has better performance than using a multi-gallery approach with either an "average" or "minimum" score-fusion rule.


TABLE 4.1

SIGNAL-FUSION COMPARED TO PREVIOUS METHODS

Method                      d′     EER           FRR@FAR=0.001
no fusion                   4.62   1.56 × 10⁻²   3.32 × 10⁻²
score fusion: N-to-1 avg    5.02   8.62 × 10⁻³   1.90 × 10⁻²
score fusion: N-to-1 min    5.49   7.55 × 10⁻³   1.36 × 10⁻²
signal fusion: 1-to-1       6.06   6.99 × 10⁻³   1.10 × 10⁻²
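
The d′ and EER statistics reported in this and the following tables can be recovered from the genuine and impostor score sets. A minimal sketch (assuming two NumPy arrays of fractional Hamming distances; this is an illustration, not the evaluation code used here) using the standard decidability-index definition is:

```python
import numpy as np

def d_prime(genuine, impostor):
    """Decidability index between genuine and impostor score sets."""
    return abs(impostor.mean() - genuine.mean()) / np.sqrt(
        (genuine.var() + impostor.var()) / 2.0)

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a Hamming-distance threshold.

    Genuine scores above the threshold are false rejects; impostor
    scores at or below it are false accepts (lower distance = better match).
    """
    thresholds = np.linspace(0.0, 1.0, 10001)
    frr = np.array([(genuine > t).mean() for t in thresholds])
    far = np.array([(impostor <= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```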

4.8.2 Comparison to Previous Log-Likelihood Method

Schmid et al. [105] enrolled N gallery images of a particular subject and also

took N images of a probe subject. The N gallery images and N probe images were

paired in an arbitrary fashion and compared. Thus they obtained N different

Hamming distance scores. They combined the N different Hamming scores using

the log-likelihood ratio.

We give a brief summary of the log-likelihood method here. A more detailed

description can be found in [105]. Let X1, X2, ..., XN be a sequence of N iriscodes

representing a single subject in the gallery. Let Y1, Y2, ..., YN be a sequence of N

iriscodes representing a single subject as a probe. Let d = [d1, d2, ..., dN ] be a

vector of N Hamming distances formed from these two iriscode sequences. The

impostor hypothesis H0 states that the vector d is Gaussian distributed with a common unknown mean m0 for all entries and an unknown covariance matrix C0. The genuine hypothesis H1 states that the vector d is Gaussian distributed with a common unknown mean m1 and an unknown covariance matrix C1. Denote by p(d|Hi)


the conditional probability density function for the vector d under hypothesis Hi.

The log-likelihood ratio test statistic is

lN = (1/N)log[p(d|H1)/p(d|H0)]. (4.1)

The statistic lN can be computed as a function of m0, m1, C0, C1, d, and N . The

values m0, m1, C0, C1 are obtained using training data, and a vector of Hamming

distances d is obtained using testing data. Fractional Hamming distance scores

are bounded between zero and one, but log-likelihood test statistics have a wider

range. In our experiments we obtained scores between −1.99 and 44.60. Low scores

are from impostor comparisons and high scores are from genuine comparisons.
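
Under the two Gaussian hypotheses, the statistic can be evaluated directly once the training-set parameters are in hand. A minimal sketch (an illustration of Equation 4.1, not Schmid et al.'s implementation) using SciPy is:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_score(d, m0, C0, m1, C1):
    """Compute l_N = (1/N) log[ p(d|H1) / p(d|H0) ] from Equation 4.1.

    d:      length-N vector of Hamming distances for one comparison.
    m0, C0: mean vector and covariance matrix under the impostor hypothesis.
    m1, C1: mean vector and covariance matrix under the genuine hypothesis.
    All four parameters are estimated beforehand from training comparisons.
    """
    d = np.asarray(d, dtype=np.float64)
    log_p1 = multivariate_normal.logpdf(d, mean=m1, cov=C1)
    log_p0 = multivariate_normal.logpdf(d, mean=m0, cov=C0)
    return (log_p1 - log_p0) / len(d)
```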

The log-likelihood method requires both training and testing data, so we split

our gallery and our probe each in half. We used the first half of the gallery videos

(gallery-set-A) and the first half of the probe videos (probe-set-A) for training and

obtained a set of maximum-likelihood parameters. Next we compared the second

half of the gallery videos (gallery-set-B) and the second half of the probe videos

(probe-set-B); applying the maximum-likelihood parameters to the resulting

Hamming distance vectors gave us log-likelihood scores from the test data B.

Of course, it would be better to have as many scores as possible from our

data, so we repeated the experiment, this time using set B to train the maximum-

likelihood parameters and set A to test. We obtained log-likelihood scores from

test data A. We combined all log-likelihood scores and created a DET curve rep-

resenting the performance of the log-likelihood method.

Table 4.2 gives statistics comparing the log-likelihood method with the signal

fusion method for the manually corrected segmentations, and Figure 4.11 shows

the DET curves for the comparison. The log-likelihood method has a lower equal


TABLE 4.2

SIGNAL-FUSION COMPARED TO LOG-LIKELIHOOD SCORE FUSION

Method           d′     EER           FRR@FAR=0.001
log-likelihood   3.90   2.65 × 10⁻³   9.20 × 10⁻³
signal fusion    6.06   6.99 × 10⁻³   1.10 × 10⁻²

error rate, but the signal fusion method performs better at smaller false accept

rates. In addition, the signal fusion takes 1/Nth of the storage and 1/Nth of the matching time.

4.8.3 Comparing to Large Multi-Gallery, Multi-Probe Methods

The previous subsections compared our signal-fusion method to previously-

published methods. Each of those score-fusion methods fused N Hamming dis-

tance scores to create the final score. We also wished to consider the situation

where for a single comparison, there are N gallery images and N probe images

available, and all N² possible Hamming distance scores are computed and fused. We would expect that the fusion of N² scores would perform better than the fusion

of N scores. Although this multi-gallery, multi-probe fusion is a simple extension

of the methods listed in subsection 4.8.1, we do not know of any published work

that uses this idea for iris recognition.

We tested two ideas: we took the average of all N² scores, and also the minimum of all N² scores. We call these two methods the (1) multi-gallery, multi-


[Figure 4.11 plots: DET curves, False Reject Rate vs. False Accept Rate, for the log-likelihood and signal-fusion methods, for (a) automated segmentation and (b) manually corrected segmentation.]

Figure 4.11: Signal fusion and log-likelihood score fusion methods perform comparably. The log-likelihood method performs better at operating points with a large false accept rate. The proposed signal-fusion method has better performance at operating points with a small false accept rate.


probe, average method (MGMP-average) and the (2) multi-gallery, multi-probe,

minimum method (MGMP-minimum). The MGMP-average method produces

impostor Hamming distance distributions with small standard deviations. Using

the “minimum” rule for score-fusion produces smaller Hamming distances than

the “average” rule. However, both the genuine and impostor distributions are

affected. Based on the DET curves (Figure 4.12), we found that for these two

multi-gallery, multi-probe methods, the “minimum” score-fusion rule works bet-

ter than the “average” rule for this data set.
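
The MGMP rules extend the earlier N-to-1 sketch from N scores to the full N × N score matrix; again, hamming_distance() is a hypothetical helper, and this is an illustration rather than the exact code used:

```python
import numpy as np

def mgmp_score(probe_codes, gallery_codes, rule="minimum"):
    """Fuse all N² Hamming distances between N probe and N gallery codes."""
    scores = np.array([[hamming_distance(p, g) for g in gallery_codes]
                       for p in probe_codes])
    return scores.min() if rule == "minimum" else scores.mean()
```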

We compared the MGMP methods to the signal fusion method. The signal-

fusion method presented in this subsection is unchanged from the previous subsec-

tion, but we are presenting the results again, for comparison purposes. Statistics

for the signal fusion and the MGMP methods are shown in Table 4.3. The error

rates for signal fusion in Table 4.1 and Table 4.3 are the same because we are

running the same algorithm on the same data set.

Based on the equal error rate and false reject rate, we conclude that the multi-

gallery, multi-probe minimum method that we present in this section achieves

the best recognition performance of all of the methods considered in this paper.

However, the signal-fusion method performs well, while taking only 1/Nth of the storage and 1/N² of the matching time.

4.8.4 Computation Time

In this subsection, we compare the different methods presented in this pa-

per in terms of processing time. We have three types of methods to compare:

(1) the multi-gallery, multi-probe approaches (both MGMP-average and MGMP-

minimum) which require N² iris code comparisons before fusing values together


[Figure 4.12 plots: DET curves, False Reject Rate vs. False Accept Rate, for MGMP-average, MGMP-minimum, and Signal Fusion, for (a) automated segmentation and (b) manually corrected segmentation.]

Figure 4.12: The MGMP-minimum achieves the best recognition performance of all of the methods considered in this paper. However, the signal-fusion method performs well, while taking only 1/Nth of the storage and 1/N² of the matching time.


TABLE 4.3

SIGNAL-FUSION COMPARED TO MULTI-GALLERY, MULTI-PROBE SCORE FUSION

Method          d′     EER           FRR@FAR=0.001
MGMP-average    5.32   5.47 × 10⁻³   1.17 × 10⁻²
MGMP-minimum    6.51   1.60 × 10⁻³   3.08 × 10⁻³
signal fusion   6.06   6.99 × 10⁻³   1.10 × 10⁻²

to create a single score; (2) the multi-gallery approaches (Ma and Krichen) which

compare N gallery iris codes to one probe before fusing scores together; and (3)

the signal-fusion approach which first fuses images together, and then has a single

iris code comparison.

For this analysis, we first define the following variables. Let P be the prepro-

cessing time for each image, I be the iris code creation time, and C be the time

required for the XOR comparison of two iris codes. Let N be the number of images

of a subject in a single gallery entry for the multi-gallery methods. Let A be the

time required to average N images together (to perform signal-fusion). Finally,

suppose we have an application such as in the United Arab Emirates where each

person entering the country has his or her iris compared to a watchlist of one

million people [30]. For this application, let W be the number of people on the

watchlist. Expressions for the computation times for all three methods are given

in terms of these variables in Table 4.4.

The multi-gallery, multi-probe methods must do preprocessing and iris code


TABLE 4.4

PROCESSING TIMES FOR DIFFERENT METHODS

Method          Gallery           Probe             Comparison       Total
                Preprocessing     Preprocessing     to Watchlist     Time
MGMP            NP+NI = 4.46 s    NP+NI = 4.46 s    WCN² = 1000 s    1008.9 s
Multi-gallery   NP+NI = 4.46 s    P+I = 0.446 s     WCN = 100 s      104.9 s
Signal-fusion   NP+A+I = 3.55 s   NP+A+I = 3.55 s   WC = 10 s        17.09 s

creation for N images to create one gallery entry. Thus, the gallery preprocessing

time for one gallery subject is NP+NI. They also preprocess and create N iris

codes for a probe subject, so the probe preprocessing time is also NP+NI. To

compare a single probe entry to a single gallery entry takes CN² time because there are N² comparisons to be done. To compare a probe to the entire watchlist takes WCN² time. Similar logic can be used to find expressions for the time taken

for the other two methods. All such expressions are presented in Table 4.4.

From Daugman’s work [28], we can see that typical preprocessing time for an

image is 344 ms. He also notes that iris code creation takes 102 ms and an XOR

comparison of two iris codes takes 10 µs. Throughout this paper, we have used ten

images for all multi-gallery experiments. The time to compute an average image

from ten preprocessed images is 5 ms. Lastly, we know that the United Arab

Emirates watchlist contains one million people. By substituting these numbers

in for our variables, we found the processing time for all of our three types of


methods. These numeric values are also presented in Table 4.4.
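
The numeric entries in Table 4.4 follow directly from these constants. A short sketch that reproduces them:

```python
# Timing constants quoted above, in seconds.
P = 0.344       # preprocessing per image
I = 0.102       # iris code creation per image
C = 10e-6       # one XOR comparison of two iris codes
A = 0.005       # averaging ten preprocessed images
N = 10          # frames per gallery or probe entry
W = 1_000_000   # watchlist size

methods = {
    "MGMP":          (N*P + N*I,   N*P + N*I,   W*C*N**2),
    "Multi-gallery": (N*P + N*I,   P + I,       W*C*N),
    "Signal-fusion": (N*P + A + I, N*P + A + I, W*C),
}
for name, (gallery, probe, watchlist) in methods.items():
    print(f"{name:13s} total = {gallery + probe + watchlist:7.2f} s")
# Prints roughly 1008.92 s, 104.91 s, and 17.09 s, matching Table 4.4.
```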

A graph of the total computation time for these methods over a number of

different sizes of watchlist is shown in Figure 4.13. From this analysis it is clear that, although a multi-gallery, multi-probe method may offer some performance improvement over the signal fusion method, it comes at a high computational cost.

[Figure 4.13 plot: total time in seconds to compare one probe to a watchlist, vs. watchlist size from 10¹ to 10⁶ people, for the multi-gallery multi-probe, N-to-1 score fusion, and signal fusion methods.]

Figure 4.13: Even though a large multi-gallery, multi-probe experiment achieves better recognition performance, it comes at a cost of much slower execution time. The proposed signal fusion method is the fastest method presented in this paper, and it achieves better recognition performance than previously published multi-gallery methods.


4.9 Discussion

We performed fusion of multiple biometric samples at the signal level. Our

signal fusion approach utilizes information from multiple frames in a video. We

were the first to publish research that used video to improve iris recognition per-

formance [54]. Our experiments show that using average images created from ten

frames of an iris video performs very well for iris recognition. Average images

perform better than (1) experiments with single stills and (2) experiments with

ten gallery images compared to single stills. Our proposed multi-gallery, multi-

probe minimum method achieves slightly better recognition performance than our

proposed signal-fusion method. However, the matching time and memory require-

ments are lowest for the signal-fusion method, and the signal-fusion method still

performs better than previously published multi-gallery methods.


CHAPTER 5

IRIS BIOMETRICS ON TWINS

Prior research has shown that the textural detail of the iris is sufficiently

distinctive to distinguish identical twin siblings. However, no research has ad-

dressed the question of whether twins’ irises are sufficiently similar in some sense

to correctly determine that two irises are from twins. We conducted a human

classification study in which participants were asked to label pairs of iris images

as “twins” or “unrelated”. This study shows that there is information in the iris

appearance that is not captured by iris codes. This information could potentially

be used for forensic applications to show genetic relationships between different

eye images. Portions of this chapter are reprinted, with permission, from the

Proc. IEEE Computer Vision and Pattern Recognition Biometrics Workshop [51]

(© 2010, IEEE).

5.1 Motivation

Iris biometrics systems exploit textural details on the iris that have been shown

to be independent even between irises of genetically identical individuals. There-

fore, automated iris biometrics systems can distinguish between identical twins.

Companies selling biometrics products appropriately focus on the differences in

iris texture without considering similarities in iris appearance. L-1 Identity Solu-

tions, a company that licenses the algorithms developed by Dr. John Daugman,


advertises that “No two irises are alike. There is no detailed correlation between

the iris patterns of even identical twins, or the right and left eye of an individ-

ual” [73]. Unfortunately, laypersons incorrectly infer from such statements that

the textures in images of irises of genetically identical humans have no similar-

ities. Wikipedia reports, “Even genetically identical individuals have completely

independent iris textures” [60] (emphasis added). The computer vision and bio-

metrics research communities have not done any previous research investigating

the similarities between genetically identical irises.

There is information in iris appearance that is not captured by iris codes.

Experiments described in this chapter show that untrained human observers can

detect the similarities in genetically related irises. This fact suggests a couple

of implications. First, the degree of similarity between twins’ irises is an impor-

tant privacy concern. Managers of an iris biometrics database may assume that

because an iris image is not labeled with a name, the image does not need

to be encrypted or highly protected. However, if a hacker can determine genetic

relation simply from iris images, database managers should use caution to protect

the images as much as possible. Second, if it is possible for humans to determine

genetic relation from a pair of iris images, it may be possible to design a com-

puterized texture analysis system that could detect the traits that humans are

identifying. We envision a computerized system to predict whether or not two iris

images represent genetically related persons.

When considering templates created by iris biometric systems, we agree with

other researchers that identical twins’ templates are no more similar than those

of unrelated persons’. But, when we do an analogous experiment with human

observers we get a very different result. This raises the possibility that a differ-


ent kind of texture analysis algorithm could answer questions that current iris

biometric texture analysis cannot. Based on our human observer results, it is

reasonable to look for an automated texture analysis that could say whether or

not iris images come from identical twins.

The remainder of this chapter is organized as follows. Section 5.2 summarizes

related research including iris biometric experiments on twins, and experiments

to see what additional information (e.g., gender, race) can be determined from

iris texture. Section 5.3 describes our data acquisition and image segmentation.

Section 5.4 corroborates other researchers’ claims that iris biometrics algorithms

generate templates that encode differences in the texture of twins. Section 5.5

explains our new experiments to test how much similarity humans can detect

between identical twins’ irises. Section 5.6 provides a summary and conclusion.

5.2 Related Work

A small number of iris papers have reported on experiments involving twins.

Daugman reports that “about 1% of all persons in the general population have

an identical twin” [28]. Flom and Safir, who held the first patent discussing the

concept of iris recognition (1987), asserted that twins' irises are different: "Not only

are the irises of the eyes of identical twins different, but the iris of each eye of any

person is different from that of his other eye” [41]. However, Flom and Safir did not

have an iris biometrics implementation, so their claim is based on ophthalmologic

observations rather than biometric experimentation. Daugman’s seminal paper

on iris recognition from 1993 reported that iris texture develops randomly. He

stated, “a property the iris shares with fingerprints is the random morphogenesis

of its minutiae. Because there is no genetic penetrance in the expression of this


organ beyond its anatomical form, physiology, color and general appearance, the

iris texture itself is stochastic or possibly chaotic. Since its detailed morphogenesis

depends on initial conditions in the embryonic mesoderm from which it develops,

the phenotypic expression even of two irises with the same genetic genotype (as

in identical twins, or the pair possessed by one individual) have uncorrelated

minutiae” [25]. In 1997, Wildes et al. reported on iris biometrics experiments

that contained twin data. Their experiments used data from 60 different irises

from 40 people, but it is not clear how many of those people were twins. They

report, “Of note is the fact that this sample included identical twins. ... There

were no observed false positives or false negatives in the evaluation of this corpus

of data. In this case, statistical analysis was eschewed owing to the small sample

size. At a qualitative level, however, the data for authentics and imposters were

well separated” [120].

Daugman’s later papers reported on experiments done on genetically identical

eyes. He “compared genetically identical eyes ... in order to discover the degree to

which their textural patterns were correlated and hence genetically determined.

A convenient source of genetically identical irises are the right and left pair from

any given person; such pairs have the same genetic relationship as the four irises

of monozygotic twins, or indeed the prospective 2N irises of N clones. Although

eye color is of course strongly determined genetically, as is overall iris appearance,

the detailed patterns of genetically identical irises appear to be as uncorrelated as

they are among unrelated eyes. ... 648 right/left iris pairs from 324 persons were

compared pairwise. Their mean HD was 0.497 with standard deviation 0.031,

and their distribution ... was statistically indistinguishable from the distribution

for unrelated eyes. A set of six pairwise comparisons among the eyes of actual


monozygotic twins also yielded a result (mean HD = 0.507) expected for unrelated

eyes. It appears that the phenotypic random patterns visible in the human iris

are almost entirely epigenetic” [28].

A recent paper by Sun et al. [111] evaluated performance of iris, fingerprint,

and face biometric systems on a set of twins data. Their data set contained

51 pairs of identical twins and 15 pairs of non-identical twins for a total of 66

twin families. They generated biometric scores from twin comparisons and from

unrelated-person comparisons, and found that “the identical twin impostor dis-

tribution is very similar to the general impostor distribution. However, the peaks

that are present in the identical twin impostor distribution tail may indicate that

the irises of identical twins have some correlation” [111]. Any difference between

twins and the general population was very small. They graphed ROC curves show-

ing performance of a twins experiment compared to an unrelated-impostor exper-

iment and concluded that “there is no significant difference in the performance of

the biometric system for the identical twin data and for the general data, which

means the iris biometric system can distinguish identical twins as much as it can

distinguish any two different persons who are not identical twins” [111].

The above quotations demonstrate that at least four different groups of re-

searchers – Flom and Safir [41], Daugman [28], Wildes et al. [120], and Sun et

al. [111] – have found twins’ irises to be distinct according to current iris recog-

nition algorithms. None of these researchers investigate the similarities between

twins’ iris texture.

No prior work has considered similarities in iris texture between genetically

related people [16]. However, there has been work investigating whether gender

or race can be predicted from iris texture. Thomas et al. [113] used decision trees


to classify irises as being from male or female subjects. They used two different

types of features. First, they used geometric features such as distance between

the detected iris center and pupil center, and difference in iris area and pupil

area. Second, they used texture-based features such as the mean and standard

deviation of filter responses along rows of an “unwrapped” iris image. Using these

features, they achieved close to 80% accurate gender prediction. Qiu et al. [98]

used an Adaboost algorithm to predict whether irises came from Asian or non-

Asian subjects. They achieved 85.95% correct prediction rate. These two papers

do not directly relate to identifying twins from iris texture. However, they show

that it is possible to predict information about a subject based on iris texture

alone.

5.3 Data

To obtain data from a large number of twins, we attended the Twins Days

festival in Twinsburg, Ohio in August 2009 [114]. Twins Days is the largest annual

gathering of twins in the world, and therefore a logical place to gather biometric

data from twins. Video data of irises was collected using an LG 2200 EOU camera

attached to a Philips DVDR3576H digital video recorder. The analog signal from

the camera was captured, digitized, and stored using a high bit rate (effectively

lossless) compressed MP4 format.

No DNA testing was required for the twins to participate in the biometric

video acquisitions. However, the twins were asked to report whether they were

identical or fraternal twins. Of the 98 twin pairs that came to our research booth,

84 said that they were identical twins, 9 reported that they were fraternal twins,

and 5 reported that they did not know.


From the collection of videos of self-reported identical twins, we discarded

videos of subjects wearing glasses, hard contacts, or patterned contacts [4]. We

also discarded videos where the light from the sun had saturated the sensor, result-

ing in poor-contrast video. Videos from the remaining 76 pairs of identical twins

(152 people) were used in our experiments. The largest twins iris biometrics experiment previously published contained 51 pairs of identical twins [111]. Therefore, our

data set is about fifty percent larger than the data set in the largest previous

study on twin iris biometrics.

Our experiment uses pairs of images from twins and pairs from unrelated

people. We captured enough data during the Twins Days festival to add an

additional 44 people to our experiment, for a total of 196 people, or 392 distinct

eyes. Some subjects participated on two days, so we have 450 total videos that

we used for the experiments in this paper.

5.3.1 Frame Selection

Each video of iris data contained frames of varying quality. We used a computer

program to help us select which frames to use in our experiments. First, to avoid

using unusually dark frames, our software automatically rejected all frames with

an average intensity less than a threshold of 115. Second, our software used a

Fourier transform to detect and reject frames with high-frequency noise. From

the remaining frames, our software selected the ten most in-focus frames in each

video. From 450 videos, we selected 4500 frames. Unfortunately, one video had

several imaging artifacts, so we only had 4494 usable frames. We used these frames

in a small experiment to show that current iris biometrics algorithms use detailed

texture information that is capable of distinguishing between twins (Section 5.4).
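
A simplified sketch of this selection logic follows. The intensity threshold of 115 comes from the text above, but the focus measure shown (variance of a Laplacian response) is only a stand-in for the actual focus score and Fourier-based noise check used:

```python
import numpy as np
from scipy.ndimage import laplace

def select_frames(frames, n_keep=10, min_intensity=115):
    """Pick the n_keep most in-focus frames from an iris video.

    frames: list of 2-D uint8 arrays. Frames with mean intensity below
    min_intensity are rejected outright; the rest are ranked by a focus
    score. The variance-of-Laplacian proxy below is an assumption, not
    the actual focus measure or Fourier-based noise check used.
    """
    candidates = [f for f in frames if f.mean() >= min_intensity]
    focus = [laplace(f.astype(np.float64)).var() for f in candidates]
    order = np.argsort(focus)[::-1]          # sharpest frames first
    return [candidates[i] for i in order[:n_keep]]
```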


Figure 5.1: Images of the left eyes of two identical twins. Notice the similarities in overall iris texture, and also the similarities in the appearance of the periocular region.

We examined all 4494 frames, and hand-selected one frame from each eye to

use in our queries to our human testers (Section 5.5). We chose frames that had

the least eyelid occlusion obscuring the iris. We also favored irises centered in the

frame. An example pair of images used in our human-tester experiment is shown in

Figure 5.1. For one video, the ten automatically-selected frames were not frontal

images, so we hand-picked a frame from the original video that presented a clear

frontal iris image.

5.3.2 Segmentation

For our experiments, we needed to accurately locate the iris in each image. We

first used our automatic segmentation software, which uses active contours to find

the inner and outer iris boundaries. Since our automatic segmentation does not

always segment the image correctly, we hand-checked all of the segmentations. If

our software had made an error in finding the inner or outer iris boundary, we

manually marked the center and a point on the boundary to identify the correct

center and radius of an appropriate circle. If the software had made an error in

finding the eyelid, we marked four points along the boundary to define three line


Figure 5.2: Images of irises from identical twins. We segmented the images so that our testers would only see the iris, and therefore they could not use periocular features to help them decide whether two irises were from twins.

segments approximating the eyelid contour.

Our primary goal in presenting iris images to humans was to answer the ques-

tion, “Can humans determine whether two irises are from identical twins?” We

did not want eyelashes, eyelids, or other features around the iris to appear in our

images, because those features might influence our testers’ responses. For all seg-

mented iris images, we set all pixels outside the iris region to black. We colored

the pupil and the eyelid black as well. An example of a pair of twins’ images

with our hand-segmentation is shown in Figure 5.2. We hand-marked the eyelid

in both 5.2(a) and 5.2(b). Figure 5.3(a) shows an iris where we used the original

active contour segmentation and did not correct the eyelid.

5.4 Biometric Performance on Twins’ Irises

Our data set of iris images from 76 pairs of identical twins gives us an oppor-

tunity to verify previous claims that identical twins’ irises are different. We used


Figure 5.3: Images of irises from unrelated people.

our iris biometrics software to generate templates or “iris codes” from 4494 iris

images. Next, we compared each template with every other template, in an “all-

vs-all” experiment. We computed the fractional Hamming distance between each

pair of iris codes, then normalized the scores based on the number of unmasked

bits used in each comparison, using the score normalization technique proposed

by Daugman [29].
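
This normalization rescales a raw fractional Hamming distance according to the number of unmasked bit pairs actually compared. A minimal sketch of one commonly cited form of Daugman's rule (the constant 911 is the typical bit count in that form, and is an assumption about the exact variant used here):

```python
import numpy as np

def normalized_hd(raw_hd, n_bits, n_typical=911):
    """Daugman-style score normalization of a fractional Hamming distance.

    raw_hd:    fractional Hamming distance over the unmasked bits.
    n_bits:    number of unmasked bit pairs actually compared.
    n_typical: scaling constant; 911 is the value in the commonly cited
               form, and is an assumption about the variant used here.
    """
    return 0.5 - (0.5 - raw_hd) * np.sqrt(n_bits / n_typical)
```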

Figure 5.4 shows the distributions of normalized Hamming distances from our

experiment. We assumed that the system could know whether an image was

a left eye or a right eye, so all scores included in these distributions are either

comparisons of a left eye vs. a left eye, or comparisons of a right eye vs. a right eye.

The blue histogram shows authentic comparisons. This histogram contains scores

from comparisons where an iris image is compared to other images of the same iris.

The black histogram shows impostor comparisons of twins. In other words, the

histogram contains scores from comparisons where a person’s eye was compared

to an eye from that person’s twin. The red histogram shows impostor comparisons


[Figure 5.4 plot: histograms of normalized Hamming distance (fraction of comparisons vs. score) for authentic, twin-impostor, and non-twin-impostor comparisons.]

Figure 5.4: A histogram of Hamming distance scores between twins looks similar to a histogram of Hamming distance scores between non-twins.

from non-twins. That is, the histogram contains scores from comparisons where

a person’s eye was compared to an eye from an unrelated person.

The histograms do not show perfect separation between the authentic and the

impostor histograms, but it is clear that the twin impostor histogram is quite simi-

lar to the non-twin impostor histogram. This result agrees with others’ claims that

iris biometrics systems can differentiate between twins [28, 41, 111, 120]. A larger

data set might make the twin impostor and the non-twin impostor histograms

match more closely.


5.5 Similarities in Twins’ Irises Detected by Humans

5.5.1 Experimental Setup

Our data set contained a total of 196 people, enough for 98 query pairs without

using a subject in more than one query. We decided to have 49 queries where

the images were twins, and 49 queries where the images in the pair were from

unrelated people. To create our query pairs, we randomly selected 49 pairs of

twins from the list of identical twin pairs. These twins were used in twin queries.

The remaining twins and the other subjects were used to create unrelated-person

query pairs. These subjects were paired randomly, and then each pair was checked

to ensure that the two subjects in each of these pairs were not twins. An example

of an unrelated pair of irises is shown in Figure 5.3. In the end, we had 98 pairs of

images to present to our testers, with exactly 49 of those pairs containing matching

identical twins and 49 containing unrelated persons. No subject appeared more

than once in all the iris image pairs, and therefore a total of 196 people were

represented. The demographic information for our subjects is shown in Table 5.1.

In addition to our original question about whether humans could pick out

twins’ irises, we also decided to ask our testers whether they could pick out twins

using the features in the eye image that were not part of the iris. From the

same 196 subjects, we again randomly selected 49 pairs of identical twins and

constructed 49 twin pairs of periocular images (Figure 5.5). We randomly paired

the remaining subjects to create unrelated-person periocular queries. No subject

appeared more than once in all the periocular image pairs.

There is not a one-to-one correspondence between the iris pairs and the perioc-

ular pairs used in our experiment. We selected the iris pairs at random, and then


TABLE 5.1

DEMOGRAPHIC INFORMATION OF SUBJECTS

Total number of subjects                     196
Number of self-reported identical twins      152
Additional subjects imaged on the same day    44
Number of males                               47
Number of females                            149
White                                        171
Black or African-American                     24
Hispanic                                       1
Age 18-20                                     17
Age 21-30                                     60
Age 31-40                                     21
Age 41-50                                     32
Age 51-60                                     31
Age 61-70                                     26
Age 71-80                                      9


Figure 5.5: We wanted to know whether humans could identify twins based on periocular information. We created images where the iris was blacked-out so that our testers would be forced to use periocular features to make a judgment. This is an example pair of images. These images are from identical twins.

re-ran our script to select the periocular pairs at random. Since we had a limited

amount of data, there are some queries where a twin's right iris appeared in the iris

portion of the experiment, and later the twin’s right periocular region appeared in

the periocular portion of the experiment. In other instances we presented twins’

right eyes in the iris portion, and used left eyes in the periocular portion. In other

cases, a twin’s eye appeared in a twin pair in the iris section (paired with his twin,

of course), and then appeared in an unrelated-person query in the periocular por-

tion of the experiment (paired with a different, randomly-selected but unrelated

subject). We used this strategy to maximize the randomization in the selected

pairs. This strategy does prohibit some analysis that we could have performed

if we had used a one-to-one correspondence of images in the iris and periocular

portions of the experiment.

We utilized a graphical user interface to present image pairs to our testers. This

software displayed instructions and 12 example image pairs to familiarize users

with the task. The examples included three pairs of iris images from twins, three

pairs of periocular images from twins, three pairs of iris images from unrelated


people, and three pairs of periocular images from unrelated people. Next, the

software presented the 98 iris image pairs. The 98 pairs were presented in a

different random order each time the program was run. Each image pair was

displayed for three seconds. After each pair, the program asked “Were these

images from identical twins?” Five responses were possible: (1) Certain these

images were from identical twins, (2) Likely they were from identical twins, (3)

Can’t tell, (4) Likely they were NOT from identical twins, or (5) Certain they were

NOT from identical twins. After the user responded, the software revealed whether

the user was correct or not. Then the user could click a button to continue to the

next image pair. Once all the iris image pairs were presented, the 98 periocular

image pairs were presented. These pairs were also presented in a different random

order each time the program was run.

We chose to present all of the iris image pairs first, and all of the periocular

image pairs second so that our testers would not be confused switching between

types of questions. Our primary goal was to answer the question, “can humans

determine whether two irises are from identical twins?” We did not want the

presence of the periocular image pairs to affect our iris experiment. However,

we were also interested in some secondary questions: “can humans determine

whether two eye images are from identical twins by looking at the periocular

region alone?” and “is the iris or the periocular region more useful for identifying

twins?” Presenting the periocular image pairs gave us the opportunity to study

these questions. Our experiment is not a perfect way to test the difference between

the power of the iris and the periocular region, because testers might perform

better on the later queries than on earlier ones as they became more familiar with

the test format. Nevertheless, our experiment still provides a suggestion of which is


better.

We solicited volunteers to participate in our experiment, and twenty-eight peo-

ple signed up. Volunteers were offered ten dollars to participate, and an additional

ten dollars if they could correctly categorize 80 percent or more of the image pairs.

5.5.2 Results

5.5.2.1 Can Humans Identify Twins from Iris Texture Alone?

To find an overall accuracy score, we counted the number of times the tester

was “likely” or “certain” of the correct response; that is, we made no distinction

based on the tester’s confidence level, only on whether they believed a pair to be

twins when they were twins, or believed a pair to be unrelated when they were

unrelated. We divided the number of correct responses by 98 (the total number

of iris queries) to yield an accuracy score. The average percent correct on the iris

portion of the experiment was 81.3% (standard deviation 5.2%). The minimum

score was 68.4%, and the maximum score was 89.8%. We used a t-test to evaluate

the null hypothesis that humans did not perform differently than random guessing.

The resulting p-value was less than 10⁻⁴. Thus, we have statistically significant

evidence that our testers were doing better than random.
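
The significance test amounts to a one-sample t-test of the 28 per-tester accuracies against the chance rate of 0.5. A minimal sketch with synthetic scores standing in for the real data:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Synthetic per-tester accuracies standing in for the real 28 scores,
# which had mean 0.813 and standard deviation 0.052.
rng = np.random.default_rng(0)
accuracies = rng.normal(0.813, 0.052, size=28)

t_stat, p_value = ttest_1samp(accuracies, popmean=0.5)
print(f"t = {t_stat:.1f}, p = {p_value:.2e}")  # p falls far below 10**-4
```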

5.5.2.2 Can Humans Identify Twins from Periocular Information Alone?

We might expect that our testers would be more familiar with the test format

and perform better on the periocular queries than the iris queries. In fact, the

reverse was true. The average percent correct on the periocular queries was 76.5%

(standard deviation 5.1%). This result suggests that for our data, the iris has

better information for identifying twins than does the periocular region. See


section 5.5.2.4 for more discussion of this idea.

The minimum score on the periocular portion was 63.3% and the maximum

score was 86.7%. A t-test showed that this result is statistically better than

random guessing (p-value < 10⁻⁴).

5.5.2.3 Did Humans Score Higher on Queries where They Felt More Certain?

As mentioned above in section 5.5.1, our testers had the option to mark (1)

Certain these images were from identical twins, (2) Likely they were from identical

twins, (3) Can’t tell, (4) Likely they were NOT from identical twins, or (5) Certain

they were NOT from identical twins. Some testers were more “certain” than

others. One tester responded “certain” for 64 of the 98 iris queries and 57 of the

98 periocular queries. At the other extreme, one tester responded “certain” for

only one of the iris queries and none of the periocular queries. The average number

of “certain” responses on the iris portion of the test was 29.2 out of 98 (standard

deviation 17.1). The average number of “certain” responses on the periocular

portion of the test was 25.6 out of 98 (standard deviation 17.9).

Out of the queries that testers were “certain” about, the average percent cor-

rect on the iris portion was 92.1% (standard deviation 18.4%). On the periocular portion,

the average percent correct, excluding the three subjects who were never certain,

was 93.4% (standard deviation 5.8%). Therefore, the testers obviously scored

better on the subset of the queries where they felt “certain” of their answer.

5.5.2.4 Is It Easier to Identify Twin Pairs Using Iris Data or Periocular Data?

The majority of testers, 20 out of 28, performed better on the iris portion of

the experiment. One tester scored the same on both portions, and seven testers


performed better on the periocular portion. We found the difference between the

iris accuracy score and the periocular score for each tester. The average difference

was 4.9% (standard deviation 6.2%). The minimum difference was -4.1%, meaning

that one subject scored about 4% better on the periocular queries compared to

the iris queries. The maximum difference was 17.4%, meaning that one subject

scored over 17% better on the iris portion. We used a paired t-test to test the

null hypothesis that the scores on the iris portion and the scores on the periocular

portion came from distributions with equal means. The p-value for the test was

0.0003. Thus, there is a statistically significant difference between the scores on

the two portions. This result suggests that for our data, the iris appearance was

more valuable than the periocular appearance for identifying twin pairs.
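A minimal sketch of this paired comparison in Python, with hypothetical accuracy values standing in for the 28 testers' scores:

    import numpy as np
    from scipy import stats

    # Hypothetical paired accuracy scores; each index is one tester.
    iris_acc       = np.array([0.83, 0.79, 0.88, 0.74, 0.81])
    periocular_acc = np.array([0.76, 0.78, 0.80, 0.70, 0.77])

    # Paired t-test: do the two portions have equal mean accuracy?
    t_stat, p_value = stats.ttest_rel(iris_acc, periocular_acc)
    diff = iris_acc - periocular_acc
    print(f"mean difference = {diff.mean():.3f}, p = {p_value:.4f}")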

5.5.2.5 Did Subjects Score Better on the Second Half of the Iris Test than the

First Half?

We computed the difference between the accuracy on the second half of the

iris queries and the first half of the iris queries. That is, we found the accuracy of

the second 49 questions and subtracted the accuracy of the first 49 questions. We

found this difference for each tester, then computed the average difference across

all 28 testers. The average difference was 1.2% (standard deviation 7.4%). The

minimum difference was -12.2% and the maximum difference was 18.4%. Thirteen

of the 28 subjects performed better on the second half of the iris queries; eleven

did worse, and four stayed the same.

Since the average difference is positive, this might suggest that there was some

learning as the test progressed. However, the average difference is small compared

to the standard deviation. A one-tailed t-test shows that the difference is not


statistically significant (p-value 0.2064).
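The one-tailed test can be computed as in the sketch below; the half-to-half differences are hypothetical, and the `alternative` keyword requires SciPy 1.6 or later:

    import numpy as np
    from scipy import stats

    # Hypothetical (second half - first half) accuracy differences per tester.
    half_diff = np.array([0.04, -0.02, 0.08, 0.00, -0.06, 0.02])

    # One-tailed test: is the mean difference greater than zero (learning)?
    t_stat, p_value = stats.ttest_1samp(half_diff, popmean=0.0,
                                        alternative="greater")
    print(f"one-tailed p = {p_value:.4f}")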

5.5.2.6 Did Subjects Score Better on the Second Half of the Periocular Test than

the First Half?

We wanted to ascertain whether the testers learned during the periocular por-

tion of the exam. To answer this question, we computed the difference between

the accuracy on the second half of the periocular queries and the first half of the periocular

queries. The average difference was -0.1% (standard deviation 9.5%). Fourteen

subjects performed better on the second half of the periocular queries, thirteen

performed worse, and one subject’s performance did not change. It seems that

any ideas the testers might have “learned” in the first 49 periocular queries did

not help in the second 49 periocular queries.

The lack of improvement between the first and second halves of the periocular

test does not necessarily imply that humans cannot learn from viewing additional

periocular images. It may be that humans need to see a larger number of examples

to learn the most effective features for discrimination. A number of our images

showed twin eyes with similar mascara or eyeliner. However, there were also twin

pairs where only one of the two twins had eye make-up. A tester viewing the twin

pairs with matching make-up might falsely assume that all twins in the data set

had similar make-up. Thus there may be instances of false learning which were

not corrected in the first 49 image pairs and resulted in incorrect responses later.


Figure 5.6: All 28 testers correctly classified this pair of images as being from identical twins.

Figure 5.7: All 28 testers correctly classified this pair of images as being from identical twins.

5.5.2.7 Which Image Pairs Were Most Frequently Classified Correctly, and Which

Pairs Were Most Frequently Classified Incorrectly?

One pair of twins’ irises was classified correctly by all 28 testers. This pair is

shown in Figure 5.6. Six pairs of twins’ periocular images were classified correctly

by all 28 testers. An example is shown in Figure 5.7. There were ten pairs of

unrelated iris images that were classified correctly by all 28 testers. An example

is shown in Figure 5.8. There was also one pair of unrelated periocular images

that was classified correctly by all testers. This pair is shown in Figure 5.9.


Figure 5.8: All 28 testers correctly classified this pair of images as being from unrelated people.

Figure 5.9: All 28 testers correctly classified this pair of images as being from unrelated people.


Figure 5.10: Twenty-five of 28 people incorrectly guessed that these images were from unrelated people. In fact, these irises are from identical twins. The difference in dilation makes this pair particularly difficult to classify correctly.

Figure 5.10 shows the image pair that was most frequently classified incorrectly.

Twenty-five of the 28 subjects incorrectly guessed that these images were from

unrelated people. One of the challenges with this pair of images is the significant

difference in pupil radius. Of all unrelated-person pairs, the one most frequently

misclassified is shown in Figure 5.11. The challenge with this pair is that both of

the irises have fairly uniform texture.

5.5.2.8 Is It More Difficult to Label Twins as Twins than It Is to Label Unrelated

People as Unrelated?

As mentioned in section 5.5.2.7, only one pair of twins’ irises was classified

correctly by all 28 testers, yet ten pairs of unrelated iris images were classified

correctly by all 28 testers. This finding prompts a question – is it harder to

label twins as twins than it is to label unrelated people as unrelated? When we

consider the scores over all iris images, this seems to be true. Forty-nine queries

presented pairs of twins’ irises. The average percent correct on those queries


Figure 5.11: Twenty-four of 28 people incorrectly guessed that these images were from twins, when in fact these irises are from unrelated people. The smoothness of the texture makes this pair difficult to classify correctly.

was 79.4% (standard deviation 8.2%). The other forty-nine queries presented

pairs of unrelated irises. The average percent correct on those queries was 83.3%

(standard deviation 6.5%). We used a paired t-test to evaluate the null hypothesis

that the scores on the twin queries and the scores on the unrelated queries came

from distributions of equal mean. The resulting p-value was 0.059, which is not

strongly significant but does offer some evidence of a difference. It seems easier

to label unrelated irises as unrelated than to label twin irises as twins.

We considered the same question for the periocular images. There were forty-

nine queries which presented pairs of twins’ periocular regions. The average per-

cent correct on those queries was 75.9% (standard deviation 6.8%). The average

percent correct on the unrelated periocular regions was 75.2% (standard deviation

8.5%). This difference was not significant (p-value 0.382).


5.6 Discussion

We have found that when presented with unlabeled twin and non-twin image

pairs in equal numbers, humans can classify pairs of twins with 81% accuracy

using only the appearance of the iris. Furthermore, humans can classify pairs of

twins with 76% accuracy using only the appearance of the periocular region. Our

testers achieved these results using only a three-second display of each image pair.

For the subset of the data where our testers felt more certain, the accuracy was

even better: 92% on the iris portion and 93% on the periocular portion.

For our data, the iris appearance was more valuable than the periocular ap-

pearance for identifying twin pairs. The pair of twin iris images most frequently

misclassified had noticeable difference in pupil size between the two images; this

suggests that it is likely easier to identify twins’ irises when the irises have similar

degrees of dilation. There is a small amount of evidence that it is easier to label

an unrelated iris pair as “unrelated” than it is to label a twin pair as “twins”, at

least for our data.

The majority of testers scored better on the second half of the iris test than the

first half, but the improvement was not statistically significant. Similarly, there

was no statistically significant evidence of learning on the periocular portions of

the test.

Our testers clearly performed well above random guessing. Therefore, we can

conclude that there are similarities in twin iris texture that untrained human

testers can detect. We anticipate that humans can surpass the performance re-

ported here, if given a longer time to study the images and if given the

entire eye image rather than the iris or periocular region alone. This suggests that

human examination of pairs of iris images for forensic purposes may be feasible.


Our results also suggest that development of different approaches to automated

iris image analysis may be useful.


CHAPTER 6

PERIOCULAR BIOMETRICS

The previous chapters investigated additional information that we could use in

the iris region. Iris biometrics systems typically disregard all information outside

the iris region when making decisions about identity. This chapter investigates

what additional information we can gain from the periocular region.

6.1 Motivation

The periocular region is the part of the face immediately surrounding the eye.

While the face and the iris have both been studied extensively as biometric char-

acteristics, the use of the periocular region for a biometric system is an emerging

field of research. Periocular biometrics could potentially be combined with iris

biometrics to obtain a more robust system than iris biometrics alone. If an iris

biometrics system captured an image where the iris image was poor quality, the

region surrounding the eye might still be used to confirm or refute an identity.

A further argument for researching periocular biometrics is that current iris

biometric systems already capture images containing some periocular information,

yet when making recognition decisions, they ignore all pixel information outside

the iris region. The periocular area of the image may contain useful information

that could improve recognition performance, if we could identify and extract useful

features in that region.


A few papers [1, 84, 91, 121] have presented algorithms for periocular recog-

nition, but their approaches have relied on general computer vision techniques

rather than methods specific to this biometric characteristic. One way to begin

designing algorithms specific to this region of the face is to examine how humans

make recognition decisions using the periocular region.

Other computational vision problems have benefitted from a good understand-

ing of the human visual system. In a recent book chapter, O’Toole [20] says,

“Collaborative interactions between computational and psychological approaches

to face recognition have offered numerous insights into the kinds of face represen-

tations capable of supporting the many tasks humans accomplish with faces” [20].

Sinha et al. [110] describe numerous basic findings from the study of human face

recognition that have direct implications for the design of computational systems.

Their report says “The only system that [works] well in the face of [challenges

like sensor noise, viewing distance, and illumination] is the human visual system.

It makes eminent sense, therefore, to attempt to understand the strategies this

biological system employs, as a first step towards eventually translating them into

machine-based algorithms” [110].

In this study, we investigated which features humans found useful for making

decisions about identity based on periocular information. We presented pairs

of periocular images to testers and asked them to determine whether the two

images were from the same person or from different people. We also asked them

to describe what features in the images were helpful to them in making their

decisions. We found that the features that humans found most helpful were not

the features that the current periocular biometrics work uses. Based on this study,

we anticipate that explicit modeling and description of eyelids, eyelashes, and tear


ducts could yield more recognition power than the current periocular biometrics

algorithms published in the literature.

The rest of this chapter is organized as follows. Section 6.2 summarizes the

previous work in periocular biometrics. Section 6.3 describes how we selected

and pre-processed eye images for our experiment. Our experimental method is

outlined in Section 6.4. Section 6.5 presents our analysis. Finally, Section 6.6

presents a summary of our findings and a discussion of the implications of our

experiment.

6.2 Related Work

As mentioned above, face recognition and iris recognition have both been re-

searched extensively [16, 125]. In contrast, the field of periocular biometrics is

in its infancy, and only a few authors have published in the area. A pioneering

paper by Park et al. [91] presented a feasibility study for the use of periocular

biometrics. The authors implemented two methods for analyzing the periocular

region. In their “global method”, they used the location of the iris as an anchor

point. They defined a grid around the iris and computed gradient orientation

histograms and local binary patterns for each point in the grid. They quantized

both the gradient orientation and the local binary patterns (LBP) into eight dis-

tinct values to build an eight-bin histogram, and then used Euclidean distance

to evaluate a match. Their “local method” involved detecting key points using

a SIFT matcher. They collected a database of 899 high-resolution visible-light

face images from 30 subjects. A face matcher gave 100% rank-one recognition for

these images, and the matcher that used only the periocular region gave 77%.
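As a rough illustration of this style of analysis, the sketch below computes a basic 3x3 LBP code for each pixel, quantizes the 8-bit codes into eight values to build an eight-bin histogram for a grid cell, and compares two histograms with Euclidean distance. The quantization rule and the normalization are our assumptions, not the exact parameters of Park et al.:

    import numpy as np

    def lbp_codes(img):
        """Basic 3x3 LBP: compare the eight neighbors to the center pixel."""
        c = img[1:-1, 1:-1]
        shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
        codes = np.zeros_like(c, dtype=np.uint8)
        for bit, (dy, dx) in enumerate(shifts):
            nb = img[1 + dy : img.shape[0] - 1 + dy,
                     1 + dx : img.shape[1] - 1 + dx]
            codes |= (nb >= c).astype(np.uint8) << bit
        return codes  # one 8-bit code (0..255) per interior pixel

    def cell_histogram(codes):
        """Quantize the codes into eight values and build an 8-bin histogram."""
        hist, _ = np.histogram(codes // 32, bins=8, range=(0, 8))
        return hist / max(hist.sum(), 1)

    def euclidean(h1, h2):
        return float(np.linalg.norm(h1 - h2))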

Another paper by Miller et al. also used LBP to analyze the periocular re-


gion [84]. They used visible-light face images from the Facial Recognition Grand

Challenge (FRGC) data and the Facial Recognition Technology (FERET) data.

The periocular region was extracted from the face images using the provided eye

center coordinates. Miller et al. extracted the LBP histogram from each block in

the image and used City Block distance to compare the information from two im-

ages. They achieved 89.76% rank-one recognition on the FRGC data, and 74.07%

on the FERET data.
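A minimal sketch of this block-wise scheme, reusing `lbp_codes` and `cell_histogram` from the sketch above and assuming a 4x4 grid of blocks:

    import numpy as np

    def block_feature(img, grid=(4, 4)):
        """Concatenate one LBP histogram per block of the image."""
        bh, bw = img.shape[0] // grid[0], img.shape[1] // grid[1]
        hists = [cell_histogram(lbp_codes(img[r*bh:(r+1)*bh, c*bw:(c+1)*bw]))
                 for r in range(grid[0]) for c in range(grid[1])]
        return np.concatenate(hists)

    def city_block(f1, f2):
        """City block (L1) distance between two feature vectors."""
        return float(np.abs(f1 - f2).sum())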

Adams et al. [1] also used LBP to analyze periocular regions from the FRGC

and FERET data, but they trained a genetic algorithm to select the subset of

features that would be best for recognition. The use of the genetic algorithm

increased accuracy from 89.76% to 92.16% on the FRGC data. On the FERET

dataset, the accuracy increased from 74.04% to 85.06%.

While Park et al., Miller et al., and Adams et al. all used datasets of visible-

light images, Woodard et al. [121] performed experiments using near-infrared

(NIR) images from the Multi-Biometric Grand Challenge (MBGC) portal

data. The MBGC data shows NIR images of faces at sufficiently high res-

olution that the iris could theoretically be used for iris recognition. However,

the portal data is a challenging data set for iris analysis because the images are

acquired while a subject is in motion, and several feet away from the camera.

Therefore, the authors proposed to analyze both the iris and the periocular re-

gion, and fuse information from the two biometric modalities. From each face,

they cropped a 601x601 image of the periocular region. Their total data set con-

tained 86 subjects’ right eyes and 88 subjects’ left eyes. Using this data, the

authors analyzed the iris texture using a traditional Daugman-like algorithm [28],

and they analyzed the periocular texture using LBP. The periocular identification


performed better than the iris identification, and the fusion of the two modalities

performed best.

One difference between our work and the above mentioned papers is the target

data type (Table 6.1). The papers above all used periocular regions cropped from

face data. Our work uses near infrared images of a small periocular region, from

the type of image we get from iris cameras. The anticipated application is to use

periocular information to assist in iris recognition when iris quality is poor.

Another difference between our work and the above work is the development

strategy. The papers mentioned above used gradient orientation histograms, local

binary patterns, and SIFT features. These authors have followed a strategy of

applying common computer vision techniques to analyze images. We attempt to

approach periocular recognition from a different angle. We aim to investigate the

features that humans find most useful for recognition in near infrared images of

the periocular region.

6.3 Data

In selecting our data, we considered using eye images taken from two different

cameras: an LG2200 and an LG4000 iris camera. The LG2200 is an older model,

and the images taken with this camera sometimes have undesirable interlacing or

lighting artifacts [15]. On the other hand, in our data sets, the LG4000 images

seemed to show less periocular data around the eyes. Since our purpose was to

investigate features in the periocular region, we chose to use the LG2200 images so

that the view of the periocular region would be larger. We hand-selected a subset

of images, choosing images in good focus, with minimal interlacing and shadow

artifacts. We also favored images that included both the inner and outer corners


TABLE 6.1

PERIOCULAR RESEARCH

Paper          Data                        Algorithm                    Features

Park [91]      899 visible-light face      Gradient orientation         Eye region with
               images, 30 subjects         histograms; local binary     width 6*iris-radius,
                                           patterns; Euclidean          height 4*iris-radius
                                           distance; SIFT matcher

Miller [84]    FRGC and FERET data:        Local binary patterns;       Skin
               visible-light face          city block distance
               images, 464 subjects

Adams [1]      Same as Miller et al.       Local binary patterns;       Skin
                                           genetic algorithm

Woodard [121]  MBGC data: near-infrared    Local binary patterns;       Skin
               face images, 88 subjects    result fused with iris
                                           matching results

This work      Near-infrared iris images   Human analysis               Eyelashes, tear duct,
               from LG2200 camera,                                      eyelids, and shape
               120 subjects                                             of eye


of the eye.

We selected images from 120 different subjects. We had 60 male subjects and

60 female subjects; 108 were Caucasian and 12 were Asian. For 40 of

the subjects, we selected two images of an eye and saved the images as a “match”

pair. In each case, the two images selected were acquired at least a week apart.

For the remaining subjects, we selected one image of an eye, paired it with an

image from another subject, and saved it as a “nonmatch” pair. Thus, the queries

that we would present to our testers involved 40 match pairs, and 40 nonmatch

pairs. All queries were either both left eyes or both right eyes.

Our objective was to examine how humans analyzed the periocular region.

Consequently, we did not want the iris to be visible during our tests. To locate

the iris in each image, we used our automatic segmentation software, which uses

active contours to find the iris boundaries. Next, we hand-checked all of the

segmentations. If our software had made an error in finding the inner or outer iris

boundary, we manually marked the center and a point on the boundary to identify

the correct center and radius of an appropriate circle. If the software had made

an error in finding the eyelid, we marked four points along the boundary to define

three line segments approximating the eyelid contour. For all of the images, we

set the pixels inside the iris/pupil region to black.
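A minimal sketch of this final masking step, assuming the segmentation has produced a center and radius for the outer iris boundary (the variable names are illustrative):

    import numpy as np

    def black_out_iris(img, cx, cy, radius):
        """Set every pixel inside the circular iris boundary to black."""
        yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        out = img.copy()
        out[inside] = 0
        return out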

6.4 Experimental Method

In order to determine which features in the periocular region were most helpful

to the human visual system, we designed an experiment to present pairs of eye

images to volunteers and ask for detailed responses. We designed a graphical

user interface to display our images. At the beginning of a session, the computer


displayed two example pairs of eye images to the user. The first pair showed two

images of a subject’s eye, taken on different days. The second pair showed eye

images from two different subjects. Next, the GUI displayed the test queries. In

each query, we displayed a pair of images and asked the user to respond whether

he or she thought the two images were from the same person or from different

people. In addition, he could note his level of confidence in his response – whether

he was “certain” of his response, or only thought that his response was “likely”

the correct answer. The user was further asked to rate a number of features

depending on whether each feature was “very helpful”, “helpful”, or “not helpful”

for determining identity. The features listed were “eye shape”, “tear duct”1, “outer

corner”, “eyelashes”, “skin, “eyebrow”, “eyelid, and “other”. If a user marked that

some “other” feature was helpful, he was asked to enter what feature(s) he was

referring to. A final text box on the screen asked the user to describe any

additional information that he used while examining the eye images.

Users did not have any time limit for examining the images. After the user

had classified the pair of images as “same person” or “different people” and rated

all features, then he could click “Next” to proceed. At that point the user was

told whether he had correctly classified the pair of images. Then, the next query

was displayed. All users viewed the same eighty pairs of images, although they

were presented in a different random order for each user.

We solicited volunteers to participate, and 25 people signed

up to serve as testers in our experiment. Most testers responded to all of the

queries in about 35 minutes. The fastest tester took about 25 minutes, and the

slowest took about an hour and 40 minutes. They were offered ten dollars for

1We used the term “tear duct” informally in this instance to refer to the region near the inner corner of the eye. A more appropriate term might be “medial canthus” but we did not expect the volunteers in our experiment to know this term.


participation and twenty dollars if they classified at least 95% of pairs correctly.

6.5 Results

6.5.1 How Well Can Humans Determine whether Two Periocular Images Are

from the Same Person or Not?

To find an overall accuracy score, we counted the number of times the tester

was “likely” or “certain” of the correct response; that is, we made no distinction

based on the tester’s confidence level, only on whether they believed a pair to be

from the same person, or believed a pair to be from different people. We divided

the number of correct responses by 80 (the total number of queries) to yield an

accuracy score. The average tester classified about 74 out of 80 pairs correctly,

which is about 92% (standard deviation 4.6%). The minimum score was 65 out

of 80 (81.25%) and the maximum score was 79 out of 80 (98.75%).

6.5.2 Did Humans Score Higher when They Felt More Certain?

As mentioned above, testers had the option to mark whether they were “cer-

tain” of their response or whether their response was merely “likely” to be correct.

Some testers were more “certain” than others. One responded “certain” for 70

of the 80 queries. On the other hand, one tester did not answer “certain” for

any queries. Discounting the tester who was never certain, the average score on

the questions where testers were certain was 97% (standard deviation 5.2%). The

average score when testers were less certain was 84% (standard deviation 11%).

Therefore, testers obviously did better on the subset of the queries where they felt

“certain” of their answer.


6.5.3 Did Testers Do Better on the Second Half of the Test than the First Half?

The average score on the first forty queries for each tester was 92.2%. The av-

erage score on the second forty queries was 92.0%. Therefore, there is no evidence

of learning between the first half of the test and the second.

6.5.4 Which Features Are Correlated with Correct Responses?

The primary goal of our experiment was to determine which features in the

periocular region were most helpful to the human visual system when making

recognition decisions. Specifically, we are interested in features present in near-

infrared images of the type that can be obtained by a typical iris camera. To best

answer our question, we only used responses from cases where the tester correctly

determined whether the image pair was from the same person. From these responses,

we counted the number of times each feature was “very helpful” to the tester,

“helpful”, or “not helpful”. A bar chart of these counts is given in Figure 6.1.

The features in this figure are sorted by the number of times each feature was

regarded as “very helpful”. According to these results, the most helpful feature

was eyelashes, although tear duct and eye shape were also very helpful. The

ranking from most helpful to least helpful was (1) eyelashes, (2) tear duct, (3) eye

shape, (4) eyelid, (5) eyebrow, (6) outer corner, (7) skin, and (8) other.
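The tally behind this ranking can be reproduced along the following lines; the rating records shown are hypothetical:

    from collections import Counter

    # One (feature, rating) record per feature rating on a correctly
    # answered query; these example records are hypothetical.
    ratings = [("eyelashes", "very helpful"), ("tear duct", "very helpful"),
               ("skin", "not helpful"), ("eye shape", "helpful")]

    very_helpful = Counter(f for f, r in ratings if r == "very helpful")
    for feature, count in very_helpful.most_common():
        print(feature, count)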

Other researchers have found eyebrows to be more useful than eyes in identi-

fying famous people [110], so the fact that eyebrows were ranked fifth out of eight

is perhaps misleading. The reason eyebrows received such a low ranking in our

experiment is that none of the images showed a complete eyebrow. In about forty

queries, the two images both showed some part of the eyebrow, but in the other

forty queries, the eyebrow was outside the image field-of-view in at least one of


[Figure 6.1 is a bar chart titled “Rated Helpfulness of Features from Correct Responses”: the number of responses (0–1800) rating each feature (eyelashes, tear duct, eye shape, eyelid, eyebrow, outer corner, skin, other) as very helpful, helpful, or not helpful.]

Figure 6.1: Eyelashes were considered the most helpful feature for making decisions about identity. The tear duct and shape of the eye were also very helpful.

the images in the pair. On images with a larger field of view, eyebrows could be

significantly more valuable. We suggest that iris sensors with a larger field of view

would be more useful when attempting to combine iris and periocular biometric

information.

The low ranking for “outer corner” (sixth out of eight) did not surprise us,

because in our own observation of a number of eye images, the outer corner does

not often provide much unique detail for distinguishing one eye from another.

There were three queries where the outer corner of the eye was not visible in

the image.

Skin ranked seventh out of eight in our experiment, followed only by “other”.

Part of the reason for the low rank of this feature is that the images were all near-

infrared images. Therefore, testers could not use skin color to make their decisions.

This result might be less pronounced if we used a data set containing a greater

diversity of ethnicities. However, we have noticed that variations in lighting can

make light skin appear dark in a near-infrared image, suggesting that overall


intensity in the skin region may have greater intra-class variation than inter-class

variation in these types of images.

6.5.5 Which Features Are Correlated with Incorrect Responses?

In addition to considering which features were marked most helpful for correct

responses, we also looked at how features were rated when testers responded in-

correctly. For all the incorrectly answered queries, we counted the number of times

each feature was “very helpful”, “helpful”, or “not helpful”. A bar chart of these

counts is given in Figure 6.2. We might expect to have a similar rank ordering for

the features in the incorrect queries as we had for the correct queries, simply be-

cause if certain features are working well for identification, a tester would tend to

continue to use the same features. Therefore, rather than focusing on the overall

rank order of the features, we considered how the feature rankings differed from

the correct responses to the incorrect responses. The ranking from most helpful

feature to least helpful feature for the incorrect queries was (1) eye shape, (2) tear

duct, (3) eyelashes, (4) outer corner, (5) eyebrow, (6) eyelid, (7) skin, and (8)

other. Notice that “eye shape” changed from rank three to rank one. Also “outer

corner” changed from rank six to rank four. This result implies that eye shape and

outer corner are features that are less valuable for correct identification. On the

other hand, “eyelashes” and “eyelid” both changed rank in the opposite direction,

implying that those features are more valuable for correct identification.

6.5.6 What Additional Information Did Testers Provide?

In addition to the specific features that testers were asked to rate, testers were

also asked to describe other factors they considered in making their decisions.


[Figure 6.2 is a bar chart titled “Rated Helpfulness of Features from Incorrect Responses”: the number of responses (0–150) rating each feature (eye shape, tear duct, eyelashes, outer corner, eyebrow, eyelid, skin, other) as very helpful, helpful, or not helpful.]

Figure 6.2: We compared the rankings for the features from correct responses (Fig. 6.1) with the rankings from incorrect responses. The shape of the eye and the outer corner of the eye were both used more frequently on incorrect responses than on correct responses. This result suggests that those two features would be less helpful than other features such as eyelashes.

Testers were prompted to “explain what features in the image were most useful

to you in making your decision”, and enter their response in a text box.

Testers found a number of different traits of eyelashes valuable. They consid-

ered the density of eyelashes (or number of eyelashes), eyelash direction, length,

and intensity (light vs. dark). Clusters of eyelashes, or single eyelashes pointing

in an unusual direction were helpful, too. Contacts were helpful as a “soft bio-

metric”. That is, the presence of a contact lens in both images could be used

as supporting evidence that the two images were of the same eye. However, no

testers relied on contacts as a deciding factor. One of the eighty queries showed

two images of the same eye where one image showed a contact lens, and the other

did not. Make-up was listed both as “very helpful” for some queries, and as “mis-

leading” for other queries. One of the eighty queries showed a match pair where

only one of the images displayed make-up.


Figure 6.3. All 25 testers correctly classified these two images as being from the same person.

6.5.7 Which Pairs Were Most Frequently Classified Correctly, and Which Pairs

Were Most Frequently Classified Incorrectly?

There were 21 match pairs that were classified correctly by all testers. One

example of a pair that was classified correctly by all testers is shown in Figure 6.3.

There were 12 nonmatch pairs classified correctly by all testers. An example is

shown in Figure 6.4.

Figure 6.5 shows the match pair most frequently classified incorrectly. Eleven

of the 25 testers mistakenly thought that these two images were from different

people. This pair is challenging because the eye is wide open in one of the images,

but not in the other. Figure 6.6 shows the nonmatch pair most frequently classified

incorrectly. This pair was also misclassified by 11 testers, although the set of 11

testers who responded incorrectly for the pair in Figure 6.6 was different from the

set of testers who responded incorrectly for Figure 6.5.


Figure 6.4. All 25 testers correctly classified these two images as being from different people.

Figure 6.5. Eleven of 25 people incorrectly guessed that these images were from different people, when in fact these eyes are from the same person. This pair is challenging because one eye is much more open than the other.


Figure 6.6. Eleven of 25 people incorrectly guessed that these images were from the same person, when in fact they are from two different people.

6.6 Discussion

We have found that when presented with equal numbers of unlabeled match and

nonmatch periocular image pairs, humans can classify the pairs as “same person” or “different

people” with an accuracy of about 92%. When expressing confident judgement,

the accuracy is about 97%. We compared scores on the first half of the test to the

second half of the test and found no evidence of learning as the test progressed.

In making their decisions, testers reported that eyelashes, tear ducts, shape

of the eye, and eyelids were most helpful. However, eye shape was used in a

large number of incorrect responses. Both eye shape and the outer corner of the

eye were used a higher proportion of the time for incorrect responses than they

were for correct responses; thus, those two features might not be as useful for

recognition. Eyelashes were helpful in a number of ways. Testers used eyelash

intensity, length, direction, and density. They also looked for groups of eyelashes

that clustered together, and for single eyelashes separated from the others. The

presence of contacts was used as a soft biometric. Eye make-up was helpful in


some image pairs, and distracting in others. Changes in lighting were challenging,

as were large differences in eye occlusion.

Our analysis suggests some specific ways to design powerful periocular biomet-

rics systems. We expect that a biometrics system that explicitly detects eyelids,

eyelashes, the tear duct and the entire shape of the eye could be more powerful

than some of the skin analysis methods presented previously.

The most helpful feature in our study was eyelashes. In order to analyze the

eyelashes, we would first locate the eyelids. Eyelids can be detected

using edge detection and Hough transforms [69, 120], a parabolic “integrodiffer-

ential operator” [28], or active contours [104]. The research into eyelid detection

has primarily been aimed at detecting and disregarding the eyelids during iris

recognition, but we suggest detecting and describing eyelids and eyelashes to aid

in identification. Feature vectors describing eyelashes could include measures for

the density of eyelashes along the eyelid, the uniformity of direction of the eye-

lashes, and the curvature and length of the eyelashes. We could also use metrics

comparing the upper and lower lashes.
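One purely speculative encoding of such an eyelash feature vector is sketched below; every field is our assumption rather than an implemented algorithm:

    from dataclasses import dataclass

    @dataclass
    class EyelashFeatures:
        density_upper: float          # lashes per unit length, upper lid
        density_lower: float          # lashes per unit length, lower lid
        direction_uniformity: float   # e.g., circular variance of lash angles
        mean_length: float            # average lash length in pixels
        mean_curvature: float         # average curvature along each lash

        def to_vector(self):
            return [self.density_upper, self.density_lower,
                    self.direction_uniformity, self.mean_length,
                    self.mean_curvature]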

The second most helpful feature in our study was the tear duct region. Once

we have detected the eyelids, we could extend those curves to locate the tear duct

region. This region should more formally be referred to as the medial canthus. A

canthus is the angle or corner on each side of the eye, where the upper and lower

lids meet. The medial canthus is the inner corner of the eye, or the corner closest

to the nose. Two structures are often visible in the medial canthus, the lacrimal

caruncle and the plica semilunaris [89]. These two features typically have lower

contrast than eyelashes and iris. Therefore, they would be harder for a computer

vision algorithm to identify, but if they were detectable, the sizes and shapes of


these structures would be possible features. Detecting the medial canthus itself

would be easier than detecting the caruncle and plica semilunaris, because the

algorithm could follow the curves of the upper and lower eyelids until they meet

at the canthus. Once detected, we could measure the angle formed by the upper

and lower eyelids and analyze how the canthus meets the eyelids. In Asians, the

epicanthal fold may cover part of the medial canthus [89] so that there is a smooth

line from the upper eyelid to the inner corner of the eye (e.g. Figure 6.3). The

epicanthal fold is present in fetuses of all races, but in Caucasians it has usually

disappeared by the time of birth [89]. Therefore, Caucasian eyes are more likely

to have a distinct cusp where the medial canthus and upper eyelid meet (e.g.

Figure 6.5).
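A minimal sketch of the angle measurement, assuming the tangent directions of the two eyelid curves at the canthus are available from the traced contours:

    import numpy as np

    def canthus_angle(upper_tangent, lower_tangent):
        """Angle in degrees between the eyelid tangents at the canthus."""
        u = np.asarray(upper_tangent, dtype=float)
        v = np.asarray(lower_tangent, dtype=float)
        cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))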

The shape of the eye has potential to be helpful, but the term “eye shape” is

ambiguous, which might explain the seemingly contradictory results we obtained

about the helpfulness of this particular feature. To describe the shape of the eye,

we could analyze the curvature of the eyelids. We could also detect the presence

or absence of the superior palpebral furrow – the crease in the upper eyelid – and

measure its curvature if present.

Previous periocular research has focused on texture and key points in the area

around the eye. The majority of prior work [1, 84, 121] masked an elliptical region

in the middle of the periocular region “to eliminate the effect of textures in the

iris and the surrounding sclera area” [84]. This mask effectively occludes a large

portion of the eyelashes and tear duct region, thus hiding the features that we

find are most valuable. Park et al. [91] do not mask the eye, but they also do not

do any explicit feature modeling beyond detecting the iris. These promising prior

works have all shown recognition rates at or above 77%. However, we suggest that


there is potential for greater recognition power by considering additional features.


CHAPTER 7

CONCLUSIONS

In this dissertation, we presented methods of reducing error rates and increas-

ing the applicability of eye biometrics. Our work was the first to propose the

fragile bit distance metric. We introduced this metric and proposed fusing Ham-

ming distance with fragile bit distance. This optimization reduced the equal error

rate by eight percent on a data set of 19,891 iris images. Our second optimization

fused frames from video to create average images. We tested this method on a

data set of 983 iris videos. Using average images from video reduced the equal er-

ror rate by 8.6 × 10⁻³ when compared with using single frames. In comparing the

proposed average images method to a multi-gallery method, we reduced the equal

error rate by 5.6 × 10⁻⁴ while using only one-tenth the matching time required for

the multi-gallery method.

To increase the applicability of eye biometrics, we investigated what additional

information was present in eye images that current algorithms do not detect. By

looking at iris data from identical twins, we showed that there is genetically-related

information present in iris texture that existing iris biometrics algorithms do not

capture. Ours is the first work to experimentally document that people can

reliably distinguish images of twins’ irises from images of unrelated persons’ irises.

Finally, using images from the periocular region, we showed that the features most

useful to humans in that region are not the features that current systems use for


analyzing that region. As future work, we hope to develop automated algorithms

that can detect and describe the eyelashes, eyelids, and tear duct to identify people

based on that periocular information.


BIBLIOGRAPHY

1. Joshua Adams, Damon L. Woodard, Gerry Dozier, Philip Miller, Kelvin Bryant, and George Glenn. Genetic-based type II feature extraction for periocular biometric recognition: Less is more. Proc. Int. Conf. on Pattern Recognition, 2010. To appear.

2. Kelli Arena and Carol Cratty. FBI wants palm prints, eye scans, tattoo mapping. CNN.com, Feb 2008. http://www.cnn.com/2008/TECH/02/04/fbi.biometrics/, accessed July 2009.

3. Sarah Baker, Kevin W. Bowyer, and Patrick J. Flynn. Empirical evidence for correct iris match score degradation with increased time-lapse between gallery and probe matches. Proc. Int. Conf. on Biometrics (ICB2009), pages 1170–1179, 2009.

4. Sarah Baker, Amanda Hentz, Kevin W. Bowyer, and Patrick J. Flynn. Contact lenses: Handle with care for iris recognition. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–8, Sept 2009.

5. Lucas Ballard, Seny Kamara, and Michael K. Reiter. The practical subtleties of biometric key generation. 17th USENIX Security Symposium, pages 61–74, 2008.

6. Nakissa Barzegar and M. Shahram Moin. A new user dependent iris recognition system based on an area preserving pointwise level set segmentation approach. EURASIP Journal on Advances in Signal Processing, pages 1–13, 2009.

7. Craig Belcher and Yingzi Du. Feature information based quality measure for iris recognition. Proc. IEEE International Conference on Systems, Man, and Cybernetics, pages 3339–3345, Oct 2007.

8. Craig Belcher and Yingzi Du. A selective feature information approach for iris image-quality measure. IEEE Transactions on Information Forensics and Security, 3(3):572–577, Sept 2008.


9. Craig Belcher and Yingzi Du. Region-based SIFT approach to iris recognition. Optics and Lasers in Engineering, 47:139–147, 2009.

10. A. Bertillon. La couleur de l'iris. Revue scientifique, 36(3):65–73, 1885.

11. Rajesh M. Bodade and Sanjay N. Talbar. Shift invariant iris feature extraction using rotated complex wavelet and complex wavelet for iris recognition system. Proc. 2009 Seventh International Conference on Advances in Pattern Recognition, pages 449–452, 2009.

12. Vishnu Naresh Boddeti and B.V.K. Vijaya Kumar. Extended-depth-of-field iris recognition using unrestored wavefront-coded imagery. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(3):495–508, May 2010.

13. Ruud M. Bolle, Sharath Pankanti, Jonathon H. Connell, and Nalini Ratha. Iris individuality: A partial iris model. Proc. Int. Conf. on Pattern Recognition, pages II: 927–930, 2004.

14. Kevin W. Bowyer, Kyong I. Chang, Ping Yan, Patrick J. Flynn, Earnie Hansley, and Sudeep Sarkar. Multi-modal biometrics: an overview. Proc. Second Workshop on Multi-Modal User Authentication, pages 1–8, May 2006. Toulouse, France.

15. Kevin W. Bowyer and Patrick J. Flynn. The ND-IRIS-0405 iris image dataset. Technical report, University of Notre Dame, 2009. http://www.nd.edu/∼cvrl/papers/ND-IRIS-0405.pdf.

16. Kevin W. Bowyer, Karen P. Hollingsworth, and Patrick J. Flynn. Image understanding for iris biometrics: A survey. Computer Vision and Image Understanding, 110(2):281–307, 2008.

17. Christopher Boyce, Arun Ross, Matthew Monaco, Lawrence Hornak, and Xin Li. Multispectral iris analysis: A preliminary study. Proc. IEEE Computer Vision and Pattern Recognition Workshops, pages 1–9, Jun 2006.

18. J. Bringer, H. Chabanne, G. Cohen, B. Kindarji, and G. Zemor. Optimal iris fuzzy sketches. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2007.

19. Julien Bringer, Herve Chabanne, Gerard Cohen, Bruno Kindarji, and Gilles Zemor. Theoretical and practical boundaries of binary secure sketches. IEEE Transactions on Information Forensics and Security, 3(4):673–683, 2008.


20. A. Calder and G. Rhodes, editors. Handbook of Face Perception, chapter Cognitive and Computational Approaches to Face Perception by O'Toole. Oxford University Press, 2010. In press.

21. Kyong I. Chang, Kevin W. Bowyer, and Patrick J. Flynn. An evaluation of multi-modal 2D+3D face biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):619–624, Apr 2005.

22. Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253–262, 2004.

23. John Daugman. Absorption spectrum of melanin. http://www.cl.cam.ac.uk/∼jgd1000/melanin.html, accessed July 2009.

24. John Daugman. Introduction to iris recognition. http://www.cl.cam.ac.uk/∼jgd1000/iris recognition.html, accessed Jun 2010.

25. John Daugman. High confidence visual recognition of persons by a test of statistical independence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1148–1161, Nov 1993.

26. John Daugman. Biometric personal identification system based on iris analysis. U.S. Patent No. 5,291,560, Mar 1994.

27. John Daugman. Biometric decision landscapes. Technical Report UCAM-CL-TR-482, University of Cambridge Computer Laboratory, 2000.

28. John Daugman. How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):21–30, 2004.

29. John Daugman. New methods in iris recognition. IEEE Transactions on Systems, Man and Cybernetics - B, 37(5):1167–1175, Oct 2007.

30. John Daugman. United Arab Emirates deployment of iris recognition. http://www.cl.cam.ac.uk/jgd1000/deployments.html, accessed Jan 2009.

31. John Daugman and Cathryn Downing. Epigenetic randomness, complexity and singularity of human iris patterns. Proceedings of the Royal Society of London - B, 268:1737–1740, 2001.

32. John Daugman and Cathryn Downing. Effect of severe image compression on iris recognition performance. IEEE Transactions on Information Forensics and Security, 3(1):52–61, March 2008.


33. George Davida, Yair Frankel, and Brian Matt. On enabling secure applications through off-line biometric identification. IEEE Symposium on Security and Privacy, pages 148–157, 1998.

34. G. Doddington, W. Liggett, A. Martin, M. Przybocki, and D. Reynolds. Sheep, goats, lambs, and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. 5th International Conference on Spoken Language Processing, pages 1–4, 1998. Sydney, Australia.

35. Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM Journal of Computing, 38(1):97–139, 2008.

36. Yevgeniy Dodis, Leonid Reyzin, and Adam Smith. Advances in Cryptology - EUROCRYPT, chapter 13: Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data, pages 523–540. Springer Berlin/Heidelberg, 2004.

37. Gerry Dozier, David Bell, Leon Barnes, and Kelvin Bryant. Refining iris templates via weighted bit consistency. Proc. Midwest Artificial Intelligence and Cognitive Science (MAICS) Conference, pages 1–5, Apr 2009. Fort Wayne, Indiana.

38. Gerry Dozier, Kurt Frederiksen, Robert Meeks, Marios Savvides, Kelvin Bryant, Darlene Hopes, and Taihei Munemoto. Minimizing the number of bits needed for iris recognition via bit inconsistency and grit. Proc. IEEE Workshop on Computational Intelligence in Biometrics: Theory, Algorithms, and Applications, pages 30–37, Apr 2009.

39. Yingzi Du. Using 2D log-Gabor spatial filters for iris recognition. Proc. SPIE 6202: Biometric Technology for Human Identification III, pages 62020:F1–F8, 2006.

40. Yingzi Du, Robert W. Ives, Delores M. Etter, and Thad B. Welch. Use of one-dimensional iris signatures to rank iris pattern similarities. Optical Engineering, 45(3):037201-1 – 037201-10, 2006.

41. Leonard Flom and Aran Safir. Iris recognition system. U.S. Patent 4,641,349, 1987.

42. Karen Gomm. Passport agency: ‘Iris recognition needs work’. ZDNet UK, Oct 2005. http://news.zdnet.co.uk/emergingtech/0,1000000183,39232694,00.htm, accessed July 2009.


43. Jaap Haitsma and Ton Kalker. A highly robust audio fingerprinting system with an efficient search strategy. Journal of New Music Research, 32(2):211–221, June 2003.

44. Feng Hao, Ross Anderson, and John Daugman. Combining crypto with biometrics effectively. IEEE Transactions on Computers, 55(9):1081–1088, Sept 2006.

45. Feng Hao, John Daugman, and Piotr Zielinski. A fast search algorithm for a large fuzzy database. IEEE Transactions on Information Forensics and Security, 3(2):203–212, June 2008.

46. Karen Harmel. Walt Disney World: The government's tomorrowland? News21, Sept 2006. http://news21project.org/story/2006/09/01/waltdisney world the governments, accessed July 2009.

47. Xiaofu He, Jingqi Yan, Guangyu Chen, and Pengfei Shi. Contactless autofeedback iris capture design. IEEE Transactions on Instrumentation and Measurement, 57(7):1369–1375, Jul 2008.

48. Zhaofeng He, Zhenan Sun, Tieniu Tan, and Xianchao Qiu. Enhanced usability of iris recognition via efficient user interface and iris image restoration. Proc. 15th IEEE Int. Conf. on Image Processing (ICIP2008), pages 261–264, Oct 2008.

49. Sean Henahan. The eyes have it. Access Excellence, Jun 2002. http://www.accessexcellence.org/WN/SU/irisscan.php, accessed July 2009.

50. Karen Hollingsworth. Sources of error in iris biometrics. Master's thesis, University of Notre Dame, 2008.

51. Karen Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Similarity of iris texture between identical twins. Computer Vision and Pattern Recognition Biometrics Workshop, pages 1–8, June 2010.

52. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. All iris code bits are not created equal. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2007.

53. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. The best bits in an iris code. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):964–973, Jun 2009.

54. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Image averaging for improved iris recognition. Proc. Int. Conf. on Biometrics (ICB2009), pages 1112–1121, 2009.


55. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Pupildilation degrades iris biometric performance. Computer Vision and ImageUnderstanding, 113(1):150–157, 2009.

56. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Usingfragile bit coincidence to improve iris recognition. Proc. IEEE Int. Conf. onBiometrics: Theory, Applications, and Systems, pages 1–6, Sept 2009.

57. Karen P. Hollingsworth, Tanya Peters, Kevin W. Bowyer, and Patrick J.Flynn. Iris recognition using signal-level fusion of frames from video. IEEETransactions on Information Forensics and Security, 4(4):837–848, 2009.

58. Iris testing of returning afghans passes 200,000 mark. UNHCR The UNRefugee Agency, Oct 2003. http://www.unhcr.org/cgi-bin/texis/vtx/search?docid=3f86b4784, accessed July 2009.

59. Iris recognition for inmates. Tarrant County website, Jul 2004. http://www.tarrantcounty.com/esheriff/cwp/view.asp?a=792&q=437580, accessed June2010.

60. Iris recognition. http://en.wikipedia.org/wiki/Iris recognition, accessedMarch 2010.

61. ISO SC37 Harmonized Biometric Vocabulary (Standing Document 2 Version12). Technical report, International Standards Organization, Sept 2009.

62. Jail using new iris scanning system. KSBW.com, Jan 2006. http://www.ksbw.com/news/6403339/detail.html, accessed July 2009.

63. Anil K. Jain, Patrick Flynn, and Arun A. Ross. Handbook of Biometrics, chapter 14: Introduction to Multibiometrics, by Ross, Nandakumar, and Jain, pages 271–292. Springer, 2008.

64. R. Johnston. Can iris patterns be used to identify people? Los Alamos National Laboratory, Chemical and Laser Sciences Division Annual Report LA-12331-PR, pages 81–86, Jun 1992.

65. Ari Juels and Madhu Sudan. A fuzzy vault scheme. Designs, Codes, and Cryptography, 38(2):237–257, 2006.

66. Ari Juels and Martin Wattenberg. A fuzzy commitment scheme. Proceedings of the ACM Conference on Computer and Communications Security, pages 28–36, 1999.


67. Nathan D. Kalka, Jinyu Zuo, Natalia A. Schmid, and Bojan Cukic. Estimating and fusing quality factors for iris biometric images. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(3):509–524, May 2010.

68. Byung Jun Kang and Kang Ryoung Park. Real-time image restoration for iris recognition systems. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 37(6):1555–1566, Dec 2007.

69. Byung Jun Kang and Kang Ryoung Park. A robust eyelash detection based on iris focus assessment. Pattern Recognition Letters, 28(13):1630–1639, October 2007.

70. Josef Kittler and Norman Poh. Multibiometrics for identity authentication: Issues, benefits, and challenges. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

71. N. Kollias. The spectroscopy of human melanin pigmentation. Melanin: Its Role in Human Photoprotection, pages 31–38. Valdenmar Publishing Co., 1995.

72. Emine Krichen, Lorene Allano, Sonia Garcia-Salicetti, and Bernadette Dorizzi. Specific texture analysis for iris recognition. Proc. Int. Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA 2005), pages 23–30, 2005.

73. L-1 Identity Solutions. Understanding iris recognition. http://www.l1id.com/pages/383-science-behind-the-technology, accessed March 2010.

74. Yooyoung Lee, P. Jonathon Phillips, and Ross J. Michaels. An automated video-based system for iris recognition. Proc. Int. Conf. on Biometrics (ICB2009), pages 1160–1169, 2009.

75. Youn Joo Lee, Kang Ryoung Park, Sung Joo Lee, Kwanghyuk Bae, and Jaihie Kim. A new method for generating an invariant iris private key based on the fuzzy vault system. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 38(5), October 2008.

76. LG IrisAccess 4000. http://www.lgiris.com/ps/products/irisaccess4000.htm, accessed Apr 2009.

77. Yung-hui Li and Marios Savvides. Fast and robust probabilistic inference of iris mask. Proceedings of SPIE, vol. 7306, page 730621, May 2009.

78. Chengqiang Liu and Mei Xie. Iris recognition based on DLDA. Proc. Int. Conf. on Pattern Recognition, pages 489–492, Aug 2006.


79. Xiaomei Liu, Kevin W. Bowyer, and Patrick J. Flynn. Experiments with an improved iris segmentation algorithm. Proc. Fourth IEEE Workshop on Automatic Identification Technologies, pages 118–123, Oct 2005.

80. Li Ma, Tieniu Tan, Yunhong Wang, and Dexin Zhang. Efficient iris recognition by characterizing key local variations. IEEE Transactions on Image Processing, 13(6):739–750, Jun 2004.

81. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. The DET curve in assessment of detection task performance. Proc. 5th European Conference on Speech Communication and Technology, pages 1895–1898, 1997.

82. J. R. Matey, O. Naroditsky, K. Hanna, R. Kolczynski, D. LoIacono, S. Mangru, M. Tinker, T. Zappia, and W. Y. Zhao. Iris on the Move™: Acquisition of images for iris recognition in less constrained environments. Proceedings of the IEEE, 94(11):1936–1946, 2006.

83. The MathWorks™. Image Processing Toolbox documentation. http://www.mathworks.com/access/helpdesk/help/toolbox/images/index.html, accessed June 2009.

84. Phillip Miller, Allen Rawls, Shrinivas Pundlik, and Damon Woodard. Personal identification using periocular skin texture. Proc. ACM 25th Symposium on Applied Computing (SAC2010), pages 1496–1500, 2010.

85. Kazuyuki Miyazawa, Koichi Ito, Takafumi Aoki, and Koji Kobayashi. An effective approach for iris recognition using phase-based image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1741–1756, Oct 2008.

86. Taihei Munemoto, Yung-hui Li, and Marios Savvides. “Hallucinating irises” - dealing with partial and occluded iris regions. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

87. R. Murphy. Dempster-Shafer theory for sensor fusion in autonomous mobile robots. IEEE Transactions on Robotics and Automation, 14(2):197–206, 1998.

88. Elaine M. Newton and P. Jonathon Phillips. Meta-analysis of third-party evaluations of iris recognition. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 39(1):4–11, Jan 2009.

89. Clyde Oyster. The Human Eye: Structure and Function. Sinauer Associates, 1999.


90. Chul-Hyun Park and Joon-Jae Lee. Extracting and combining multimodal directional iris features. Int. Conf. on Biometrics (Springer LNCS 3832), pages 389–396, Jan 2006.

91. Unsang Park, Arun Ross, and Anil K. Jain. Periocular biometrics in the visible spectrum: A feasibility study. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2009.

92. Tanya Peters. Effects of segmentation routine and acquisition environment on iris recognition. Master's thesis, University of Notre Dame, 2009.

93. P. Jonathon Phillips. MBGC presentations and publications. http://face.nist.gov/mbgc/mbgc_presentations.htm, Dec 2008.

94. P. Jonathon Phillips, Patrick J. Flynn, Todd Scruggs, Kevin W. Bowyer, and William Worek. Preliminary Face Recognition Grand Challenge results. Proc. Int. Conf. on Automatic Face and Gesture Recognition (FG 2006), pages 15–24, Apr 2006.

95. P. Jonathon Phillips, Todd Scruggs, Patrick J. Flynn, Kevin W. Bowyer, Ross Beveridge, Geoff Givens, Bruce Draper, and Alice O'Toole. Overview of the Multiple Biometric Grand Challenge. Proc. Int. Conf. on Biometrics (ICB2009), pages 705–714, 2009.

96. Hugo Proença and Luís Alexandre. Toward noncooperative iris recognition: A classification approach using multiple signatures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):607–612, Apr 2007.

97. Shrinivas J. Pundlik, Damon L. Woodard, and Stanley T. Birchfield. Non-ideal iris segmentation using graph cuts. Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition Workshops (CVPR2008), pages 1–6, June 2008.

98. Xianchao Qiu, Zhenan Sun, and Tieniu Tan. Global texture analysis of iris images for ethnic classification. Int. Conf. on Biometrics (Springer LNCS 3832), pages 411–418, Jan 2006.

99. Soumyadip Rakshit and Donald M. Monro. An evaluation of image sampling and compression for human iris recognition. IEEE Transactions on Information Forensics and Security, 2(3):605–612, Sept 2007.

100. Soumyadip Rakshit and Donald M. Monro. Medical conditions: Effect on iris recognition. Proc. IEEE 9th Workshop on Multimedia Signal Processing (MMSP), pages 357–360, Oct 2007.


101. Sarah Ring and Kevin W. Bowyer. Detection of iris texture distortions by analyzing iris code matching results. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

102. Roberto Roizenblatt, Paulo Schor, Fabio Dante, Jaime Roizenblatt, and Rubens Belfort Jr. Iris recognition as a biometric method after cataract surgery. Biomedical Engineering Online, 3(1):2–7, Jan 2004.

103. Kaushik Roy and Prabir Bhattacharya. Iris recognition with support vector machines. Proc. Int. Conf. on Biometrics, pages 486–492, Jan 2006.

104. Wayne J. Ryan, Damon L. Woodard, Andrew T. Duchowski, and Stan T. Birchfield. Adapting Starburst for elliptical iris segmentation. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–7, Sept 2008.

105. Natalia A. Schmid, Manasi V. Ketkar, Harshinder Singh, and Bojan Cukic. Performance analysis of iris-based identification system at the matching score level. IEEE Transactions on Information Forensics and Security, 1(2):154–168, Jun 2006.

106. Natalia A. Schmid and Francesco Nicolo. On empirical recognition capacity of biometric systems under global PCA and ICA encoding. IEEE Transactions on Information Forensics and Security, 3(3):512–528, June 2008.

107. Stephanie A. C. Schuckers, Natalia A. Schmid, Aditya Abhyankar, Vivekanand Dorairaj, Christopher K. Boyce, and Lawrence A. Hornak. On techniques for angle compensation in nonideal iris recognition. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 37(5):1176–1190, Oct 2007.

108. Glenn Shafer. A Mathematical Theory of Evidence. Princeton, N.J.: Princeton University Press, 1976.

109. Koen Simoens, Pim Tuyls, and Bart Preneel. Privacy weaknesses in biometric sketches. IEEE Symposium on Security and Privacy, pages 188–203, 2009.

110. Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, and Richard Russell. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94(11):1948–1962, Nov 2006.

111. Zhenan Sun, Alessandra A. Paulino, Jianjiang Feng, Zhenhua Chai, Tieniu Tan, and Anil K. Jain. A study of multibiometric traits of identical twins. Proceedings of SPIE, pages 1–12, March 2010.


112. Zhenan Sun, Tieniu Tan, and Xianchao Qiu. Graph matching iris image blocks with local binary pattern. Proc. Int. Conf. on Biometrics (Springer LNCS 3832), pages 366–372, Jan 2006.

113. Vince Thomas, Nitesh Chawla, Kevin Bowyer, and Patrick Flynn. Learning to predict gender from iris images. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–5, Sept 2007.

114. Twins Days Festival official website. http://www.twinsdays.org/, accessed March 2010.

115. U.S. Department of Homeland Security. US-VISIT biometric identification services. http://www.dhs.gov/xprevprot/programs/gc_1208531081211.shtm, accessed July 2009.

116. U.S. Department of Homeland Security. US-VISIT traveler information. http://www.dhs.gov/xtrvlsec/programs/content_multi_image_0006.shtm, accessed July 2009.

117. Mayank Vatsa, Richa Singh, and Afzel Noore. Reducing the false rejection rate of iris recognition using textural and topological features. Int. Journal of Signal Processing, 2(2):66–72, 2005.

118. Mayank Vatsa, Richa Singh, and Afzel Noore. Improving iris recognition performance using segmentation, quality enhancement, match score fusion, and indexing. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 38(4):1021–1035, Aug 2008.

119. Edgar A. Whitley and Ian R. Hosein. Doing the politics of technological decision making: Due process and the debate about identity cards in the U.K. European Journal of Information Systems, 17:668–677, 2008.

120. Richard P. Wildes. Iris recognition: An emerging biometric technology. Proceedings of the IEEE, 85(9):1348–1363, Sept 1997.

121. Damon L. Woodard, Shrinivas Pundlik, Philip Miller, Raghavender Jillela, and Arun Ross. On the fusion of periocular and iris biometrics in non-ideal imagery. Proc. Int. Conf. on Pattern Recognition, 2010. To appear.

122. Harry Wyatt. A minimum wear-and-tear meshwork for the iris. Vision Research, 40:2167–2176, 2000.

123. N. Yager and T. Dunstone. The biometric menagerie. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):220–230, 2010.


124. Peng-Fei Zhang, De-Sheng Li, and Qi Wang. A novel iris recognition method based on feature fusion. Proc. Int. Conf. on Machine Learning and Cybernetics, pages 3661–3665, 2004.

125. W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458, 2003.

126. Wenyi Zhao and Rama Chellappa, editors. Face Processing: Advanced Modeling and Methods, chapter 17: Beyond one still image: Face recognition from multiple still images or a video sequence, by S. K. Zhou and R. Chellappa, pages 547–567. Elsevier, 2006.

127. Zhi Zhou, Yingzi Du, and Craig Belcher. Transforming traditional iris recognition systems to work in nonideal situations. IEEE Transactions on Industrial Electronics, 56(8):3203–3213, Aug 2009.

This document was prepared and typeset with LaTeX2ε, and formatted with the nddiss2ε classfile (v3.0 [2005/07/27]) provided by Sameer Vijay.
