are iris crypts useful in identity recognition?csis.pace.edu/~ctappert/dps/2013btas/papers/paper...

Are Iris Crypts Useful in Identity Recognition?

Feng Shen, Patrick J. Flynn

Dept. of Computer Science and Engineering, University of Notre Dame

254 Fitzpatrick Hall, Notre Dame, IN [email protected] [email protected]

Abstract

We conducted an experiment in which participants were asked to annotate the crypts they see in iris images. The results were used to assess the utility of crypts in identity recognition for forensic applications. Although the inter-participant annotation consistency may be limited by crypts’ noticeability and the range of genuine similarity scores obtained by the adopted matcher is correlated with the number of crypt pixels in the given image, the intra-iris crypt perception is sufficiently consistent for nearly 95% of query images to be correctly identified. The performance is expected to be better for professional examiners with comprehensive training.

1. Introduction

The iris is claimed to be one of the best performing biometric modes. It rarely suffers damage and changes very slowly, if at all, during adulthood. The iris recognition systems developed by Daugman [1] are reported to find no false matches in about 200 billion cross-comparisons [2] and can perform about 100,000 full comparisons between different irises per second [1].

Despite its advantages in speed and accuracy over other biometric modes, the iris has not been examined in the domain of forensics. One important reason is that the feature representation in traditional iris techniques is not easily interpreted by humans. As a result, it is difficult to add manual inspection to verify the system output, which is essential to many law enforcement applications.

Some iris techniques have been proposed to exploit features that are better associated with human vision. The method in [6] uses an adjacent pair of a local minimum and a local maximum of 1D dyadic wavelet transform outputs to locate the appearance and disappearance of dark irregular blocks on each row of the normalized iris. Yang et al. [11] suggest using multi-channel 2D Gabor filters to detect key points that represent local texture information most effectively. De Mira et al. [7] use morphological operators to highlight ridge-like patterns and match irises by the extracted nodes, end-points and branches. Sunder et al. [10] employ SIFT to characterize the macro-features on the iris and use them for template retrieval. Unfortunately, none of the foregoing methods conclusively correlates the extracted features with any specific type of visible feature.

An iris recognition technique that parallels fingerprint techniques has been proposed in [8] and [9]. It uses crypts as the minutiae of the iris, and measures the similarity between two irises by the shapes and locations of the extracted crypts and the number of common crypts identified between the two irises. The advantage of this technique is that it makes it much more convenient for forensic examiners to supervise the identification process, because the representative features of irises and the identification results are directly viewable to human eyes. However, this advantage is based on the assumption that experienced forensic examiners can consistently perceive iris crypts in the given images. The foundation of crypt-based identity recognition would be questionable if different forensic examiners drew totally different conclusions from the same iris pair.

In this paper, we examine this assumption through an experiment in which human observers were asked to annotate the crypts they see in gray-scale iris images. The results are studied to determine how the human visual system performs in crypt identification. To the best of our knowledge, there has been no previous effort to collect manual annotation data or to evaluate the perceptual consistency of crypts. The study in this paper has two purposes. The first is to provide empirical evidence to support a human-in-the-loop framework for crypt-based iris recognition. The second is to set a baseline for evaluating the performance of automatic crypt detection and matching algorithms in the future.

The paper is organized as follows. Section 2 introduces background knowledge of crypts and the crypt-based iris recognition framework. Section 3 describes our dataset, session plan and the annotation software. The analysis of the annotation results is presented in Section 4. Section 5 summarizes the paper and presents recommendations for future work.

2. Background

According to [4], crypts are a sequence of openings located near either side of the collarette on the anterior layer of the iris. They play a role in keeping the stroma and tissues on the posterior layer lubricated. Crypts are significant visible features of the iris because their formation relates to both pigmentation and the surface structure [3].

When imaged in the near infrared, crypts appear as areas that are consistently darker than their surroundings. Figure 1 shows examples of iris images with multiple visible crypts. All images used in this paper were acquired under near infrared illumination.

Figure 1: Examples of visible crypts, panels (a) to (d)

The major motivation for developing visible feature-based iris recognition is to aid law enforcement applications that require human supervision in the process. Figure 2 shows the framework of a semi-automatic subject identification system where manual inspection is added.

In this framework, the system retrieves a short list of matching candidates that have the highest similarity scores and automatically annotates common crypt features that contribute to the overall similarity. During manual inspection, the human examiners select the true matching image from the retrieved list or declare a new subject by visually evaluating the similarities between the annotated common feature pairs. Compared to traditional fully-automatic iris recognition, the results of a human-supervised system are more reliable and more convincing to the general public if we can prove that the human perception of iris crypts is consistent enough to discriminate different irises.

Figure 2: System framework with manual inspection

3. Experiment

3.1. Data

All the iris images were taken with the LG4000 iris camera under NIR lighting [5]. We considered using either the eye images directly or the iris strips that are segmented and normalized from eye images. The sizes of eyes and the amount of dilation vary in the acquired eye images. To reduce the complications incurred by these differences as much as possible, we use the normalized grey-scale iris strips as the input images for annotation. The regions occluded by eyelashes or eyelids are masked in solid yellow, and the annotators are instructed to ignore them. We hand-selected a subset of images with good focus, reasonably accurate segmentation and relatively small occlusion area. Our study uses a total of 124 images from 62 different irises, with 2 images for each iris. The images are presented to participants in 3 sessions according to the noticeability of crypts and the difficulty of annotation, as explained in table 1.

The test images were labeled as ‘easy’ or ‘challenging’ by the author before the experiments to describe the noticeability of crypts in each image. As illustrated in figure 3(b), the ‘challenging’ images contain border-case crypts whose occurrence or exact shapes are difficult to decide. The credibility of the labeling is limited by the author’s subjective observation, so in addition to the noticeability, the images are also selected by the annotation consistency measured in previous sessions. The images presented in multiple sessions help us observe the behavior changes between sessions.

3.2. Sessions

The experiments were carried out in three sessions, each of which contained 60-68 images, and the time cost of each session varied between 1 and 2 hours for different participants. To reduce the likelihood of fatigue and resulting poor annotations, participants were asked to complete each session in two separate time slots, each lasting less than 1 hour.

Session   # of images   Description
1         60            ‘Easy’ images with relatively noticeable crypts
2         68            28 ‘easy’ images from session 1 with poor annotation consistency and 40 ‘challenging’ images with low-quality crypts
3         64            20 ‘easy’ images randomly selected from session 1, 20 ‘challenging’ images from session 2 with relatively high annotation consistency, and 24 new images (8 ‘easy’ and 16 ‘challenging’)

Table 1: Test images in three sessions

Figure 3: Image classification by crypt noticeability. (a) is an easy image. (b) is a challenging image.

We solicited volunteers from the students and staff on the Notre Dame campus to participate in our experiments. We had 23 volunteers in session 1. Of these, 21 continued to participate in session 2, and 20 finished all three sessions.

All participants were required to watch a 6-minute tutorial video about how to operate the annotation software. The video also gave participants a high-level standard for identifying crypts: image regions that are consistently darker than their surroundings. Participants were instructed to ignore very small candidate regions and those located too close to the pupil or limbus boundary because they are either not discriminatory or not stable enough.

3.3. Software

We implemented the annotation software, whose user interface is shown in figure 4. Participants make annotations on the grey-scale normalized iris image at the top, and the annotation results are recorded as a binary feature image at the bottom. To annotate crypts, the participants hold down the mouse button and use it as a ‘pen’ to paint the regions that exactly cover the crypts. The red boundaries in the iris image and the connected regions in the feature image are updated automatically as the annotators draw. Small holes within donut-shaped annotations are automatically filled in post-processing. A good annotator is expected to be careful with the crypt boundaries so that the red boundaries perfectly overlap with the observed crypt boundaries.
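The paper does not describe how the hole-filling post-processing is implemented. As a minimal sketch, assume the annotation is a binary image stored as a list of rows, that a hole is a 4-connected background region that does not touch the image border, and that "small" means at most a hypothetical `max_hole_area` pixels. Under those assumptions the step could look like this:

```python
from collections import deque


def fill_small_holes(mask, max_hole_area=4):
    """Return a copy of a binary mask with small enclosed holes set to 1.

    `max_hole_area` is a hypothetical parameter; the paper gives no size limit.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] or seen[r0][c0]:
                continue
            # Collect one 4-connected background region with a breadth-first search.
            region, touches_border = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Only enclosed, sufficiently small regions count as holes.
            if not touches_border and len(region) <= max_hole_area:
                for r, c in region:
                    out[r][c] = 1
    return out


donut = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_small_holes(donut)
```

In this toy example the single enclosed pixel at the centre of the ring is filled, while the outer background, which touches the border, is left unchanged.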

Figure 4: User interface of the annotation software

4. Results

4.1. Inter-Participant Perceptual Consistency

Inter-participant perceptual consistency measures the similarity between annotations from different annotators for the same image. For each image I we calculated a consensus image Ic in which each bit is set if it receives votes from more than 50% of participants as a crypt pixel. The inter-participant consistency for the given image I is measured by the consistency ratio θ:

\[
\theta = \frac{\text{average \# of set bits in } (I_k \cap I_c)}{\text{average \# of set bits in } (I_k \oplus I_c)} \tag{1}
\]

where Ik is the annotation result from the kth participant, the averages are taken over the participants k, and the symbols ‘⋂’ and ‘⊕’ denote the logical ‘and’ and ‘exclusive or’ respectively. Higher values of θ indicate better inter-participant perceptual consistency. Figure 5 plots the θ values calculated for all images in each session.
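The consensus image and the consistency ratio in equation (1) can be illustrated with a short sketch. It assumes each annotation is a binary image stored as a list of rows and that the averages run over participants; the function names and the toy annotations are illustrative and not taken from the paper.

```python
def consensus_image(annotations):
    """Majority vote: a pixel is set if more than 50% of annotators marked it."""
    n = len(annotations)
    rows, cols = len(annotations[0]), len(annotations[0][0])
    return [
        [1 if 2 * sum(a[r][c] for a in annotations) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def consistency_ratio(annotations):
    """Consistency ratio theta of equation (1) for one image."""
    cons = consensus_image(annotations)
    and_bits = 0  # total set bits in (I_k AND I_c), summed over participants
    xor_bits = 0  # total set bits in (I_k XOR I_c), summed over participants
    for ann in annotations:
        for ann_row, cons_row in zip(ann, cons):
            for a, c in zip(ann_row, cons_row):
                and_bits += a & c
                xor_bits += a ^ c
    # Dividing both totals by the number of participants gives the two
    # averages; the factor cancels in the ratio.
    if xor_bits == 0:
        return float("inf")  # every annotation equals the consensus
    return and_bits / xor_bits


# Three toy annotations of a 1 x 4 image (illustrative values only).
toy = [
    [[1, 1, 0, 0]],
    [[1, 1, 1, 0]],
    [[1, 0, 0, 0]],
]
theta = consistency_ratio(toy)
```

For the toy input the consensus is `[[1, 1, 0, 0]]`, the annotations share 5 set bits with it in total and differ from it in 2, so θ = 5 / 2 = 2.5.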

The ideal value for θ is +∞, reached when all annotations are identical. The maximum θ we obtained from the 124 test images is 1.6856. Although it is difficult to identify a decision threshold above which the participants’ perception of iris crypts is sufficiently consistent for identity recognition, we found in figure 6 that the level of consistency is higher for ‘easy’ images, which are the ones containing easily noticeable crypts as in figure 3(a).

Figure 5: Average consistency ratios for each image

Figure 6: Consistency ratios of each noticeability category

A major part of annotation inconsistency comes from crypt boundaries. Across all annotation images, we found a total of 4,821,598 pixels that differ from the consensus image Ic, of which 2,992,173 (62.06%) are from regions connected to (and thus very likely to be surrounding) the crypts in Ic. The inconsistent decisions on crypt boundaries may be caused by lack of training and experience. The differences between over-conservative and over-aggressive labeling behavior around crypt boundaries contribute considerably to the overall inter-participant inconsistency.

4.2. Inter-Iris Perceptual Discrimination

In this section, we evaluate the discriminatory ability of the annotated crypts in spite of the inter-participant inconsistency we observed in the last section. An all-to-all matching is conducted to simulate identity verification based on a single decision threshold. We also conducted one-to-all matching to simulate the identification scenario, using each annotation image as an identity query.

4.2.1 Verification

We use the overlapping percentage η to measure the similarity between two crypt annotation images Ik and Il:

\[
\eta = \frac{\text{\# of set bits in } (I_k \cap I_l)}{\text{\# of set bits in } (I_k \cup I_l)} \tag{2}
\]

Head tilt during image acquisition causes a rotation that appears as a horizontal shift in the normalized iris image, and other factors such as eye movement, pupil dilation and segmentation error may result in a vertical shift. To accommodate the position shifts of crypts, in calculating η we move Ik both horizontally and vertically to find the position where Ik and Il have the maximum number of overlapping bits.
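The shift search can be sketched as follows. The paper does not give the search range or the boundary handling, so this sketch makes several assumptions: annotations are binary images stored as lists of rows, horizontal shifts wrap around (the horizontal axis of a normalized iris is angular), vertical shifts pad with zeros, and the hypothetical parameters `max_dx` and `max_dy` bound the search.

```python
def shift_image(img, dx, dy):
    """Shift a binary image dx columns (circularly) and dy rows (zero-padded)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        src_r = r - dy
        if 0 <= src_r < rows:
            for c in range(cols):
                out[r][c] = img[src_r][(c - dx) % cols]
    return out


def overlap_percentage(img_k, img_l, max_dx=2, max_dy=1):
    """Equation (2), evaluated at the shift of img_k that maximizes the overlap."""
    best_inter, best_eta = -1, 0.0
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            moved = shift_image(img_k, dx, dy)
            inter = sum(a & b for ra, rb in zip(moved, img_l) for a, b in zip(ra, rb))
            union = sum(a | b for ra, rb in zip(moved, img_l) for a, b in zip(ra, rb))
            # Keep the first shift that gives the largest number of common bits.
            if inter > best_inter:
                best_inter = inter
                best_eta = inter / union if union else 0.0
    return best_eta


# Toy pair: img_l is img_k moved two columns to the right (illustrative only).
img_k = [[0, 1, 1, 0, 0, 0],
         [0, 1, 0, 0, 0, 0]]
img_l = [[0, 0, 0, 1, 1, 0],
         [0, 0, 0, 1, 0, 0]]
eta_aligned = overlap_percentage(img_k, img_l)
eta_unaligned = overlap_percentage(img_k, img_l, max_dx=0, max_dy=0)
```

With the shift search enabled the toy pair aligns perfectly and η is 1.0; with the search disabled the two annotations do not overlap and η is 0.0. Zero padding on vertical shifts discards bits that move out of the image, which is one of several possible design choices.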

Each annotation image is considered independently in matching. Two annotation images form a matching pair if they are from the same eye. Intuitively, matching pairs are more similar than nonmatching pairs, so the value of η is expected to be higher for matching pairs. Figure 7 shows the matching score distributions obtained from 232,965 matching and 10,637,785 nonmatching pairs. Because we have many more nonmatching pairs in the experiments, the distributions are normalized to the same height for ease of comparison.

Figure 7: The match and nonmatch score distributions for an all-to-all matching experiment

A reliable identity verification system has as small an overlapping area as possible between the score distributions of matching and nonmatching pairs. A larger overlapping area means a higher probability of false accept or false reject errors. Although the two peaks in figure 7 are separate, the overlapping area prevents crypt annotation from being a reliable biometric mode in automatic identity verification when a single decision threshold is used to determine a positive or negative response to the identity claim.

We observed that images with fewer crypt pixels are more likely to yield low genuine matching scores η than those with more crypt pixels. Figure 8 plots the joint distribution of the number of crypt pixels and the average genuine matching score for each image. The number of crypt pixels of I is estimated as the average over all annotations {Ik}, and the average genuine matching score is obtained by matching all images in {Ik} against {Ik} ⋃ {I′k}, where {I′k} are the annotations of the other image of the same iris.

Figure 8: Scatter plot of average genuine matching score and the number of crypt pixels in images

We tested the statistical independence between the matching score η and the number of crypt pixels in figure 8. The null hypothesis is that the matching score for a given image is independent of the number of crypt pixels in it. The p-value obtained from Pearson’s chi-squared test of independence is 1.645 × 10^-6, which is well below 0.05, so the null hypothesis is rejected. This indicates an association between the number of crypt pixels in an image and the scale of matching scores it can yield against other images of the same iris. The correlation coefficient between them is 0.495 with a p-value of 5.072 × 10^-9. Given the noise induced by annotation inconsistencies, the correlation is significant.
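For readers who want to reproduce this kind of check on their own annotation data, the correlation coefficient can be computed as in the sketch below. It assumes the reported coefficient is the Pearson product-moment correlation, which the paper does not state explicitly, and the input values are toy numbers chosen for illustration, not data from the experiment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy inputs (illustrative only): per-image crypt pixel counts and
# per-image average genuine matching scores.
crypt_pixels = [100, 200, 300, 400]
mean_scores = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(crypt_pixels, mean_scores)
```

The toy scores grow linearly with the pixel counts, so `r` is 1 up to floating-point rounding; real annotation data would give a value strictly between -1 and 1.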

The number of crypt pixels is an intrinsic property that varies over a wide range for different iris images, which explains why using the overlapping percentage as the matching score with a single decision threshold did not produce satisfactory performance for identity verification. Part of our future work is to model the correlation between the number of crypt pixels and matching scores to develop a more discriminatory similarity metric.

4.2.2 Identification

In this section, we measure the crypts’ inter-iris perceptual discrimination independently for each image by conducting closed-set identification, where the identity of a query image is determined by the retrieved candidates that have the highest matching scores. Unlike identity verification, the identification decision for a given image is not affected by the range of similarity scores for other query images that are likely to have different numbers of crypt pixels. The system provides a correct identification as long as the query image generates a higher similarity score η with genuine matching images in the database than with nonmatching images, even if the genuine matching scores are very small or the impostor matching scores are large. Although open-set identification is more popular in practice, the goal of this paper is to study the human perception of crypts, and closed-set identification serves this purpose well.

Figure 9: CMC of identification by crypt annotations

We conducted one-to-all matching using each crypt annotation image as the query image and matching it against all other annotation images from all participants for all iris images. The performance of the closed-set identification is measured by the cumulative match characteristic (CMC), which plots how often a genuine matching template appears in the top-ranked candidate list as the number of retrieved candidates increases. Although there are inter-participant perceptual differences for each iris image as discussed in section 4.1, figure 9 shows that the identification achieves an overall identification rate of 89% at rank 1 and nearly 95% at rank 10 for all 4088 query attempts against the database composed of all other annotation images. We expect this performance to improve considerably if the participants receive formal training before the annotation task. A comprehensive training process is likely to reduce the inconsistent opinions about border-case crypts and crypt boundaries. It would also help if annotators were more detail-oriented about crypt boundaries in annotation, which are the major source of inter-participant inconsistency.
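A CMC curve of the kind shown in figure 9 can be computed from per-query score lists as sketched below. The data layout (one list of scores and one list of genuine flags per query) and the function names are assumptions made for illustration, and the toy scores are not results from the paper.

```python
def genuine_rank(scores, is_genuine):
    """1-based rank of the best-scoring genuine candidate, or None if absent."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, idx in enumerate(order, start=1):
        if is_genuine[idx]:
            return rank
    return None


def cmc_curve(queries, max_rank):
    """Fraction of queries whose genuine match appears within the top k, k = 1..max_rank."""
    ranks = [genuine_rank(scores, flags) for scores, flags in queries]
    total = len(ranks)
    return [
        sum(1 for r in ranks if r is not None and r <= k) / total
        for k in range(1, max_rank + 1)
    ]


# Four toy queries against a gallery of three candidates each (illustrative only).
queries = [
    ([0.9, 0.2, 0.4], [True, False, False]),   # genuine match at rank 1
    ([0.3, 0.5, 0.1], [True, False, False]),   # genuine match at rank 2
    ([0.1, 0.6, 0.7], [False, True, False]),   # genuine match at rank 2
    ([0.2, 0.8, 0.5], [True, False, False]),   # genuine match at rank 3
]
cmc = cmc_curve(queries, max_rank=3)
```

For the toy queries the curve is `[0.25, 0.75, 1.0]`: one of four queries succeeds at rank 1, three within rank 2, and all within rank 3. Ties in score are broken by candidate order here, which is an arbitrary choice.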

Figure 10: Identification fail rate at rank 10 of each noticeability category

Similar to the case of the consistency ratio θ, figure 10 shows that the noticeability of crypts also affects the performance of identification. We compared the fail rate at rank 10 for each image, calculated by using all annotations of the given image as the query input. According to the results, 171 of the 210 (81.43%) failed queries at rank 10 are from ‘challenging’ images, even though there are 968 more queries from ‘easy’ images than from ‘challenging’ images.
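The query counts behind these figures can be reconstructed from table 1 and the participant counts in section 3.2. The derivation below assumes that every participant annotated every image of each session they attended (23, 21 and 20 participants for sessions 1, 2 and 3). The per-category fail rates in the last two lines are derived here from the reported totals and are not stated in the paper.

```latex
\begin{align*}
N_{\text{easy}} &= 23 \cdot 60 + 21 \cdot 28 + 20 \cdot (20 + 8) = 1380 + 588 + 560 = 2528,\\
N_{\text{chal}} &= 21 \cdot 40 + 20 \cdot (20 + 16) = 840 + 720 = 1560,\\
N_{\text{easy}} + N_{\text{chal}} &= 4088, \qquad N_{\text{easy}} - N_{\text{chal}} = 968,\\
\text{fail rate}_{\text{chal}} &= 171 / 1560 \approx 10.96\%,\\
\text{fail rate}_{\text{easy}} &= (210 - 171) / 2528 = 39 / 2528 \approx 1.54\%.
\end{align*}
```

Both the total of 4088 queries and the difference of 968 agree with the numbers reported in the text, and the overall fail rate 210 / 4088 ≈ 5.14% matches the identification rate of nearly 95% at rank 10.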

5. Conclusion and Future Work

This paper studied human perception of iris crypts by analyzing the annotation data we collected from 23 volunteers. The experiments were carried out in three sessions with 124 images pre-labeled as ‘easy’ or ‘challenging’ by the noticeability of crypts. The human perception of crypts is evaluated in two aspects, namely inter-participant consistency and inter-iris discrimination. The inter-participant consistency measures the similarity between different annotations of the same iris image. We found that the inter-participant consistency is higher for images with easily noticeable crypts and that the boundary areas of crypts are the major source of inconsistency in annotations. Despite the existence of perceptual differences, we found through the simulation of closed-set identification that the intra-iris difference is sufficiently small for nearly 95% of query images to retrieve genuine matching irises at top ranks.

Because the range of similarity scores generated by the current matcher is significantly correlated with the number of crypt pixels in the image, a better matcher that produces a consistent range of scores needs to be developed before iris crypts can be applied reliably in identity verification. We also suggest developing a training process that focuses on the decision of crypt boundaries, to reduce the disagreement over boundary locations.

The experiments showed considerable differences in consistency and discrimination between the two noticeability classes. We suggest that some irises may naturally have shallower crypts than others, which makes them less reliable for crypt-based recognition. We have attempted to find a correlation between the noticeability and properties such as image intensity and gradient, but these features proved to be nondiscriminatory in experiments. We will continue working on automatic noticeability evaluation in our future work. If ‘challenging’ images could be automatically identified at acquisition, we may be able to increase the accuracy by applying pre-processing techniques or merging the results of multiple recognition methods.

References

[1] J. Daugman. How iris recognition works. IEEE Trans. on Circuits and Systems for Video Technology, 14(1):21–30, January 2004.

[2] J. Daugman. Results from 200 billion iris cross-comparisons. Technical report, University of Cambridge, 2005.

[3] L. Flom and A. Safir. Iris recognition system. US Patent 4,641,349, 1987.

[4] D. H. Gold and R. Lewis. Clinical eye atlas. Oxford University Press, 2003.

[5] LG. Iris access 4000. http://www.lgiris.com/ps/products/irisaccess4000.htm.

[6] L. Ma, T. Tan, Y. Wang, and D. Zhang. Efficient iris recognition by characterizing key local variations. IEEE Trans. on Image Processing, 13(6):1519–1533, 2003.

[7] J. D. Mira and J. Mayer. Image feature extraction for application of biometric identification of iris - a morphological approach. Proc. of SIBGRAPI, pages 391–398, 2003.

[8] F. Shen and P. J. Flynn. Iris matching by crypts and anti-crypts. Proc. of IEEE Conference on Technologies for Homeland Security, pages 208–213, November 2012.

[9] F. Shen and P. J. Flynn. Using crypts as iris minutiae. Proc. SPIE, 8712, May 2013.

[10] M. S. Sunder and A. Ross. Iris image retrieval based on macro-features. Proc. of ICPR, pages 1318–1321, 2010.

[11] W. Yang, L. Yu, and K. Wang. Iris recognition based on location of key points. Proc. ICBA, pages 484–490, July 2004.
