biometric pattern recognition genetics neuroscience finance, game theory, asset price volatility

Computer Science Tripos 2013-14, Part II, Information Theory and CodingSome Applications of Information TheoryBiometric Pattern RecognitionGeneticsNeuroscienceFinance, Game Theory, Asset Price Volatility

1. Information Theory and Biometrics: Recognising Persons by their Iris PatternsProf. John Daugman

Some examples of biometric methods and applicationsForensicsFace recognition ??IrisKids (US) missing childrenregistration and identification

Biometric decision power depends on the magnitudes of within-person variability and between-person variability

In the visible band of light, the iris reveals a very rich, random, interwoven texture (the trabecular meshwork )

But even dark brown eyes show rich texture when images are captured in infrared illumination

Entropy: the key to biometric collision avoidance The discriminating power of a biometric depends on its entropy

Entropy measures the amount of random variation in a population: the number of different states or patterns that are possible; the probability distribution across those possible states

Entropy H (in bits) corresponds to 2H discriminable states or patterns

Surviving large database searches requires large biometric entropy

Epigenetic features (not genetically determined) make best biometricsAbout 1 percent of persons have a monozygotic (identical) twin

Epigenetic biometric features are vital if de-duplication of a large national database is required, as in the UID programme in India.

The epigenetic biometric property is especially important in cultures with high rates of group inbreeding (e.g. cousin marriage), so that genetically related persons do not collide in their biometrics.

Iris Patterns are EpigeneticEvery biometric lies somewhere on a continuum between being genetically determined (genotypic) or not (epigenetic)

Examples of genotypic traits: DNA, blood type, gender, raceExamples of epigenetic traits: fingerprints (except for type correlations); and iris patterns (except for eye colour)Example at middle of continuum: facial appearance. (Identical twins look identical, but they both change over time like everyone, yet they track each other as they age.)

Genetically identical eyeshave iris patterns that areuncorrelated in detail:

Monozygotic Twins A (6 year-old boys)


Monozygotic Twins B (18 year-old women)


Monozygotic Twins C (78 year-old men)

2D Gabor wavelets as phase-steerable detectors(Adams Kong)D. Gabor (1900-1979)

Why phase is a good variable for biometric encoding Phase encodes structural information, independent of contrast Phase encoding thereby achieves some valuable invariances Phase information has much higher entropy than amplitude In harmonic (Fourier) terms, phase does all the work Phase can be very coarsely quantised into a binary string Phase is equivalent to a clustering algorithm (c.f. Adams Kong) Question: what is the best quantisation of phase (2, 4, 8... sectors)? Phase can be encoded in a scale-specific, or a scale-invariant, wayGabor wavelets encode phase naturally, but in a scale- (or frequency)-specific way

Alternatives exist that encode phase in a total way (independent of scale/frequency), such as the Analytic function (the signal minus its Hilbert Transform i fHi(x) cousin): f(x) i fHi(x), which is a complex function whose 2 parts are in quadrature

Why IrisCode matching is so fast, parallelisable, and scalable

High entropy gives resistance against False MatchesThe probability of two different people colliding by chance in so many bits (e.g. disagreeing in only one-third of their IrisCode bits) is infinitesimal. Thus the False Match Rate is easily made minuscule.

But its like looking for one of these......in one of these.

Example of the importance of high entropyUIDAI (Unique Identification Authority of India) in 2011 began enrolling iris images of all 1.2 billion citizensAs of October 2013, 450 million had been enrolledCurrently enrolling 1 million persons per dayEach enrolled person is compared against all of those enrolled so far, to detect duplicates (de-duplication). This requires (1 million x 450 million) = 450 trillion iris cross-comparisons daily: 4.5 x 1014 per dayThe avoidance of biometric collisions among comparisons on this scale requires high biometric entropy, as possessed by IrisCode phase bits, ensuring very rapidly attenuating tails of the distribution obtained when comparing different eyes.

1014 iris comparisons per day! A typical galaxy contains just 100 billion stars (1011)... So UIDAI daily workflow equates to the number of stars in 1,000 galaxies

Bits in IrisCodes are equally like to be 1 or 0

- This makes them maximum-entropy bitwise.

- If different irises had some common structure, then this distribution would not be uniform.When bits from IrisCodes derived from different eyes are compared, those comparisons are Bernoulli trials.

Badly defocused iris images do not cause False Matches, because theIrisCode phase bits then just become random, determined by pixel noise.This is an advantage of phase over correlation-based coding methods.

IrisCode Logic and Normalizations

Generating ROC (or DET) curves requires moving the decision threshold, from conservative to liberal, to see the trade-off between FMR and FnMR errors.

The slope of the ROC curve is the likelihood ratio: ratio of the two density distributions at a given decision threshold criterion. Flat ROC curves permit FMR to be greatly reduced by small threshold changes, at little cost to FnMR.

IRIS gates at 10 UK airport terminals for registeredfrequent travellers in lieu of passport presentationUS-Canadian border crossing in lieu of passports

Face vs iris recognition: factor of 100,000 in accuracyNIST report, IREX-III: the best face recognition algorithms have 10 times higher Miss Rate, and 100,000 times higher False Match Rate, than iris recognition.

- The United Arab Emirates iris-based border security system

Deployed at all 32 air, land, and sea-ports 1,190,000 IrisCodes registered in a watch-list On a typical day 12,000 irises are compared to _all on the watch-list (14 billion comparisons/day) Each exhaustive search takes < 2 seconds About 30 trillion (30 million-million) comparisons of irises have been done since 2001 After an amnesty for violators of work permit _laws or other offences in 2001, expellees iris _patterns were encoded. About 150,000 persons _have since been caught trying to re-enter illegally.

Residency Permit Applications

U.S. Police Departments: bookings and releases

Takhtabaig Voluntary Repatriation Centre, Pakistan-Afghan borderThe United NationsHigh Commission forRefugees (UNHCR)administers cash grantsfor returnees, using irisidentification.

Deployments of iris recognition in war zones (Iraq, Afghan.) for monitoring populations near IED events

Sharbat Gula (1984); identified (2002) by these iris algorithms (based on photographs taken by National Geographic)

Schiphol Airport (NL): iris recognition in lieu of passport presentation

Access to condominium building, and programming the lift (!), by iris recognition

Iris image standard; data formats; compressibilityISO/IEC 19794-6 Iris Image Data Interchange Format Standard (revision published in 2011)Inter-operable image formats were required, not proprietary IrisCode templates (vendor neutral)NIST IREX study endorsed new compact formats: iris image compression to as little as 2 KB using JP2K (not JPEG), with cropping and ROI masking; or lossless compression using PNG containerRevision process was empirically-based (process promoted by Prof. C. Busch, and driven by NIST)

Effect of JPEG compression

Region-of-Interest cropping and JPEG-2000 compression allows iris imagesto be compressed to only 2,000 2,500 bytes with little impact on performance

Effect of JPEG-2000 + ROI isolation compression

New ISO Standard: highly compact iris image format, compressed to as little as 2,000 bytesCropping, and masking non-iris regions, preserves the coding budgetPixels outside the ROI are fixed to constant values, for normal segmentationSoftening the mask boundaries also preserves the coding budgetAt only 2,000 bytes, iris images are now much more compact than fingerprints

Iris recognition on the subcontinent: 1.2 billion people

UIDAI (Unique IDentification Authority of India) requirement for severe compression of iris images: the problem, and the proposed solutionRural Indian authentication applications must operate over very restricted bandwidth. (Legacy ATM protocol uses packets 1 KB, with transmission bandwidth ~ 1 KB/sec.)In 2012, Nandan Nilekani announced a plan for about 1 million micro-ATMs enabling local shopkeepers to provide financial services from their cash tills, with online interbank counter-transfers from UIDAI-enabled accounts to theirs. He explained the current situation, even for holders of bank accounts in rural areas (40% of population have none), in terms of A Day in the Life of Ram:Currently if Ram needs to withdraw 50 Rupees, he walks 6 km to an over-crowded bus; he travels for 2 hrs to a town where his bank is, and then travels back. It takes all day.The solution requested by UIDAI is to use ultra-compact representations of biometric images ( < 1 KB) to authenticate a claimed Aadhaar number over a narrowband channel, and authorise financial transactions via the shopkeepers till, with online counter-transfer.

New UIDAI report on scenario test of 100,000+ transactions authenticated online using severely compressed iris images

Fuzzy database matching with a Codex(based on Technical Report circulated in March 2006: Hao, Daugman, and Zielinski, A fast search algorithm for a large fuzzy database, published in IEEE T-IFS, 3(2), pp. 203-212.)

Uses Indexing for large databases, instead of exhaustive search.The concept is similar to Content-Addressable Memory (CAM), in which the data itself is used as an address.A Codex is constructed, listing IrisCodes containing various bit patterns. When enough collisions, or suspicious coincidences occur between IrisCodes, they (and they alone) are considered candidates for matching. Speed-up arises from ignoring others.Pruning factor (therefore speed-up factor) approaches ~ 100:1. Adoption of Indexing should be gated by Quality Assessment.

The Doctrine of Suspicious CoincidencesWhen the recurrence of patterns just by chance is a highly improbable explanation, it is unlikely to be a coincidence.

Synthetic biometrics, minimal description length, and Kolmogorov complexityKolmogorov (1903-1987)Synthetic biometrics (creating an image indistinguishable from an actual sample) are programs that serve to compress biometric samples in Kolmogorovs sense. In the future, will biometric recognition operate by comparing such programs? Kolmogorov introduced a new definition for the complexity of a string of data: it is the length of the shortest program that could generate the data.Creating that program compresses the data; executing that program decompresses (generates) the data.If the shortest program that can generate a data string is essentially a data statement containing it, then the data is its own shortest possible description (K-incompressible).Today iris images can be compressed to about 2 kB, which is about the same size as standard iris templates.

2. Information Theory and GeneticsThe information in DNA is coded in terms of ordered triples of four nucleotide bases (Adenine, Guanine, Cytosine, Thymine; or A,G,C,T).The ordered triples are called codons. There are 64 possible permutations (4 bases taken 3 at a time: 4 = 64 permutations).Codons specify amino acids, the building blocks of proteins. There are only 20 primary amino acids, so any given one may correspond to more than one codon. (Example: AGC, AGU, UCG, UCU all specify serine.)The human genome consists of about 5 billion nucleotide bases, or about 1.7 billion codons, each able to specify ~ 6 bits of information.Only about 3% of the human genome specifies genes, of which there are some 24,000. Average gene length ~ 8,000 nucleotide base pairs.

(Information Theory and Genetics, continued)Therefore on average, a typical gene could specify ~ 16,000 bits. But this space is very sparsely populated with meaningful alternatives.Many (but not all) of your genes could be passed on by either of your parents ( , ) with equal probability, so genes inheritance entropy is: H(genes inheritance | , ) 10,000 bits per generation. But many genes are indistinguishable in effects, whether maternal or paternal.At each generation, the genetic influence of any ancestor is diluted by 50% (as about half of genes come from either parent, each generation).Thus (for example): you share of your genes with any grandparent; with either parent, and with full siblings; with any double cousins; 100% with a monozygotic twin; and N with Nth generation ancestors.

Diverse forms of genetic relatedness

Catastrophe: WHERE are all your missing ancestors??Going back N generations in your family tree, you must have 2N ancestors.Consider just 950 years back (about 30 generations ago), around the time of William the Conqueror (1066). Your family tree must be populated by 230 1 billion ancestors from that time. (Many more if faster reproduction.)But the total population of Europe around that time was only a few million. WHERE will you find a billion ancestors, among just some million persons?The inevitable answer: EVERYBODY then living (who had descendants) was your ancestor; -- and on average, each about 1,000 times over. Conclusion: almost everyone today will become the co-ancestors of all of your descendants, within just a few generations. (Why, then, do you think it matters so much whether and with whom you decide to have children?)

Communal co-ancestry of descendants after N generationsAncestral state-space entropy H = - 2-N log2 2-N = N bits. The essence of sexual reproduction is perpetual mixing, and re-mixing, of the gene pool.2N1

3. Information Theory and NeuroscienceAt a synapse, what is the typical rate of information transfer?

Signal flows in the retina, both lateral and longitudinal

Estimated transmission rate at a graded synapse: 1,650 bits per second.Based on SNR analysis as in Exercise 11. (de Ruyter van Steveninck and Laughlin, Nature, 1996.)(There are both spiking neurons and graded potential synapses)

*

biometric pattern recognition genetics neuroscience finance, game theory, asset price volatility

Documents

biometric encoding phase

work phase

entropy entropy

identical twins

best quantisation of

independent of contrast

binary string phase

epigenetic biometric