Source: http://research.nii.ac.jp/~satoh/fmp/fmp_2012_10_23.pdf


Fundamentals of Media Processing

Shin'ichi Satoh, Kazuya Kodama, Hiroshi Mo, Duy-Dinh Le

Course web page

http://research.nii.ac.jp/~satoh/fmp/

Today's topics

- Face detection and recognition
  - PCA and the Eigenface method
  - LDA and the Fisherface method
  - Relation to the Bayes decision rule assuming a normal distribution
- Application of eigenvalue decomposition
  - Searching documents by latent semantic information: Latent Semantic Indexing (LSI)
  - Estimating 3-D structure from video: the factorization method

Face detection and recognition

Assume that we have a set of face images (with identity labels).

Face detection: decide whether a given unknown image is a face or not.

Face recognition: decide the identity of a given face image.

Image as intensity data

Image conversion

Pipeline: original image → grayscale → crop → discretize → quantize

The resulting pixel intensities are stacked into a single vector: x = [x_1, x_2, x_3, ..., x_n]

The Space of Faces

- An image is a point in a high-dimensional space.
- An N x M image is a point in R^{NM}.

We can define vectors in this space as we did in the 2D case


[Thanks to Chuck Dyer, Steve Seitz, Nishino]

Multivariate normal distribution

We typically assume a normal distribution. Let's assume that p(x|non-face) yields a uniform distribution, and p(x|face) yields a normal distribution.

multivariate normal distribution

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$
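As a quick illustration (not from the slides), here is a minimal NumPy sketch that evaluates this density; the function name and test values are made up for the example.

```python
import numpy as np

def multivariate_normal_pdf(x, mu, sigma):
    """Evaluate p(x) for a d-dimensional normal with mean mu and covariance sigma."""
    d = mu.shape[0]
    diff = x - mu
    # Solve sigma z = diff instead of forming an explicit inverse (more stable).
    z = np.linalg.solve(sigma, diff)
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * diff @ z)

# Example: a 2-D Gaussian evaluated at its mean (the peak of the density).
mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(multivariate_normal_pdf(mu, mu, sigma))
```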

Normal distribution

Eigenface method

Key idea: observe the distribution of face images in the image space.

- Find principal components by principal component analysis (PCA).
- Project unknown images onto the space spanned by the obtained principal components.
- Face detection: decide "face" if the image is close enough to the space.
- Face recognition: decide the identity of the closest face image within the space.

Eigenface method

Assume we have n sample face images: X = [x_1 x_2 x_3 ... x_n]

We can then obtain the mean vector and the covariance matrix:

$$\boldsymbol{\mu} = E[\mathbf{x}] \approx \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$
$$\Sigma = E\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right] \approx \frac{1}{n} (X - \boldsymbol{\mu}\mathbf{1}^T)(X - \boldsymbol{\mu}\mathbf{1}^T)^T$$

Eigenface method

We then apply eigenvalue decomposition to the covariance matrix, where the λ_i are eigenvalues and the φ_i are eigenvectors. Retain the m eigenvectors (φ_i, i = 1 ... m) corresponding to the m largest eigenvalues. Then each image x can be converted into an m-dimensional vector:

$$\Sigma \boldsymbol{\varphi}_i = \lambda_i \boldsymbol{\varphi}_i, \qquad i = 1, \ldots, m$$
$$\mathbf{x}' = \Phi^T (\mathbf{x} - \boldsymbol{\mu}), \qquad \Phi = [\boldsymbol{\varphi}_1 \;\; \cdots \;\; \boldsymbol{\varphi}_m]$$
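A minimal NumPy sketch of the eigenface computation just described (mean, covariance, eigendecomposition, projection). The data and variable names (faces, n_components) are illustrative assumptions, and real implementations usually avoid forming the full pixel-by-pixel covariance for large images.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) matrix, one flattened face image per row.

    Returns the mean face mu and Phi, the top n_components eigenvectors of the covariance.
    """
    mu = faces.mean(axis=0)
    cov = np.cov(faces - mu, rowvar=False)             # (n_pixels, n_pixels)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the m largest
    return mu, eigvecs[:, order]                       # Phi: (n_pixels, m)

def project(x, mu, phi):
    """x' = Phi^T (x - mu): the m-dimensional eigenface coordinates of image x."""
    return phi.T @ (x - mu)

# Example with random vectors standing in for 64 flattened 32x32 face images.
rng = np.random.default_rng(0)
faces = rng.random((64, 32 * 32))
mu, phi = fit_eigenfaces(faces, n_components=10)
print(project(faces[0], mu, phi).shape)  # (10,)
```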

Covariance matrix and its algebraic/geometric interpretation

What is the quadratic form $\boldsymbol{\varphi}^T \Sigma \boldsymbol{\varphi}$?

[Figure: samples x_1, x_2, x_3 projected through the mean μ onto a direction φ, giving values y_1, y_2, y_3]

Covariance matrix and its algebraic/geometric interpretation

How to maximize $\boldsymbol{\varphi}^T \Sigma \boldsymbol{\varphi}$ with respect to $\boldsymbol{\varphi}$ (subject to $\|\boldsymbol{\varphi}\| = 1$)?

Lagrange multipliers method

This is a very good exercise. Please try it!
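For reference, a sketch of that exercise with the unit-norm constraint assumed: the Lagrange-multiplier condition turns the maximization into the eigenvalue equation used above.

```latex
\begin{aligned}
L(\varphi, \lambda) &= \varphi^T \Sigma \varphi - \lambda\,(\varphi^T \varphi - 1) \\
\frac{\partial L}{\partial \varphi} &= 2\Sigma\varphi - 2\lambda\varphi = 0
  \;\Rightarrow\; \Sigma\varphi = \lambda\varphi \\
\varphi^T \Sigma \varphi &= \lambda\,\varphi^T\varphi = \lambda
\end{aligned}
```

So the maximizing direction is the eigenvector of $\Sigma$ with the largest eigenvalue, and the attained variance equals that eigenvalue.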

J. M. Rehg © 2002

Principal Component Analysis

PCA is an orthonormal projection of a random vector X onto a lower-dimensional subspace Y that minimizes mean square error.

$$\mathbf{Y} = U^T \mathbf{X}, \qquad \hat{\mathbf{X}} = U\mathbf{Y}, \qquad \mathbf{e} = \mathbf{X} - \hat{\mathbf{X}}$$
$$U = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_m], \qquad m \le n$$
$$\Sigma_X = E[\mathbf{X}\mathbf{X}^T], \qquad \Sigma_X \mathbf{u}_i = \lambda_i \mathbf{u}_i$$

[Figure: a 2-D point cloud with original axes x_1, x_2 and principal axes y_1, y_2]

J. M. Rehg © 2002

Principal Component Analysis

Equivalently, PCA yields a distribution for Y with:
- uncorrelated components
- maximum variance


J. M. Rehg © 2002

Classification Using PCA

- Detection of faces: based on the distance from face space
- Recognition of faces: based on the distance within face space

Detection: accept the input as a face if the reconstruction error is small enough:
$$\mathbf{e}^T \mathbf{e} < \theta_D \;?$$

Recognition: compare the projected coordinates $\mathbf{y}$ with those of each known face $\mathbf{y}_i$:
$$(\mathbf{y} - \mathbf{y}_i)^T (\mathbf{y} - \mathbf{y}_i) < \theta_R \;?$$
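A minimal NumPy sketch of these two criteria, reusing the mean mu and eigenvector matrix phi from the earlier eigenface sketch; the threshold names theta_d and theta_r and the gallery structure are illustrative assumptions.

```python
import numpy as np

def reconstruction_error(x, mu, phi):
    """Squared distance from face space: ||x - x_hat||^2 = e^T e."""
    y = phi.T @ (x - mu)        # project into face space
    x_hat = mu + phi @ y        # reconstruct from the projection
    e = x - x_hat
    return float(e @ e)

def classify_face(x, mu, phi, gallery, theta_d, theta_r):
    """gallery: dict mapping identity -> stored eigenface coordinates y_i."""
    if reconstruction_error(x, mu, phi) > theta_d:
        return None                              # too far from face space: not a face
    y = phi.T @ (x - mu)
    best_id, best_dist = None, np.inf
    for identity, y_i in gallery.items():
        dist = float((y - y_i) @ (y - y_i))      # distance within face space
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id if best_dist < theta_r else None
```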

Eigenfaces

PCA extracts the eigenvectors of A.
- This gives a set of vectors v_1, v_2, v_3, ...
- Each one of these vectors is a direction in face space.

What do these look like?

Projecting onto the Eigenfaces

The eigenfaces v1, ..., vK span the space of faces

A face x is converted to eigenface coordinates by projecting onto each eigenface: w_i = v_i^T (x - μ), giving the coordinate vector (w_1, ..., w_K).

Fisherface method

- Eigenface finds principal components which maximize the variance of the face distribution.
- What happens if we use identity information?
  - It would be a reasonable idea to maximize the variance between different people, while minimizing the variance within the same person.
- Find such components by Linear Discriminant Analysis (LDA).
- Project unknown images onto the space spanned by the obtained components.
- Face recognition (only): decide the identity of the closest face image within the space.

Fisherface -2

[Figure: a poor projection (classes overlap after projection) vs. a good projection (classes remain separated)]

FisherFace - 3

- $N$ sample images: $\{x_1, \ldots, x_N\}$
- $c$ classes: $\{\chi_1, \ldots, \chi_c\}$
- Average of each class: $\mu_i = \frac{1}{N_i} \sum_{x_k \in \chi_i} x_k$
- Total average: $\mu = \frac{1}{N} \sum_{k=1}^{N} x_k$

FisherFace - 4

- Scatter of class $i$: $S_i = \sum_{x_k \in \chi_i} (x_k - \mu_i)(x_k - \mu_i)^T$
- Within-class scatter: $S_W = \sum_{i=1}^{c} S_i$
- Between-class scatter: $S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T$
- Total scatter: $S_T = S_W + S_B$

FisherFace - 5

After projection $y_k = W^T x_k$:

- Between-class scatter (of the $y$'s): $\tilde{S}_B = W^T S_B W$
- Within-class scatter (of the $y$'s): $\tilde{S}_W = W^T S_W W$

FisherFace - 6

Good separation

[Figure: two classes with individual scatters $S_1$ and $S_2$, within-class scatter $S_W = S_1 + S_2$, and between-class scatter $S_B$]

FisherFace - 7

The wanted projection:
$$W_{opt} = \arg\max_W \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}$$
$$S_B w_i = \lambda_i S_W w_i, \qquad i = 1, \ldots, m$$

How is it found? The columns of $W_{opt}$ are the generalized eigenvectors $w_i$ of $S_B w_i = \lambda_i S_W w_i$ with the largest eigenvalues.
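A minimal NumPy/SciPy sketch of this computation, assuming the samples have already been PCA-reduced so that $S_W$ is non-singular (as the Fisherface method does); the function and variable names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh  # generalized symmetric eigenproblem solver

def fisher_directions(X, labels, m):
    """X: (N, d) PCA-reduced samples; labels: (N,) class ids.

    Returns W (d, m): the m generalized eigenvectors of S_B w = lambda S_W w
    with the largest eigenvalues.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)          # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * (diff @ diff.T)        # between-class scatter
    eigvals, eigvecs = eigh(S_B, S_W)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, order]
```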

Experimental Results - 1

Variation in Facial Expression, Eyewear, and Lighting

- Input: 160 images of 16 people
- Train: 159 images; Test: 1 image (leave-one-out)
- With / without glasses
- 3 lighting conditions
- 5 expressions

Experimental Results - 2

Normal distribution

Singular Value Decomposition

$$A = U \Sigma V^T$$
where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal.

Some Properties of SVD


• That is, A_k (the SVD truncated to the k largest singular values) is the optimal approximation, in terms of the approximation error measured by the Frobenius norm, among all matrices of rank k.

• This forms the basis of LSI (Latent Semantic Indexing) in information retrieval.

Low rank approximation by SVD
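A small NumPy sketch of this low-rank approximation (names and the toy matrix are illustrative): truncate the SVD to the k largest singular values and rebuild A_k.

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.default_rng(0).random((6, 4))
A2 = low_rank_approx(A, 2)
print(np.linalg.matrix_rank(A2), np.linalg.norm(A - A2, "fro"))
```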

Applications of SVD

- Pseudoinverse
- Range, null space and rank
- Matrix approximation
- Other examples:

http://en.wikipedia.org/wiki/Singular_value_decomposition

LSI (Latent Semantic Indexing)

- Introduction
- Latent Semantic Indexing (LSI)
- Query
- Updating
- An example

Problem Introduction

Traditional term-matching method doesn’t work well in information retrieval

We want to capture the concepts instead of the words. Concepts are reflected in the words. However:
- one term may have multiple meanings (ambiguity)
- different terms may have the same meaning (synonymy)

LSI (Latent Semantic Indexing)

LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem.

The goal is to find effective models to represent the relationship between terms and documents. Hence a set of terms, which is by itself incomplete and unreliable, will be replaced by some set of entities which are more reliable indicants.

LSI, the Method

- Build the document-term matrix M.
- Decompose M by SVD.
- Approximate M using the truncated SVD (keeping the k largest singular values).

LSI, the Method (cont.)

Each row and column of A gets mapped into the k-dimensional LSI space, by the SVD.

Query

A query q is also mapped into this space, by

Compare the similarity in the new space.

Intuition: dimension reduction through LSI brings together "related" axes in the vector space.

$$\hat{\mathbf{q}} = \mathbf{q}^T U_k \Sigma_k^{-1}$$
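A minimal NumPy sketch of this pipeline (truncated SVD of a term-document matrix, query folding by $\mathbf{q}^T U_k \Sigma_k^{-1}$, cosine similarity). The matrix orientation (terms as rows), the toy data, and all names are illustrative assumptions.

```python
import numpy as np

def lsi_fit(M, k):
    """M: (terms x documents). Returns U_k, the singular values s_k, and the
    document coordinates in the k-dimensional LSI space (rows of V_k)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def lsi_query(q, U_k, s_k):
    """Fold a query term vector q into the LSI space: q^T U_k Sigma_k^{-1}."""
    return (q @ U_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy example: 5 terms x 4 documents; the query contains terms 0 and 2.
M = np.array([[1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.]])
U_k, s_k, docs = lsi_fit(M, k=2)
q_hat = lsi_query(np.array([1., 0., 1., 0., 0.]), U_k, s_k)
print([round(cosine(q_hat, d), 3) for d in docs])
```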

Example

Example (cont.)

Example (cont. Mapping)

Example (cont. Query)

Query: Application and Theory

Example (cont. Query)

How to set the value of k?

LSI is useful only if k << n. If k is too large, it doesn't capture the underlying latent semantic space; if k is too small, too much is lost.

No principled way of determining the best k.

How well does LSI work?

- The effectiveness of LSI compared to regular term-matching depends on the nature of the documents.
  - Typical improvement: 0 to 30% better precision.
  - The advantage is greater for texts in which synonymy and ambiguity are more prevalent.
  - Best when recall is high.
- The costs of LSI might outweigh the improvement.
  - SVD is computationally expensive; limited use for really large document collections.
  - An inverted index is not possible.

References

- Mini tutorial on the Singular Value Decomposition: http://www.cs.brown.edu/research/ai/dynamics/tutorial/Postscript/SingularValueDecomposition.ps
- Basics of linear algebra: http://www.stanford.edu/class/cs229/section/section_lin_algebra.pdf

Factorization

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography---a factorization method. International Journal on Computer Vision, 9(2):137-154, November 1992. also as Technical Report TR-92-1270, Cornell University, March 1992.

Motivation

Given: an image sequence of a particular object

Automatically extract the 3-D structure of the object

Useful for object identification, 3-D model construction for CG, CAD, etc.

Example

First Step: Feature Point Selection and Tracking

Feature Point Selection
Feature Point Tracking

Feature Point Tracking

Select good points to track.

We regard a given image sequence as the following spatio-temporal function: $I(x, y, t)$

Pixel correspondence: $I(x, y, t + \tau) = I(x - \xi, y - \eta, t)$

If we define $J(\mathbf{x}) = I(x, y, t + \tau)$ and $I(\mathbf{x} - \mathbf{d}) = I(x - \xi, y - \eta, t)$, with displacement $\mathbf{d} = (\xi, \eta)$:

The problem is then to find the displacement $\mathbf{d}$ that minimizes:
$$\epsilon = \int_W \left[ I(\mathbf{x} - \mathbf{d}) - J(\mathbf{x}) \right]^2 w \, d\mathbf{x}$$

If we assume that the displacement vector $\mathbf{d}$ is small,
$$I(\mathbf{x} - \mathbf{d}) \approx I(\mathbf{x}) - \mathbf{g} \cdot \mathbf{d}, \qquad \text{where } \mathbf{g} = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)^T$$
then what we need to solve becomes:
$$G \mathbf{d} = \mathbf{e}, \qquad \text{where } G = \int_W \mathbf{g}\mathbf{g}^T w \, d\mathbf{x} \;\text{ and }\; \mathbf{e} = \int_W \left[ I(\mathbf{x}) - J(\mathbf{x}) \right] \mathbf{g} \, w \, d\mathbf{x}$$

Feature Point Selection

Good feature points should provide a stable solution $\mathbf{d}$ for the tracking equation $G\mathbf{d} = \mathbf{e}$:

- $G$ should have a numerically stable inverse.
- Let the two eigenvalues of $G$ be $\lambda_1, \lambda_2$.
- The criterion for feature point selection is based on $\min(\lambda_1, \lambda_2)$: select points where it is sufficiently large.
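A small NumPy sketch of this tracking step over a single window: it builds G and e from the image gradients, checks min(λ1, λ2) before inverting, and solves G d = e. Function names, the threshold, and the toy patch are illustrative assumptions.

```python
import numpy as np

def track_window(I, J, min_eig=1e-3):
    """I, J: two grayscale windows of the same patch in consecutive frames.

    Returns the estimated displacement d, or None if the feature is not trackable.
    """
    gy, gx = np.gradient(I.astype(float))                 # image gradients
    g = np.stack([gx.ravel(), gy.ravel()])                # (2, n_pixels)
    G = g @ g.T                                           # sum of g g^T over the window
    e = g @ (I.astype(float) - J.astype(float)).ravel()   # sum of (I - J) g
    if np.linalg.eigvalsh(G).min() < min_eig:
        return None                                       # unstable inverse: poor feature
    return np.linalg.solve(G, e)                          # displacement d

# Example: shift a random patch by one pixel in x and estimate the displacement.
rng = np.random.default_rng(0)
patch = rng.random((15, 15))
print(track_window(patch, np.roll(patch, 1, axis=1)))
```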

Measurement Matrix

Track $P$ feature points over $F$ frames; the $p$-th point appears at $(u_{fp}, v_{fp})$ in the $f$-th frame and at $(u_{(f+1)p}, v_{(f+1)p})$ in the $(f+1)$-th frame.

Collect the coordinates, registered to each frame's centroid, into the $2F \times P$ measurement matrix:
$$\tilde{W} = \begin{bmatrix} \tilde{U} \\ \tilde{V} \end{bmatrix}, \qquad \tilde{u}_{fp} = u_{fp} - \frac{1}{P}\sum_{p=1}^{P} u_{fp}, \quad \tilde{v}_{fp} = v_{fp} - \frac{1}{P}\sum_{p=1}^{P} v_{fp}$$
$$f = 1, \ldots, F, \qquad p = 1, \ldots, P$$

The Rank Theorem


Under orthographic projection,
$$\tilde{u}_{fp} = \mathbf{i}_f^T \mathbf{s}_p, \qquad \tilde{v}_{fp} = \mathbf{j}_f^T \mathbf{s}_p$$
where $\mathbf{s}_p$ is the 3-D position of point $p$ and $\mathbf{i}_f, \mathbf{j}_f$ are the camera axes of frame $f$. Thus
$$\tilde{W} = R S, \qquad R = \begin{bmatrix} \mathbf{i}_1^T \\ \vdots \\ \mathbf{i}_F^T \\ \mathbf{j}_1^T \\ \vdots \\ \mathbf{j}_F^T \end{bmatrix} \;(2F \times 3), \qquad S = [\mathbf{s}_1 \; \cdots \; \mathbf{s}_P] \;(3 \times P)$$
so the registered measurement matrix $\tilde{W}$ has rank at most 3.

The Rank Theorem

How do we decompose $\tilde{W}$? By using the singular value decomposition (SVD):
$$\tilde{W} = A \Sigma B$$
(truncated to the three largest singular values), then
$$\hat{R} = A \Sigma^{1/2}, \qquad \hat{S} = \Sigma^{1/2} B$$
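A minimal NumPy sketch of this rank-3 factorization step (the subsequent metric upgrade that enforces orthonormal camera axes is omitted); the synthetic example data and names are illustrative.

```python
import numpy as np

def factorize(W):
    """W: (2F, P) registered measurement matrix. Returns R_hat (2F, 3) and
    S_hat (3, P) such that W ~= R_hat @ S_hat (best rank-3 approximation)."""
    A, sigma, B = np.linalg.svd(W, full_matrices=False)
    A3, s3, B3 = A[:, :3], sigma[:3], B[:3, :]   # keep the 3 largest singular values
    sqrt_s = np.sqrt(s3)
    R_hat = A3 * sqrt_s                           # A Sigma^{1/2}
    S_hat = B3 * sqrt_s[:, None]                  # Sigma^{1/2} B
    return R_hat, S_hat

# Example: a synthetic rank-3 measurement matrix (10 frames, 25 points).
rng = np.random.default_rng(0)
W = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 25))
R_hat, S_hat = factorize(W)
print(np.allclose(R_hat @ S_hat, W))  # True up to numerical error
```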

Results


Schedule (topics subject to change)

- 10/9 (today): orientation, Bayes decision theory, probability distribution, normal distribution
- 10/16: random vector, linear algebra, orthogonal expansions, principal component analysis (PCA)
- 10/23: applications of eigenvalue decomposition (or Singular Value Decomposition): Latent Semantic Indexing and the factorization method
- 10/30: no class
- 11/6: advanced pattern recognition #1
- 11/13: advanced pattern recognition #2
- 11/20: nonparametric methods (Parzen windows, k-nearest neighbor estimate, k-nearest neighbor classification)
- 11/27: advanced pattern recognition #3
- 12/4: no class
- 12/11: nonparametric methods cont'd, clustering (k-means, agglomerative hierarchical clustering)

Schedule cont'd

- 12/18: advanced pattern recognition #4
- 12/25: advanced pattern recognition #5
- 2013/1/8: no class
- 1/15: (Fundamentals of) Signal processing
- 1/22: (Fundamentals of) Image processing
- 1/29: 3-D/4-D image processing
- 2/5: Project
- 2/12: Discussion
