
Page 1

“Pixels that Sound”

Find pixels that correspond (correlate!?) to sound

Kidron, Schechner, Elad, CVPR 2005

Page 2

Audio-Visual Analysis: Applications

• Lip reading – detection of lips (or person)
Slaney, Covell (2000)
Bregler, Konig (1994)

• Analysis and synthesis of music from motion
Murphy, Andersen, Jensen (2003)

• Source separation based on vision
Li, Dimitrova, Li, Sethi (2003)
Smaragdis, Casey (2003)
Nock, Iyengar, Neti (2002)
Fisher, Darrell, Freeman, Viola (2001)
Hershey, Movellan (1999)

• Tracking
Vermaak, Gangnet, Blake, Pérez (2001)

• Biological systems
Gutfreund, Zheng, Knudsen (2002)

Page 3

Problem: Different Modalities

camera

microphone

audio-visual analysis

Visual data

25 frames/sec

Each frame: 576 x 720 pixels

Audio data

44.1 kHz, few bands

Not stereophonic

Page 4

Previous Work

• Pointwise correlation
Nock, Iyengar, Neti (2002)
Hershey, Movellan (1999)

Ill-posed (lack of data)

• Canonical Correlation Analysis (CCA)
Smaragdis, Casey (2003)
Li, Dimitrova, Li, Sethi (2003)
Slaney, Covell (2000)

Cluster of pixels - linear superposition

• Mutual Information (MI)
Fisher et al. (2001)
Cutler, Davis (2000)
Bregler, Konig (1994)

Not typical

Highly complex

Page 5

[Diagram: the video features (Pixel #1, Pixel #2, Pixel #3) and the audio features (Band #1, Band #2) each pass through a projection; CCA selects the optimal projections (optimal visual components).]

Page 6

Visual Projection

1D variable

Projection

Video features
• Pixel intensities
• Transform coefficients (wavelet)
• Image differences

v

Page 7

Audio Projection

1D variable

Projection

Audio features
• Average energy per frame
• Transform coefficients per frame

a
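The first audio feature listed above, average energy per frame, can be sketched as follows. This is only an illustration, assuming the 44.1 kHz sampling rate and 25 frames/sec quoted earlier in the slides; the function name `energy_per_frame` and the synthetic test tone are hypothetical and do not come from the presentation.

```python
import numpy as np

def energy_per_frame(audio, sample_rate=44100, fps=25):
    """Average audio energy within each video-frame interval (illustrative helper)."""
    samples_per_frame = sample_rate // fps          # 1764 samples at 44.1 kHz and 25 fps
    n_frames = len(audio) // samples_per_frame      # drop any incomplete trailing frame
    trimmed = np.asarray(audio[: n_frames * samples_per_frame], dtype=float)
    chunks = trimmed.reshape(n_frames, samples_per_frame)
    return np.mean(chunks ** 2, axis=1)             # one energy value per video frame

# Assumed test signal: one second of a 440 Hz tone, giving 25 per-frame energies.
t = np.arange(44100) / 44100.0
a = energy_per_frame(np.sin(2 * np.pi * 440 * t))
```

Each entry of `a` then lines up with one video frame, which is what lets the audio and the visual data be compared sample by sample.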

Page 8

Canonical Correlation

Representation (video, audio)

Projections (per time window)

Random variables (time dependent)

Correlation coefficient

Page 9

CCA Formulation

yield an eigenvalue problem:
Knutsson, Borga, Landelius (1995)

Canonical correlation projections

Largest Eigenvalue

equivalent to

Corresponding Eigenvectors
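A minimal sketch of CCA as an eigenvalue problem, using the standard textbook form C_vv^-1 C_va C_aa^-1 C_av w_v = rho^2 w_v. The slide's own equations are not in this transcript, so the notation, the function name `cca_first_pair` and the synthetic data are assumptions made for illustration. The sketch also assumes enough time samples for both covariance matrices to be invertible.

```python
import numpy as np

def cca_first_pair(V, A):
    """First canonical pair for V (T x Nv) and A (T x Na); rows are time samples."""
    Vc = V - V.mean(axis=0)
    Ac = A - A.mean(axis=0)
    T = V.shape[0]
    Cvv = Vc.T @ Vc / (T - 1)
    Caa = Ac.T @ Ac / (T - 1)
    Cva = Vc.T @ Ac / (T - 1)
    # Eigenproblem: Cvv^-1 Cva Caa^-1 Cav w_v = rho^2 w_v
    M = np.linalg.solve(Cvv, Cva) @ np.linalg.solve(Caa, Cva.T)
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmax(eigvals.real)                  # largest eigenvalue
    w_v = eigvecs[:, k].real                     # corresponding eigenvector
    w_a = np.linalg.solve(Caa, Cva.T @ w_v)      # audio projection, up to scale
    rho = np.sqrt(max(eigvals[k].real, 0.0))     # canonical correlation
    return w_v, w_a, rho

# Synthetic data (assumed): one shared signal hidden in video column 1 and audio column 0.
rng = np.random.default_rng(0)
T = 500
shared = rng.standard_normal(T)
V = rng.standard_normal((T, 3)); V[:, 1] += 2.0 * shared
A = rng.standard_normal((T, 2)); A[:, 0] += 2.0 * shared
w_v, w_a, rho = cca_first_pair(V, A)
```

The square root of the largest eigenvalue equals the correlation coefficient between the projections `V @ w_v` and `A @ w_a`, and the largest weight in `w_v` falls on the video feature that carries the shared signal.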

Page 10

Visual Data

t (frames)

Spatial location (pixel intensities)

Page 11

Rank Deficiency

t (frames)

Spatial location (pixel intensities)

Page 12

Estimation of Covariance

Rank deficient
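The rank deficiency can be checked numerically. In the sketch below the sizes (30 frames, 200 pixels) and the random data are assumptions chosen only to make the point; real frames have far more pixels, which makes the gap larger.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pixels = 30, 200             # far fewer frames than pixels (assumed sizes)
V = rng.standard_normal((n_frames, n_pixels))
Vc = V - V.mean(axis=0)                  # remove the temporal mean of each pixel
C_vv = Vc.T @ Vc / (n_frames - 1)        # 200 x 200 sample covariance
rank = np.linalg.matrix_rank(C_vv)       # cannot exceed n_frames - 1
```

Because the covariance estimate is built from only `n_frames` centered samples, its rank is at most `n_frames - 1`, so a matrix of size `n_pixels` by `n_pixels` estimated this way cannot be inverted.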

Page 13

Ill-Posedness

Prior solutions:

• Use many more frames → poor temporal resolution.

• Aggressive spatial pruning → poor spatial resolution.

• Trivial regularization

Impossible to invert !!!

Page 14

A General Problem

Small amount of data

The problem is ILL-POSED

Overfitting is likely

Large number of weights

Page 15

An Equivalent Problem

Minimizing

Maximizing

Page 16

Single Audio Band

(The denominator is non-zero)

Minimizing

Known data

A has a single column, and

Page 17

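[Figure: the video data matrix V and the audio vector a; the entries a(1), a(2), …, a(30) of a are the audio samples a(t_i) over time.]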

Full correlation if

Underdetermined system !

Page 18

Detected correlated pixels

“Out of clutter, find simplicity.

From discord, find harmony.”

Albert Einstein

Page 19

Sparse Solution

• Non-convex
• Exponential complexity

ℓ0-norm minimum

Page 20

The ℓ1-norm criterion

• Sparse
• Convex
• Polynomial complexity

in common situations

ℓ1-norm minimum

Donoho, Elad (2005)
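A minimal sketch of ℓ1-norm minimization posed as a linear program, assuming the constraint takes the form V w = a (video data times pixel weights equals the audio signal). The function name `min_l1_solution`, the problem sizes and the synthetic two-pixel ground truth are hypothetical. It uses `scipy.optimize.linprog` as a generic solver, not necessarily the solver used by the authors.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(V, a):
    """Solve min ||w||_1 subject to V w = a by linear programming."""
    n = V.shape[1]
    # Split w = p - q with p, q >= 0; at the optimum sum(p) + sum(q) equals ||w||_1.
    c = np.ones(2 * n)
    A_eq = np.hstack([V, -V])
    res = linprog(c, A_eq=A_eq, b_eq=a, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

# Assumed toy data: 30 frames, 200 pixels, only two pixels truly drive the audio.
rng = np.random.default_rng(2)
n_frames, n_pixels = 30, 200
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[17, 93]] = [1.5, -2.0]
a = V @ w_true
w_l1 = min_l1_solution(V, a)
```

The solution `w_l1` satisfies the constraint exactly and its ℓ1 norm is never larger than that of `w_true`. In settings like this one it typically concentrates on very few pixels, which is the sparsity the slide refers to.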

Page 21

The Minimum Norm Solution

Energy spread

ℓ2-norm minimum

Solving using the ℓ2-norm (pseudo-inverse, SVD, QR)
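The energy spread of the minimum ℓ2-norm solution can be illustrated with the pseudo-inverse. The sketch assumes an underdetermined system V w = a; the sizes, the random data and the two-pixel ground truth are hypothetical.

```python
import numpy as np

# Assumed toy data: 30 frames, 200 pixels, only two pixels truly drive the audio.
rng = np.random.default_rng(2)
n_frames, n_pixels = 30, 200
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[17, 93]] = [1.5, -2.0]
a = V @ w_true

# Minimum l2-norm solution of the underdetermined system V w = a.
w_l2 = np.linalg.pinv(V) @ a

# Count the weights that are non-negligible relative to the largest one.
n_active = int(np.sum(np.abs(w_l2) > 1e-3 * np.abs(w_l2).max()))
```

`w_l2` reproduces the audio exactly, yet it spreads its energy over many more pixels than the two that generated the signal, so it does not localize the source.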

Page 22

Linear programming

Fully correlated

Sparse

No parameters to tweak

Polynomial

Audio-visual events

Maximum correlation: Eigenproblem

Minimum objective function G

Page 23

Multiple Audio Bands - Solution

ℓ1-ball

Non-convex constraint

• Convex
• Linear

The optimization problem:

Page 24

ℓ1 ball

Multiple Audio Bands

Optimization over each face is:

S1

S2

S3 S4

No parameters to tweak

• Each face: linear programming

Page 25

Sharp & Dynamic, Despite Distraction

Frame 9 Frame 42 Frame 68

Frame 115 Frame 146 Frame 169

Page 26

Frame 51

Frame 106

Frame 83

Frame 177

• Sparse

• Localization on the proper elements

• False alarm – temporally inconsistent

• Handling dynamics

Performing in Audio Noise

Page 27

ℓ2-norm: Energy Spread

Movie #1 Movie #2

Frame 83 Frame 146

Page 28

ℓ1-norm: Localization

Movie #1 Movie #2

Frame 83 Frame 146

Page 29

The “Chorus Ambiguity”

Who’s talking?

Synchronized talk

Not unique (ambiguous)

Possible solutions:
• Left
• Right
• Both

Page 30

The “Chorus Ambiguity”

[Plots: ℓ2-norm and ℓ1-norm solutions, each drawn over the axes feature 1 and feature 2.]

Both