
1

SUN: A Model of Visual Salience Using Natural Statistics

Gary Cottrell, Lingyun Zhang, Matthew Tong, Tim Marks, Honghao Shan, Nick Butko, Javier Movellan, Chris Kanan

2

SUN: A Model of Visual Salience Using Natural Statistics
…and its use in object and face recognition

Gary Cottrell, Lingyun Zhang, Matthew Tong, Tim Marks, Honghao Shan, Nick Butko, Javier Movellan, Chris Kanan

3

Collaborators

Matthew H. Tong, Lingyun Zhang, Honghao Shan, Tim Marks

4

Collaborators

Nicholas J. Butko, Javier R. Movellan


5

Collaborators

Chris Kanan

6

Visual Salience

• Visual Salience is some notion of what is interesting in the world - it captures our attention.

• Visual salience is important because it drives a decision we make a couple of hundred thousand times a day - where to look.

7

Visual Salience

• Visual Salience is some notion of what is interesting in the world - it captures our attention.
• But that's kind of vague…
• The role of Cognitive Science is to make that explicit, by creating a working model of visual salience.
• A good way to do that these days is to use probability theory - because as everyone knows, the brain is Bayesian! ;-)

8

Data We Want to Explain

• Visual search:
  • Search asymmetry: a search for object A among distractors B can be faster than the search for B among A.
  • Parallel vs. serial search (and the continuum in between): an item "pops out" of the display no matter how many distractors there are, vs. reaction time increasing with the number of distractors (not emphasized in this talk…)
• Eye movements when viewing images and videos.


9

Audience participation!

Look for the unique item.

Clap when you find it.

10-18

[Demo slides: visual search displays with a unique item among distractors - tilted vs. vertical bars, backwards vs. normal "s"s, upside-down vs. right-side-up elephants]

What just happened?

• This phenomenon is called the visual search asymmetry:
  • Tilted bars are more easily found among vertical bars than vice versa.
  • Backwards "s"s are more easily found among normal "s"s than vice versa.
  • Upside-down elephants are more easily found among right-side-up ones than vice versa.

19

Why is there an asymmetry?

• There are not too many computational explanations:
  • "Prototypes do not pop out"
  • "Novelty attracts attention"
• Our model of visual salience will naturally account for this.

20

Saliency Maps

• Koch and Ullman, 1985: the brain calculates an explicit saliency map of the visual world.
• Their definition of saliency relied on center-surround principles:
  • Points in the visual scene are salient if they differ from their neighbors.
• In more recent years, there have been a multitude of definitions of saliency.


21

Saliency Maps

• There are a number of candidates for the salience map: there is at least one in LIP, the lateral intraparietal area of the parietal lobe; also in the frontal eye fields and the superior colliculus… but there may be representations of salience much earlier in the visual pathway - some even suggest in V1.
• But we won't be talking about the brain today…

22

Probabilistic Saliency

• Our basic assumption:
  • The main goal of the visual system is to find potential targets that are important for survival, such as prey and predators.
• The visual system should direct attention to locations in the visual field with a high probability of the target class or classes.
• We will lump all of the potential targets together in one random variable, T.
• For ease of exposition, we will leave out our location random variable, L.

23

Probabilistic Saliency

• Notation: x denotes a point in the visual field.
• T_x: binary variable signifying whether point x belongs to a target class.
• F_x: the visual features at point x.
• The task is to find the point x that maximizes the probability of a target given the features at that point:

  s_x = p(T_x = 1 | F_x)

• This quantity is the saliency of point x.
• Note: this is what most classifiers compute!
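Since s_x = p(T_x = 1 | F_x) is exactly what a probabilistic classifier outputs, one could in principle produce a saliency map by sliding any such classifier over the feature maps. A purely illustrative sketch using logistic regression (this is not the SUN implementation, just the "classifier" point made concrete):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def saliency_as_classifier(train_feats, train_labels, feature_maps):
    """Saliency of every point as p(T = 1 | F), computed by any
    probabilistic classifier - logistic regression here, for illustration.
    train_feats: (n, d) labeled feature vectors; feature_maps: H x W x d."""
    clf = LogisticRegression().fit(train_feats, train_labels)
    h, w, d = feature_maps.shape
    probs = clf.predict_proba(feature_maps.reshape(-1, d))[:, 1]
    return probs.reshape(h, w)
```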

24

Probabilistic Saliency

• Taking the log and applying Bayes' rule results in:

  log s_x = log p(F_x | T_x = 1) - log p(F_x) + log p(T_x = 1)


25

Probabilistic Saliency

• log p(F_x | T_x)
  • A probabilistic description of the features of the target.
  • Provides a form of top-down (endogenous, intrinsic) saliency.
  • Some similarity to Iconic Search (Rao et al., 1995) and Guided Search (Wolfe, 1989).

26

Probabilistic Saliency

• log p(T_x)
  • Constant over locations for fixed target classes, so we can drop it.
  • Note: this is a stripped-down version of our model, useful for presentations to undergraduates! ;-) - we usually include a location variable as well, which encodes the prior probability of targets being in particular locations.

27

Probabilistic Saliency

• -log p(F_x)
  • This is called the self-information of this variable.
  • It says that rare feature values attract attention.
  • Independent of task.
  • Provides a notion of bottom-up (exogenous, extrinsic) saliency.

28

Probabilistic Saliency

• Now we have two terms:
  • Top-down saliency
  • Bottom-up saliency
• Taken together, this is the pointwise mutual information between the features and the target:

  log p(F_x | T_x) - log p(F_x) = log [ p(F_x, T_x) / ( p(F_x) p(T_x) ) ]


29

Math in Action: Saliency Using "Natural Statistics"

• For most of what I will be telling you about next, we use only the -log p(F) term, or bottom-up salience.
• Remember, this means rare feature values attract attention.
• This is a computational instantiation of the idea that "novelty attracts attention".

30

Math in Action: Saliency Using "Natural Statistics"

• Remember, this means rare feature values attract attention.
• This means two things:
  • We need some features (that have values!). What should we use?
  • We need to know when the values are unusual: so we need experience.

31

Math in Action: Saliency Using "Natural Statistics"

• Experience, in this case, means collecting statistics of how the features respond to natural images.
• We will use two kinds of features:
  • Difference of Gaussians (DoG)
  • Independent Components Analysis (ICA) derived features

32

Feature Space 1: Differences of Gaussians

These respond to differences in brightness between the center and the surround. We apply them to three color channels separately (intensity, Red-Green, and Blue-Yellow) at four scales: 12 features total.
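A minimal sketch of this feature stage, assuming isotropic Gaussians with a surround twice the width of the center; the scales and the opponent-channel formulas are illustrative stand-ins, not the exact values used in SUN:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_responses(rgb, scales=(1, 2, 4, 8), surround_ratio=2.0):
    """Center-surround (DoG) responses on intensity, red-green, and
    blue-yellow opponent channels: len(scales) * 3 = 12 feature maps."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    channels = [
        (r + g + b) / 3.0,     # intensity
        r - g,                 # red-green opponent channel
        b - (r + g) / 2.0,     # blue-yellow opponent channel
    ]
    features = []
    for c in channels:
        for s in scales:
            center = gaussian_filter(c, sigma=s)
            surround = gaussian_filter(c, sigma=surround_ratio * s)
            features.append(center - surround)
    return np.stack(features, axis=-1)   # H x W x 12
```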

Page 9: Collaborators - Home | Computer Sciencecseweb.ucsd.edu/~gary/258a/SUN-NIMBLE-small.pdf · salience map predicts human fixations in static images.! We are best in the KL distance

33

Feature Space 1: Differences of Gaussians

• Now, we run these over Lingyun's vacation photos, and record how frequently they respond.

34

Feature Space 2: Independent Components
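The ICA filters are learned from natural image patches rather than hand-designed. A sketch using scikit-learn's FastICA; the patch size, patch count, and component count are illustrative placeholders (and the original model used color patches):

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_features(images, patch=11, n_patches=50_000, n_components=100):
    """Learn ICA filters from random patches of grayscale natural images."""
    rng = np.random.default_rng(0)
    samples = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch)
        x = rng.integers(img.shape[1] - patch)
        samples.append(img[y:y + patch, x:x + patch].ravel())
    X = np.asarray(samples)
    X -= X.mean(axis=0)                      # center each pixel dimension
    ica = FastICA(n_components=n_components, whiten="unit-variance")
    ica.fit(X)
    return ica.components_                   # each row is one ICA filter
```

Each learned filter is then convolved with an image just like the DoG filters above; its response histogram over natural images is what gets modeled next.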

35

Learning the Distribution

We fit a generalized Gaussian distribution to the histogram of each feature:

  p(F_i; σ_i, θ_i) = θ_i / (2 σ_i Γ(1/θ_i)) · exp( -|F_i / σ_i|^θ_i )

  log p(F_i) = const - |F_i / σ_i|^θ_i

where F_i is the i-th filter response, θ_i is the shape parameter, and σ_i is the scale parameter.
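scipy's gennorm is this same distribution family, so the fitting step might look like the following sketch (assuming the pooled responses are centered at zero, which sparse filter responses effectively are):

```python
import numpy as np
from scipy.stats import gennorm

def fit_feature_distributions(responses):
    """responses: (n_samples, n_features) filter outputs pooled over many
    natural images. Returns one (shape, scale) pair per feature."""
    params = []
    for i in range(responses.shape[1]):
        # floc=0 pins the mode at zero, matching the zero-peaked histograms
        theta, _, sigma = gennorm.fit(responses[:, i], floc=0)
        params.append((theta, sigma))
    return params

def self_information(response, theta, sigma):
    """-log p(F_i) up to an additive constant: rare values score high."""
    return np.abs(response / sigma) ** theta
```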

36

The Learned Distribution (DoGs)

• This is P(F) for four different features.
• Note these features are sparse - i.e., their most frequent response is near 0.
• When there is a big response (positive or negative), it is interesting!


37

The Learned Distribution (ICA)

• For example, here's a feature:
• Here's a frequency count of how often it matches a patch of image:
• Most of the time, it doesn't match at all - a response of "0" (the histogram's peak, labeled "BOREDOM!")
• Very infrequently, it matches very well - a response of "200" (the rare tail, labeled "NOVELTY!")

38

Bottom-up Saliency

• We have to estimate the joint probability from the features.
• If all filter responses are independent:

  -log p(F) = Σ_i -log p(F_i)

• They're not independent, but we proceed as if they are. (ICA features are "pretty independent".)
• Note: no weighting of features is necessary!
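Putting the pieces together, a sketch of the whole bottom-up pipeline, reusing the hypothetical helpers from the earlier sketches:

```python
import numpy as np

def bottom_up_saliency(image, feature_fn, params):
    """Summed per-feature self-information at every pixel.
    feature_fn: e.g. dog_responses from the earlier sketch.
    params: (theta, sigma) pairs fitted on natural-image statistics."""
    maps = feature_fn(image)                 # H x W x n_features
    saliency = np.zeros(maps.shape[:2])
    for i, (theta, sigma) in enumerate(params):
        # independence assumption: self-information terms just add
        saliency += np.abs(maps[..., i] / sigma) ** theta
    return saliency
```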

39

Qualitative Results: BU Saliency

[Columns: Original Image | Human fixations | DoG Salience | ICA Salience]

40

Qualitative Results: BU Saliency

[Columns: Original Image | Human fixations | DoG Salience | ICA Salience]


41

Qualitative Results: BU Saliency

42

Quantitative Results: BU Saliency

• These are quantitative measures of how well the salience map predicts human fixations in static images.
• We are best on the KL distance measure, and second best on the ROC measure.
• Our main competition is Bruce & Tsotsos, who have essentially the same idea we have, except they compute novelty in the current image.

Model                     | KL (SE)         | ROC (SE)
Itti et al. (1998)        | 0.1130 (0.0011) | 0.6146 (0.0008)
Gao & Vasconcelos (2007)  | 0.1535 (0.0016) | 0.6395 (0.0007)
Bruce & Tsotsos (2006)    | 0.2029 (0.0017) | 0.6727 (0.0008)
SUN (DoG)                 | 0.1723 (0.0012) | 0.6570 (0.0007)
SUN (ICA)                 | 0.2097 (0.0016) | 0.6682 (0.0008)

43

Related Work

• Torralba et al. (2003) derive a similar probabilistic account of saliency, but:
  • Use the current image's statistics
  • Emphasize effects of global features and scene gist
• Bruce and Tsotsos (2006) also use self-information as bottom-up saliency
  • Also uses the current image's statistics

44

Related Work

• The use of the current image's statistics means these models follow a very different principle: find feature values that are rare in the current image, instead of feature values that are unusual in general - that is, novel.
• As we'll see, novelty helps explain several search asymmetries.
• Models using the current image's statistics are unlikely to be neurally computable in the necessary timeframe, as the system must collect statistics from the entire image before it can calculate local saliency at any point.


45

Search Asymmetry

• Our definition of bottom-up saliency leads to a clean explanation of several search asymmetries (Zhang, Tong, and Cottrell, 2007):
  • All else being equal, targets with uncommon feature values are easier to find.
• Examples:
  • Treisman and Gormican, 1988: a tilted bar is more easily found among vertical bars than vice versa.
  • Levin, 2000: for Caucasian subjects, finding an African-American face among Caucasian faces is faster, due to its relative rarity in their experience (basketball fans who have to identify the players do not show this effect).

46

Search Asymmetry Results

47

Search Asymmetry Results

48

Top-down Salience in Visual Search

• Suppose we actually have a target in mind - e.g., find pictures, or mugs, or people in scenes.
• As I mentioned previously, the original (stripped-down) salience model can be implemented as a classifier applied to each point in the image.
• When we include location, we get (after a large number of completely unwarranted assumptions):

  log saliency_x = -log p(F = f_x)              [self-information: bottom-up saliency]
                 + log p(F = f_x | T_x = 1)     [log-likelihood: top-down knowledge of appearance]
                 + log p(T_x = 1 | L = l)       [location prior: top-down knowledge of the target's location]
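In code, the three terms simply add in log space. A sketch, where log_appearance and log_location_prior are hypothetical placeholders for whatever appearance model and location prior have been fitted:

```python
import numpy as np

def sun_log_saliency(feature_maps, params, log_appearance, log_location_prior):
    """Full (log) SUN saliency map: bottom-up self-information plus
    top-down appearance likelihood plus a location prior.
    log_appearance, log_location_prior: H x W arrays from a target model
    (placeholder inputs here)."""
    bottom_up = np.zeros(feature_maps.shape[:2])
    for i, (theta, sigma) in enumerate(params):
        bottom_up += np.abs(feature_maps[..., i] / sigma) ** theta
    return bottom_up + log_appearance + log_location_prior
```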


49

Qualitative Results (mug search)

• Where we disagree the most with Torralba et al. (2006)

[Rows: GIST | SUN]

50

Qualitative Results (picture search)

• Where we disagree the most with Torralba et al. (2006)

[Rows: GIST | SUN]

51

Qualitative Results (people search)

• Where we agree the most with Torralba et al. (2006)

[Rows: GIST | SUN]

52

Qualitative Results (painting search)

• This is an example where SUN and humans make the same mistake, due to the similar appearance of TVs and pictures (the black square in the upper left is a TV!).

[Columns: Image | Humans | SUN]


53

Quantitative Results

• Area Under the ROC Curve (AUC) gives basically identical results.

54

Saliency of Dynamic Scenes

• Created spatiotemporal filters.
• Temporal filters: Difference of Exponentials (DoE)
  • Highly active if there is change
  • If features stay constant, the response goes to zero
  • Resembles the responses of some neurons (cells in LGN)
  • Easy to compute
• Convolve with spatial filters to create spatiotemporal filters
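"Easy to compute" because each exponential can be maintained as a recursive running average. A minimal sketch of a DoE temporal filter; the two decay rates are illustrative, not the paper's values:

```python
import numpy as np

def doe_filter(frames, fast=0.7, slow=0.9):
    """Difference-of-exponentials temporal filtering of a video.
    The difference of a fast and a slow exponential average is large
    when a pixel changes and decays to zero when it stays constant."""
    ema_fast = np.zeros_like(frames[0], dtype=float)
    ema_slow = np.zeros_like(frames[0], dtype=float)
    responses = []
    for frame in frames:
        ema_fast = fast * ema_fast + (1 - fast) * frame
        ema_slow = slow * ema_slow + (1 - slow) * frame
        responses.append(ema_fast - ema_slow)
    return np.array(responses)
```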

55

Saliency of Dynamic Scenes

• Bayesian Saliency (Itti and Baldi, 2006):
  • Saliency is Bayesian "surprise" (different from self-information).
  • Maintain a distribution over a set of models attempting to explain the data, P(M).
  • As new data comes in, calculate the saliency of a point as the degree to which it makes you alter your models.
  • Total surprise: S(D, M) = KL(P(M|D); P(M))
• A better predictor than standard spatial salience.
• Much more complicated (~500,000 different distributions being modeled) than SUN dynamic saliency (days to run vs. hours or real-time).
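For intuition, surprise has a closed form for simple model families. A sketch for a Gaussian model of a single pixel's value (Itti and Baldi used much richer model sets; this is only meant to make KL(P(M|D); P(M)) concrete):

```python
import numpy as np

def gaussian_surprise(mu, tau2, d, sigma2):
    """Surprise = KL(posterior || prior) for a Gaussian-mean model.
    Prior over the pixel's mean: N(mu, tau2); observation d has noise
    variance sigma2. Returns the KL divergence in nats."""
    post_tau2 = 1.0 / (1.0 / tau2 + 1.0 / sigma2)      # posterior variance
    post_mu = post_tau2 * (mu / tau2 + d / sigma2)     # posterior mean
    return 0.5 * ((post_tau2 + (post_mu - mu) ** 2) / tau2
                  - 1.0 + np.log(tau2 / post_tau2))
```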

56

Saliency of Dynamic Scenes

• In the process of evaluating and comparing, we discovered how much the center bias of human fixations was affecting results.
• Most human fixations are towards the center of the screen (Reinagel, 1999).

[Figure: accumulated human fixations from three experiments]


57

Saliency of Dynamic Scenes

• Results varied widely depending on how edges were handled:
  • How is the invalid portion of the convolution handled?

[Figure: accumulated saliency of three models]

58

Saliency of Dynamic Scenes

Initial results

59

Measures of Dynamic Saliency

• Typically, the algorithm is compared to the human fixations within a frame:
  • I.e., how salient is the human-fixated point according to the model, versus all other points in the frame?
  • This measure is subject to the center bias - if the borders are down-weighted, the score goes up.

60

Measures of Dynamic Saliency

• An alternative is to compare the salience of the human-fixated point to the salience of the same point across frames (see the sketch below):
  • Underestimates performance, since some locations are genuinely more salient at all time points (e.g., an anchor's face during a news broadcast).
  • Gives any static measure (e.g., a centered Gaussian) a baseline score of 0.
  • This is equivalent to sampling negatives from the distribution of human fixations, rather than uniformly.
• On this set of measures, we perform comparably with Itti and Baldi (2006).
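A sketch of the two scoring schemes as ROC areas, using scikit-learn; fixations are (frame, y, x) triples, and the within-frame variant samples random negatives uniformly (one of several conventions in the literature):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_within_frame(saliency, t, y, x, n_neg=1000, seed=0):
    """Center-bias-prone measure: the fixated point vs. random points
    drawn uniformly from the same frame. saliency: T x H x W array."""
    rng = np.random.default_rng(seed)
    frame = saliency[t]
    ys = rng.integers(frame.shape[0], size=n_neg)
    xs = rng.integers(frame.shape[1], size=n_neg)
    scores = np.concatenate([[frame[y, x]], frame[ys, xs]])
    labels = np.concatenate([[1], np.zeros(n_neg)])
    return roc_auc_score(labels, scores)

def auc_across_frames(saliency, t, y, x):
    """Bias-corrected measure: the same pixel at the fixated frame
    vs. the same pixel at every other frame."""
    scores = saliency[:, y, x]
    labels = np.zeros(len(scores))
    labels[t] = 1
    return roc_auc_score(labels, scores)
```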


61

Saliency of Dynamic Scenes

Results using non-center-biased metrics on the human fixation data on videos from Itti (2005): 4 subjects/movie, 50 movies, ~25 minutes of video.

62

Movies…

[Slides 63-66: video frames]

Demo…

67

Summary of this part of the talk

• It is a good idea to start from first principles.
• Often the simplest model is best.
• Our model of salience rocks:
  • It does bottom-up.
  • It does top-down.
  • It does video (fast!).
  • It naturally accounts for search asymmetries.

68

Summary and Conclusions

• But, as is usually the case with grad students, Lingyun didn't do everything I asked…
• We are beginning to explore models based on utility: some targets are more useful than others, depending on the state of the animal.
• We are also looking at using our hierarchical ICA model to get higher-level features.


69

Summary and Conclusions

• And a foveated retina,
• And updating the salience based on where the model looks (as is actually seen in LIP).

70

Christopher Kanan
Garrison Cottrell

71

Motivation

• Now we have a model of salience - but what can it be used for?
• Here, we show that we can use it to recognize objects.

Christopher Kanan

72

One reason why this might be a good idea…

• Our attention is automatically drawn to interesting regions in images.
• Our salience algorithm is automatically drawn to interesting regions in images.
• These are useful locations for discriminating one object (face, butterfly) from another.


73

Main Idea

• Training Phase (learning object appearances):
  • Use the salience map to decide where to look. (We use the ICA salience map.)
  • Memorize these samples of the image, with labels (Bob, Carol, Ted, or Alice). (We store the ICA feature values.)

Christopher Kanan

74

Main Idea

• Testing Phase (recognizing objects we have learned):
  • Now, given a new face, use the salience map to decide where to look.
  • Compare the new image samples to stored ones - the closest ones in memory get to vote for their label.

Christopher Kanan

75

[Figure: new fragments compared to stored memories of Bob and stored memories of Alice]

Result: 7 votes for Alice, only 3 for Bob. It's Alice!

76

Voting

• The voting process is actually based on Bayesian updating (and the naïve Bayes assumption).
• The size of the vote depends on the distance from the stored sample, using kernel density estimation.
• Hence NIMBLE: NIM with Bayesian Likelihood Estimation.
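A minimal sketch of that rule: accumulate per-class log-likelihoods naive-Bayes style over fixations, with each likelihood a Gaussian-kernel density estimate over the stored fragments (the bandwidth and the uniform class prior are illustrative assumptions):

```python
import numpy as np

def classify_by_fixations(fragments, memory, bandwidth=1.0):
    """fragments: (n_fixations, d) ICA feature vectors sampled from the
    new image. memory: dict label -> (n_stored, d) stored vectors.
    Returns the label with the highest accumulated log-likelihood."""
    log_post = {label: 0.0 for label in memory}   # uniform prior
    for f in fragments:
        for label, stored in memory.items():
            sq_dist = np.sum((stored - f) ** 2, axis=1)
            # kernel density estimate of p(fragment | label)
            p = np.mean(np.exp(-sq_dist / (2 * bandwidth ** 2)))
            # naive Bayes: fixations treated as independent, so logs add
            log_post[label] += np.log(p + 1e-12)
    return max(log_post, key=log_post.get)
```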


77

Overview of the system

• The ICA features do double duty:
  • They are combined to make the salience map - which is used to decide where to look.
  • They are stored to represent the object at that location.

78

NIMBLE vs. Computer Vision

• Compare this to standard computer vision systems:
  • One pass over the image, and global features.

  Image → Global Features → Global Classifier → Decision

79-80

[Panels: Belief After 1 Fixation | Belief After 10 Fixations]


81

Robust Vision

• Human vision works in multiple environments - our basic features (neurons!) don't change from one problem to the next.
• We tune our parameters so that the system works well on Bird and Butterfly datasets - and then apply the system unchanged to faces, flowers, and objects.
• This is very different from standard computer vision systems, which are tuned to particular datasets.

Christopher Kanan

82

Caltech 101: 101 different categories

AR dataset: 120 different people, with different lighting, expressions, and accessories

83

Flowers: 102 Different Flower Species

Christopher Kanan

84

• ~7 fixations are required to achieve at least 90% of maximum performance.

Christopher Kanan


85

• So, we created a simple cognitive model that uses simulated fixations to recognize things.
  • But it isn't that complicated.
• How does it compare to approaches in computer vision?

86

• Caveats:
  • As of mid-2010.
  • Only comparing to single-feature-type approaches (no "Multiple Kernel Learning" (MKL) approaches).
  • Still superior to MKL with very few training examples per category.

87-88

[Slides 87-88: performance vs. NUMBER OF TRAINING EXAMPLES (x-axis ticks: 1, 5, 15, 30 and 1, 2, 3, 6, 8)]


89

90

• More neurally and behaviorally relevant gaze control and fixation integration.
  • People don't randomly sample images.
• A foveated retina.
• Comparison with human eye movement data during recognition/classification of faces, objects, etc.

91

• A fixation-based approach can work well for image classification.
• Fixation-based models can match, and even exceed, some of the best models in computer vision…
  …especially when you don't have a lot of training images.

Christopher Kanan

92

• Software and paper available at www.chriskanan.com
[email protected]

This work was supported by the NSF (grant #SBE-0542013) to the Temporal Dynamics of Learning Center.


93

Thanks!