Visual deep learning models (cs.wellesley.edu/~vision/slides/tommy_class.pdf)


TRANSCRIPT


Visual deep learning models, in particular for face recognition

and models of invariant recognition

in the ventral stream

Towards a theory of the above

Tomaso Poggio, CBMM, BCS, CSAIL, McGovern Institute, MIT

Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory

Second Annual NSF Site Visit, June 2–3, 2015

Theoretical/conceptual framework for vision

• The first 100ms of vision: feedforward and invariant: what, who, where

• Top-down needed for verification step and more complex questions: generative models, probabilistic inference, top-down visual routines.

Following this conceptual framework we are working on:

1. A theory of invariant cortical computation → i-theory
2. A generative approach, probabilistic in nature
3. Visual routines, and how they may be learned

Object recognition

• Human brain: 10^10–10^11 neurons (~1 million flies), 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
• ~200M neurons in V1, ~200M in V2, ~50M in V4

Van Essen & Anderson, 1990

Source: Lennie, Maunsell, Movshon

Vision: what is where

[software available online] Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

• It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual.; Fukushima, 1980: quant.; Oram & Perrett, 1993: qual.; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing & Koerner, 2003; LeCun et al., 1998: not-bio; Amit & Mascaro, 2003: not-bio; Hinton, LeCun, Bengio: not-bio; Deco & Rolls, 2006…)

• As a biological model of object recognition in the ventral stream – from V1 to PFC – it is perhaps the most quantitatively faithful to known neuroscience data

Recognition in Visual Cortex: “classical model”, selective and invariant

Feedforward Models: “predict” rapid categorization (82% model vs. 80% humans)

Hierarchical feedforward models of the ventral stream
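The slides only show the architecture, but the family of models referenced here alternates template matching (simple, S units) with max pooling (complex, C units). Below is a minimal, illustrative NumPy sketch of one S1/C1 stage in that spirit; the filter size, orientations, and pooling grid are placeholder values, not the parameters of Serre et al. 2005/2007.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor(size=11, wavelength=5.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Gabor filter: a standard model of V1 simple-cell receptive fields."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()

def s1_c1(image, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4), pool=8):
    """One S1 (oriented filtering) + C1 (local max pooling) stage, HMAX-style."""
    c1_maps = []
    for theta in thetas:
        s1 = np.abs(convolve2d(image, gabor(theta=theta), mode="same"))  # S1: simple cells
        h, w = s1.shape
        hp, wp = h // pool, w // pool
        # C1: max over a local spatial pool of S1 responses
        c1 = s1[:hp*pool, :wp*pool].reshape(hp, pool, wp, pool).max(axis=(1, 3))
        c1_maps.append(c1)
    return np.stack(c1_maps)  # orientation x space feature maps

# Example: responses to a random 64x64 image
features = s1_c1(np.random.rand(64, 64))
print(features.shape)  # (4, 8, 8)
```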

Why do these networks, including DCLNs, work so well?

Models are not enough… we need a theory!

Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory


Invariance via pooling
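A toy numerical illustration of the principle named on this slide (my sketch, not from the talk): pooling a template's responses over all shifts yields a value that does not change when the input itself is shifted.

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.standard_normal(8)      # a 1-D "receptive field"
signal = rng.standard_normal(64)

def pooled_response(x, t):
    """Max over dot products of x with every circular shift of the (zero-padded) template."""
    t_padded = np.concatenate([t, np.zeros(len(x) - len(t))])
    return max(x @ np.roll(t_padded, s) for s in range(len(x)))

shifted = np.roll(signal, 17)          # translated copy of the same signal
print(pooled_response(signal, template))
print(pooled_response(shifted, template))  # identical: pooling over shifts gives invariance
```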


New name for virtual examples


A poor man's regularization!
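"Virtual examples" and "poor man's regularization" both refer to augmenting the training set with transformed copies of each image, which pushes the classifier toward invariance to those transformations. A minimal sketch, assuming the transformations are small translations (the shift range is arbitrary):

```python
import numpy as np

def virtual_examples(image, max_shift=3):
    """Generate translated copies of an image: the classic 'virtual examples' trick."""
    examples = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            examples.append(np.roll(np.roll(image, dy, axis=0), dx, axis=1))
    return np.stack(examples)

# Each training image is replaced by (2*max_shift + 1)**2 shifted copies,
# all sharing the original label.
augmented = virtual_examples(np.random.rand(28, 28))
print(augmented.shape)  # (49, 28, 28)
```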


Mobileye

Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory


i-theory: Learning of invariant & selective representations in sensory cortex

i-theory: exploring a new hypothesis

A main computational goal of the feedforward ventral stream hierarchy — and of vision — is to compute a representation for each incoming image which is invariant to transformations previously experienced in the visual environment.


Empirical demonstration: invariant representation leads to lower sample complexity for a supervised classifier

Theorem (translation case). Consider a space of images of dimension d × d pixels which may appear in any position within a window of size rd × rd pixels. The usual image representation yields a sample complexity (of a linear classifier) of order m_image = O(r²d²); the oracle (invariant) representation yields, because of much smaller covering numbers, a sample complexity of order m_oracle = O(d²) = m_image / r².
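As a concrete illustration (numbers are my own, not from the slides): for d = 32 and r = 4, the pixel representation needs on the order of r²d² ≈ 16,000 labeled examples, while the invariant oracle representation needs on the order of d² ≈ 1,000, i.e. a factor of r² = 16 fewer.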


An algorithm that learns in an unsupervised way to compute invariant representations

For each stored template t^k, the projections ν = ⟨I, g_i t^k⟩ of an image I onto transformed templates define a distribution P(ν); its thresholded averages give the components of the signature:

μ_k^n(I) = (1/|G|) Σ_{i=1}^{|G|} σ(⟨I, g_i t^k⟩ + nΔ)
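A compact sketch of this computation in NumPy, under illustrative assumptions: the group G is taken to be circular shifts of the flattened template, σ is a Heaviside step, and the template count, threshold count, and Δ are arbitrary choices rather than values from the talk. The unsupervised part, storing templates and their observed transformations, is reduced here to shifting a stored template.

```python
import numpy as np

def signature(image, templates, n_thresholds=10, delta=0.2):
    """mu[k, n] = (1/|G|) * sum_i sigma(<I, g_i t^k> + n*delta), with sigma a step function,
    so each entry is the fraction of projections of I onto shifted templates above -n*delta."""
    flat = image.ravel()
    mu = np.zeros((len(templates), n_thresholds))
    for k, t in enumerate(templates):
        t_flat = t.ravel()
        # projections of the image onto all "transformed" templates (circular shifts)
        projections = np.array([flat @ np.roll(t_flat, s) for s in range(flat.size)])
        for n in range(n_thresholds):
            mu[k, n] = np.mean(projections + n * delta > 0)
    return mu

rng = np.random.default_rng(0)
templates = [rng.standard_normal((8, 8)) for _ in range(5)]  # memorized templates t^k
image = rng.standard_normal((8, 8))                          # new image I
print(signature(image, templates).shape)                     # (5, 10)
```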


Invariant signature from a single image of a new object

We need only a finite number of projections, K, to distinguish among n images.

Similar in spirit to Johnson-Lindenstrauss
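For intuition on the Johnson-Lindenstrauss analogy (a standard random-projection check, not code from the talk): a modest number K of projections keeps n high-dimensional images distinguishable, since pairwise distances are roughly preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 50, 1024, 200                       # n images, d pixels, K random projections
X = rng.standard_normal((n, d))
P = rng.standard_normal((d, K)) / np.sqrt(K)  # random projection matrix
Y = X @ P

orig = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
proj = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
ratio = proj[orig > 0] / orig[orig > 0]
print(ratio.min(), ratio.max())               # both close to 1: distances roughly preserved
```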

 

 

 

Local and global invariance: whole-parts theorem

[Figure: a hierarchy of HW modules across layers l = 1 … 4; biophysics: prediction]

Basic machine: an HW module (dot products and histograms/moments for the image seen through its RF)

• The cumulative histogram (empirical cdf) can be computed as

• This maps directly into a set of simple cells with threshold

• …and a complex cell indexed by n and k summating the simple cells

μ_k^n(I) = (1/|G|) Σ_{i=1}^{|G|} σ(⟨I, g_i t^k⟩ + nΔ)
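A toy check of the statement above, with σ taken as a step nonlinearity and arbitrary sizes: averaging the thresholded simple-cell outputs over the group reproduces (one minus) the empirical CDF of the projections ⟨I, g_i t^k⟩.

```python
import numpy as np

rng = np.random.default_rng(1)
projections = rng.standard_normal(500)   # <I, g_i t^k> over the group elements g_i
delta, n_cells = 0.25, 12

# Complex cell n: average over the group of step-function simple cells with offset n*delta
complex_cells = np.array([(projections + n * delta > 0).mean() for n in range(n_cells)])

# The same quantities read off the empirical CDF evaluated at thresholds -n*delta
empirical_cdf = np.array([(projections <= -n * delta).mean() for n in range(n_cells)])
print(np.allclose(complex_cells, 1.0 - empirical_cdf))  # True
```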

The nonlinearity can be rather arbitrary for invariance provided it is stationary in time


Dendrites of a complex cell as simple cells…

Active properties in the dendrites of the complex cell

Plan

• i-theory
• DCLNs
• Equivalence to DCLNs, theory notes
• Some predictions of i-theory
• Deep Face systems
