
Deep Networks in the Brain

Vicente L. Malave

January 26, 2012

A quantitative theory of immediate visual recognition

Figure from [Serre et al., 2007]

I’m going to present the model, talk about where the data come from, and be critical of some of the claims.

Most of this is not in the paper, but I want you to put the paper in context, be aware of some things it omits, and be more critical when someone claims to know how the brain works.

Outline

Early Visual System : Simple and Complex Cells

The Model

Electrophysiological evidence for early object recognition.

Claim: Linear Classifiers indicate that information is represented in IT cortex

A V1 strawman.

Conclusions

Visual System

Figure from [Felleman and Van Essen, 1991]

What Serre et al think the Visual System Does

In summary, the accumulated evidence points to four, mostly accepted, properties of the feed-forward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.

It’s often claimed that object recognition is feedforward.

Figure from [Thorpe and Fabre-Thorpe, 2001]

Feedforward Object Recognition

- Very fast (but how fast, exactly?)
- No eye movements
- No attention [Li et al., 2002]

Simple Cells

Most sparse coding papers contain this figure, and an argument that it’s similar to what V1 cells do.

Figure from [Olshausen and Field, 1996]

Recording a receptive field

(tell the Hubel and Wiesel story, show the video clip) http://www.youtube.com/watch?v=KE952yueVLA&feature=related


Simple Cells

Figure from [Hubel and Wiesel, 1962]

Complex Cells

Complex cells have some mild invariance.



Simple Cells can be Modeled by Gabors

The Gabor filter [Daugman, 1985] is a good model of what simple cells do.

Figure from [Jones and Palmer, 1987]
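To make the Gabor idea concrete, here is a minimal sketch of a Gabor receptive field and a purely linear "simple cell" built from it. The parameter names and values (`size`, `wavelength`, `sigma`, `gamma`) are assumptions chosen for illustration, not values fitted in [Jones and Palmer, 1987], and `size` is assumed to be odd.

```python
import numpy as np

def gabor(size=21, wavelength=6.0, theta=0.0, sigma=4.0, gamma=0.5, phase=0.0):
    """Return a size x size Gabor patch: a cosine grating under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so that the grating varies along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

def simple_cell_response(patch, rf):
    """Linear response: inner product of the image patch with the receptive field."""
    return float(np.sum(patch * rf))

rf = gabor(theta=0.0)
preferred = gabor(theta=0.0)             # stimulus matching the receptive field
orthogonal = gabor(theta=np.pi / 2.0)    # same stimulus rotated by 90 degrees

print("preferred orientation: ", simple_cell_response(preferred, rf))
print("orthogonal orientation:", simple_cell_response(orthogonal, rf))
```

The unit responds strongly to a stimulus at its preferred orientation and barely at all to the orthogonal one, which is the orientation tuning the Gabor model is meant to capture.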




Being a little more formal: Reverse Correlation

For more detail on how to measure neural tuning, see [Wu et al., 2006, Dayan and Abbott, 2001].
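As a rough illustration of reverse correlation, the sketch below estimates a receptive field as the spike-triggered average of white-noise stimuli. Everything here is simulated: the one-dimensional "true" receptive field, the linear-nonlinear-Poisson neuron, and the trial count are assumptions made for the example, not a description of any particular experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth receptive field (hidden from the "experimenter"):
# a one-dimensional Gabor-like profile.
n_pixels = 16
positions = np.arange(n_pixels) - n_pixels / 2.0
true_rf = np.exp(-positions ** 2 / 8.0) * np.cos(2.0 * np.pi * positions / 6.0)

# Present Gaussian white-noise stimuli and simulate spike counts from a
# linear-nonlinear-Poisson neuron (an assumption of this sketch).
n_trials = 20000
stimuli = rng.standard_normal((n_trials, n_pixels))
drive = stimuli @ true_rf
rate = np.maximum(drive, 0.0)            # half-wave rectification
spikes = rng.poisson(rate)

# Spike-triggered average: the spike-weighted mean of the stimuli.
sta = (spikes[:, None] * stimuli).sum(axis=0) / spikes.sum()

correlation = np.corrcoef(sta, true_rf)[0, 1]
print(f"correlation between STA and true receptive field: {correlation:.3f}")
```

With Gaussian white noise the spike-triggered average recovers the linear filter up to a scale factor. With natural or otherwise correlated stimuli it does not, which is one reason the references above are worth reading.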


How to build complex cells

Hubel and Wiesel proposed that you can build invariances by combining a set of input functions.



The model of Serre et al. is an attempt to push this idea as far as it will go.


Simple Cells : Radial Basis Functions

Radial Basis Function [Bishop, 1995] is:

$$ y = \exp\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{N} (w_j - x_j)^2 \right) \tag{1} $$
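Equation (1) is easy to try out. The sketch below implements a single radial-basis-function unit; the function name `simple_unit` and the example vectors are illustrative assumptions, with `w` playing the role of the unit's preferred input and `sigma` its tuning width.

```python
import numpy as np

def simple_unit(x, w, sigma=1.0):
    """Equation (1): Gaussian tuning of the response y around the preferred input w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.exp(-np.sum((w - x) ** 2) / (2.0 * sigma ** 2)))

preferred = np.array([0.2, 0.8, 0.5])
print(simple_unit(preferred, preferred))        # exact match gives the peak response
print(simple_unit(preferred + 0.5, preferred))  # response falls off with distance from w
```

A smaller `sigma` makes the unit more selective: the response drops faster as the input moves away from `w`.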

Complex Cells : Max

$$ y = \max_{j=1,\ldots,N} x_j \tag{2} $$
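To see why a max over afferents buys invariance, here is a small sketch in which one complex unit pools radial-basis-function units that share a template but look at different positions of a one-dimensional input. The template, the input signals, and the helper names are assumptions made for the example.

```python
import numpy as np

def simple_unit(x, w, sigma=1.0):
    """Radial-basis-function unit tuned to the template w."""
    diff = np.asarray(w, dtype=float) - np.asarray(x, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2)))

def complex_unit(afferent_responses):
    """Equation (2): respond as strongly as the most active afferent."""
    return float(np.max(afferent_responses))

def position_invariant_response(signal, template, sigma=1.0):
    """Pool simple units that share one template but cover different positions."""
    width = len(template)
    afferents = [simple_unit(signal[i:i + width], template, sigma)
                 for i in range(len(signal) - width + 1)]
    return complex_unit(afferents)

template = np.array([1.0, -1.0, 1.0])
early = np.zeros(10)
early[1:4] = template                    # pattern near the start of the input
late = np.zeros(10)
late[6:9] = template                     # same pattern shifted by five positions
blank = np.zeros(10)                     # no pattern at all

for name, signal in [("early", early), ("late", late), ("blank", blank)]:
    print(name, position_invariant_response(signal, template))
```

The pooled response is identical for the two shifted inputs and lower for the blank one, so the unit keeps its selectivity for the template while discarding where it occurred.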




How do they learn the parameters? [Serre et al., 2005]

This paper’s main claim is that the model reproduces many of the experimental results. How good are these experiments?


Speed of Processing in the human visual system

The main result (in humans) comes from [Thorpe et al., 1996]. First, I’ll explain the method.

Clearly the upper bound on processing time is 445 ms.

Figure from [Thorpe et al., 1996]

Figure from [Luck, 2005]

Fast Recognition

The ERPs are different after about 150 milliseconds, at a frontal electrode.

Figure from [Thorpe et al., 1996]
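For intuition about what "different after about 150 milliseconds" means operationally, here is a minimal sketch of one way to estimate the onset of a difference between two averaged ERPs. The data are synthetic, and the built-in 150 ms onset, effect size, noise level, trial count, and the "15 consecutive samples" criterion are all assumptions made for illustration; this is not the analysis used in [Thorpe et al., 1996].

```python
import numpy as np

rng = np.random.default_rng(1)

times_ms = np.arange(0, 400)             # one sample per millisecond after stimulus onset
n_trials = 200
noise_sd = 5.0
effect_size = 3.0

def simulate_trials(effect_onset_ms=None):
    """Noisy single-trial voltages, with an optional step starting at effect_onset_ms."""
    trials = noise_sd * rng.standard_normal((n_trials, times_ms.size))
    if effect_onset_ms is not None:
        trials += effect_size * (times_ms >= effect_onset_ms)
    return trials

target = simulate_trials(effect_onset_ms=150)
nontarget = simulate_trials()

# Average over trials to get one ERP per condition, then take the difference wave.
difference = target.mean(axis=0) - nontarget.mean(axis=0)

# Pointwise Welch t statistic for the difference between conditions.
standard_error = np.sqrt(target.var(axis=0, ddof=1) / n_trials
                         + nontarget.var(axis=0, ddof=1) / n_trials)
t_values = difference / standard_error

def first_sustained(mask, run_length):
    """Index where the first run of run_length consecutive True values begins."""
    run = 0
    for i, flag in enumerate(mask):
        run = run + 1 if flag else 0
        if run == run_length:
            return i - run_length + 1
    return None

onset_index = first_sustained(np.abs(t_values) > 2.0, run_length=15)
onset_ms = None if onset_index is None else int(times_ms[onset_index])
print("estimated onset of the ERP difference (ms):", onset_ms)
```

Note what this does and does not show: it finds when the two average waveforms diverge, but says nothing about whether the divergence reflects the decision or simply a difference between the two sets of images.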

Here’s the problem. We know some EEG activity is stimulus-driven: it could have nothing to do with the behavior.

The stimuli

Thorpe used images similar to experiment 1.

Figure from [Johnson and Olshausen, 2003]

Spatial Frequency

There is a huge difference in spatial frequency between these categories. Could it be driving the response?
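One way to check for this kind of confound is to compare the radially averaged amplitude spectra of the image categories. The sketch below shows the computation on two synthetic gratings standing in for a low-frequency and a high-frequency image; the function name and the test images are assumptions, and real use would average the spectrum over every image in each category.

```python
import numpy as np

def radial_amplitude_spectrum(image):
    """Mean Fourier amplitude as a function of spatial frequency (cycles per image)."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()                      # remove the DC component
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    height, width = image.shape
    y, x = np.indices((height, width))
    radius = np.hypot(y - height // 2, x - width // 2).astype(int)
    n_bins = min(height, width) // 2                  # keep only fully sampled radii
    total = np.bincount(radius.ravel(), weights=amplitude.ravel())[:n_bins]
    count = np.bincount(radius.ravel())[:n_bins]
    return total / count

size = 64
x = np.arange(size)
coarse = np.tile(np.cos(2.0 * np.pi * x / 32.0), (size, 1))   # 2 cycles per image
fine = np.tile(np.cos(2.0 * np.pi * x / 4.0), (size, 1))      # 16 cycles per image

peak_coarse = int(np.argmax(radial_amplitude_spectrum(coarse)))
peak_fine = int(np.argmax(radial_amplitude_spectrum(fine)))
print("peak frequency (cycles per image):", peak_coarse, peak_fine)
```

If the spectra of two categories differ as clearly as these two do, a response difference between the categories could be driven by spatial frequency alone.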



What did they actually show?

Condition A: Animal Target
Condition B: Natural Nontarget
Condition C: Natural Target
Condition D: Animal Nontarget

Fast Recognition?

There are two components here: a fast, time-locked, stimulus-driven part and a slower component that covaries with reaction time.


Caltech 101

This is not just a psychology problem: in computer vision there can be dataset problems too.

(Mean image of Caltech 101.) [Ponce et al., 2006]


Another key paper is [Hung et al., 2005]: in inferior temporal cortex, they can use a linear classifier to read out object identity after 200 milliseconds.

Figure from [Hung et al., 2005]

Figure from [Hung et al., 2005]
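To show what a linear readout involves, here is a minimal sketch that trains a regularized least-squares classifier on simulated population responses and tests it on held-out trials. The population (32 recording sites, 4 objects, Gaussian trial noise) and the ridge classifier are assumptions made for the example; they are not the recordings or the classifier of [Hung et al., 2005].

```python
import numpy as np

rng = np.random.default_rng(2)
n_sites, n_objects, n_repeats = 32, 4, 100

# Hypothetical population: each object evokes a fixed mean pattern across
# sites, and every trial adds independent Gaussian noise.
mean_patterns = rng.standard_normal((n_objects, n_sites))
labels = np.repeat(np.arange(n_objects), n_repeats)
responses = mean_patterns[labels] + 2.0 * rng.standard_normal((labels.size, n_sites))

# Split the trials into training and test halves.
order = rng.permutation(labels.size)
train, test = order[: labels.size // 2], order[labels.size // 2:]

def add_bias(x):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear_readout(x, y, n_classes, ridge=1.0):
    """Ridge-regularized least squares onto one-hot class targets."""
    x1 = add_bias(x)
    targets = np.eye(n_classes)[y]
    gram = x1.T @ x1 + ridge * np.eye(x1.shape[1])
    return np.linalg.solve(gram, x1.T @ targets)

def predict(weights, x):
    """Pick the class whose linear score is largest."""
    return np.argmax(add_bias(x) @ weights, axis=1)

weights = fit_linear_readout(responses[train], labels[train], n_objects)
accuracy = float(np.mean(predict(weights, responses[test]) == labels[test]))
print(f"held-out accuracy: {accuracy:.2f} (chance is {1.0 / n_objects:.2f})")
```

Above-chance accuracy only shows that some linearly accessible signal separates the labels. It does not say which stimulus property carries that signal, or whether the brain uses it.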

The conclusions, which the model can actually account for:

- Linear classifiers can read out object identity from inferior temporal cortex (C2b in the model).
- Invariance to position and scaling


Are these good stimuli?

Object images were not normalized for mean gray level, contrast or other basic image properties. It is possible to partially read out object category based on some of these simple image properties (1).

“Only some spatial patterns of fMRI response are read out in task performance.”

Figure from [Williams et al., 2007]

Behavioral Validation

To their credit, the model does also match human behavior using C2b units but not earlier ones.



How well can you do with V1-like features?

Figure from [Pinto et al., 2008]

Maybe you’re solving the wrong problem?

Figure from [Pinto et al., 2008]


What Serre et al think the Visual System Does

In summary, the accumulated evidence points to four, mostly accepted, properties of the feedforward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.

Recommended reading on invariance:

- [DiCarlo and Cox, 2007]
- [Rust and Stocker, 2010]
- [Kravitz et al., 2008, Kravitz et al., 2010]

Recommended reading on hierarchical theories:

- Tai-Sing Lee (CMU) [Lee and Mumford, 2003, Lee and Yuille, 2006]

Other things you could read:

- [Friston, 2008, Friston, 2009]
- [George and Hawkins, 2005, Hawkins and Blakeslee, 2005]

If nothing else, please keep in mind that whatever the neuroscientists tell you is an inference, and they can often be wrong.

When you read (or write) a sparse coding paper, ask yourself: am I making quantitative claims?

There are some papers on the limits of classifiers compared to other techniques [Serences and Saproo, 2011, Naselaris et al., 2010, Kriegeskorte, 2011]. I’m writing a better one and I’d love to talk about it.

Questions

- Feedforward object recognition?
- A linear classifier isn’t enough; the data have to be behaviorally useful.
- How can you rigorously say your sparse code looks like V1?
- Is object recognition invariant?
- What dataset should we use? (Easy to criticize, hard to solve.)

Bishop, C. (1995). Neural networks for pattern recognition.

Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Optical Society of America, Journal, A: Optics and Image Science, 2:1160–1169.

Dayan, P. and Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems.

DiCarlo, J. and Cox, D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333–341.

Felleman, D. and Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1.

Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11):e1000211.

Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.

George, D. and Hawkins, J. (2005). A hierarchical Bayesian model of invariant pattern recognition in the visual cortex. In Neural Networks, 2005. IJCNN’05. Proceedings. 2005 IEEE International Joint Conference on, volume 3, pages 1812–1817. IEEE.

Hawkins, J. and Blakeslee, S. (2005). On intelligence. Owl Books.

Hubel, D. and Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1):106.

Hung, C., Kreiman, G., Poggio, T., and DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863.

Johnson, J. and Olshausen, B. (2003). Timecourse of neural signatures of object recognition. Journal of Vision, 3(7).

Jones, J. and Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1233.

Kravitz, D., Kriegeskorte, N., and Baker, C. (2010). High-level visual object representations are constrained by position. Cerebral Cortex, 20(12):2916.

Kravitz, D., Vinson, L., and Baker, C. (2008). How position dependent is visual object recognition? Trends in Cognitive Sciences, 12(3):114–122.

Kriegeskorte, N. (2011). Pattern-information analysis: From stimulus decoding to computational-model testing. NeuroImage.

Lee, T. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. JOSA A, 20(7):1434–1448.

Lee, T. and Yuille, A. (2006). Efficient coding of visual scenes by grouping and segmentation. Bayesian Brain: Probabilistic Approaches to Neural Coding, MIT Press, Cambridge, MA, pages 145–188.

Li, F., VanRullen, R., Koch, C., and Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences, 99(14):9596.

Luck, S. (2005). An introduction to the event-related potential technique. MIT Press.

Naselaris, T., Kay, K., Nishimoto, S., and Gallant, J. (2010). Encoding and decoding in fMRI. NeuroImage.

Olshausen, B. and Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609.

Pinto, N., Cox, D., and DiCarlo, J. (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1):e27.

Ponce, J., Berg, T., Everingham, M., Forsyth, D., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B., Torralba, A., et al. (2006). Dataset issues in object recognition. Toward category-level object recognition, pages 29–48.

Rust, N. and Stocker, A. (2010). Ambiguity and invariance: two fundamental challenges for visual processing. Current Opinion in Neurobiology, 20(3):382–388.

Serences, J. and Saproo, S. (2011). Computational advances towards linking BOLD and behavior. Neuropsychologia.

Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., and Poggio, T. (2005). A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex. Technical Report CBCL Paper #259/AI Memo #2005-036, Massachusetts Institute of Technology, Cambridge, MA.

Serre, T., Kreiman, G., Kouh, M., Cadieu, C., Knoblich, U., and Poggio, T. (2007). A quantitative theory of immediate visual recognition. Progress in Brain Research, 165:33–56.

Thorpe, S. and Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science, 291(5502):260.

Thorpe, S., Fize, D., Marlot, C., et al. (1996). Speed of processing in the human visual system. Nature, 381(6582):520–522.

Williams, M., Dang, S., and Kanwisher, N. (2007). Only some spatial patterns of fMRI response are read out in task performance. Nature Neuroscience, 10(6):685–686.

Wu, M., David, S., and Gallant, J. (2006). Complete functional characterization of sensory neurons by system identification. Annu. Rev. Neurosci., 29:477–505.
