Deep Networks in the Brain
Vicente L. Malave
January 26, 2012
A quantitative theory of immediate visual recognition
Figure from [Serre et al., 2007]
I’m going to present the model, talk about where the data come from, and be critical of some of the claims.
Most of this is not in the paper, but I want you to put this in context, be aware of some things omitted from the paper, and to be more critical when someone claims to know how the brain works.
Outline
Early Visual System : Simple and Complex Cells
The Model
Electrophysiological evidence for early object recognition.
Claim: Linear Classifiers indicate that information is represented in IT cortex
A V1 strawman.
Conclusions
Visual System
Figure from [Felleman and Van Essen, 1991]
What Serre et al. think the Visual System Does
In summary, the accumulated evidence points to four, mostly accepted, properties of the feed-forward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.
It’s often claimed that object recognition is feedforward.
Figure from [Thorpe and Fabre-Thorpe, 2001]
Feedforward Object Recognition
- Very fast (but how fast, exactly?)
- No eye movements
- No attention [Li et al., 2002]
Simple Cells
Most sparse coding papers contain this figure, and an argument that it’s similar to what V1 cells do.
Figure from [Olshausen and Field, 1996]
Recording a receptive field
(tell the Hubel and Wiesel story, show the video clip) http://www.youtube.com/watch?v=KE952yueVLA&feature=related
Simple Cells
Figure from [Hubel and Wiesel, 1962]
Complex Cells
Complex cells have some mild invariance.
Figure from [Hubel and Wiesel, 1962]
Simple Cells can be Modeled by Gabors
The Gabor filter [Daugman, 1985] is a good model of what simple cells do.
Figure from [Jones and Palmer, 1987]
Simple Cells can be Modeled by Gabors
Figure from [Jones and Palmer, 1987]
Simple Cells can be Modeled by Gabors
Figure from [Jones and Palmer, 1987]
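As a rough illustration of the Gabor model of a simple cell, the sketch below builds a 2-D Gabor patch (a sinusoidal carrier under a Gaussian envelope) and treats the cell's response as a dot product with an image patch. The function names, parameter names, and default values are assumptions made for this example; they are not taken from [Daugman, 1985] or [Jones and Palmer, 1987].

```python
import numpy as np

def gabor(size=21, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0, gamma=1.0):
    """Return a size x size Gabor patch (illustrative parameterization)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier varies along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

def simple_cell_response(patch, rf):
    """Linear response: dot product of an image patch with the receptive field."""
    return float(np.sum(patch * rf))

rf = gabor(theta=0.0)                    # model receptive field
matched = gabor(theta=0.0)               # stimulus at the preferred orientation
orthogonal = gabor(theta=np.pi / 2)      # stimulus rotated by 90 degrees

print(simple_cell_response(matched, rf))
print(simple_cell_response(orthogonal, rf))
```

The matched stimulus should drive the model cell far more strongly than the orthogonal one, which is the orientation tuning the Gabor model is meant to capture.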
Being a little more formal: Reverse Correlation
For more detail on how to measure neural tuning, see [Wu et al., 2006, Dayan and Abbott, 2001]
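One common form of reverse correlation is the spike-triggered average: present white-noise stimuli, then average the stimuli weighted by the number of spikes each one evoked. The sketch below runs this on a simulated neuron. The simulated receptive field, the rectified-linear firing rate, and the Poisson spiking are all assumptions chosen for illustration, not a description of any recording in the cited papers.

```python
import numpy as np

def spike_triggered_average(stimuli, spike_counts):
    """Estimate a linear receptive field as the spike-weighted mean stimulus.

    stimuli: (n_trials, n_pixels) array of white-noise frames.
    spike_counts: (n_trials,) array of spike counts evoked by each frame.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)
    total = spike_counts.sum()
    if total == 0:
        raise ValueError("no spikes recorded")
    return spike_counts @ stimuli / total

rng = np.random.default_rng(0)
true_rf = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))   # hypothetical 1-D receptive field
stimuli = rng.standard_normal((20000, 16))            # Gaussian white noise
rates = np.maximum(stimuli @ true_rf, 0.0)            # assumed rectified-linear neuron
spikes = rng.poisson(rates)                           # assumed Poisson spiking

sta = spike_triggered_average(stimuli, spikes)
corr = np.corrcoef(sta, true_rf)[0, 1]
print(corr)
```

With Gaussian white noise the estimate recovers the shape of the simulated receptive field up to a scale factor; with natural stimuli the estimate is biased by stimulus correlations, which is one reason the cited reviews are worth reading.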
How to build complex cells
Hubel and Wiesel proposed that you can build invariances by combining a set of input functions.
Figure from [Hubel and Wiesel, 1962]
A quantitative theory of immediate visual recognition
The model is an attempt to push this idea as far as it will go.
Figure from [Serre et al., 2007]
Simple Cells : Radial Basis Functions
A radial basis function [Bishop, 1995] is:
$$y = \exp\left(-\frac{1}{2\sigma^2} \sum_{j=1}^{N} (w_j - x_j)^2\right) \tag{1}$$
Complex Cells : Max
$$y = \max_{j=1,\ldots,N} x_j \tag{2}$$
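Equations (1) and (2) can be sketched directly in code. In the toy example below, simple units apply the same template at every position of a 1-D input, and a complex unit takes the max over them, so its response does not change when the feature shifts. The 1-D input, the template, and the helper `layer_response` are hypothetical and only illustrate the two operations; this is not the implementation from [Serre et al., 2007].

```python
import numpy as np

def simple_unit(x, w, sigma=1.0):
    """Equation (1): Gaussian radial basis tuning around the template w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.exp(-np.sum((w - x) ** 2) / (2.0 * sigma ** 2)))

def complex_unit(x):
    """Equation (2): max over the afferent responses."""
    return float(np.max(x))

def layer_response(image, template, sigma=1.0):
    """Apply the template at every position, then pool with a max."""
    n = len(template)
    s = [simple_unit(image[i:i + n], template, sigma)
         for i in range(len(image) - n + 1)]
    return s, complex_unit(s)

template = np.array([0.0, 1.0, 0.0])
image_a = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # feature on the left
image_b = np.array([0.0, 0.0, 0.0, 1.0, 0.0])   # same feature, shifted right

s_a, c_a = layer_response(image_a, template)
s_b, c_b = layer_response(image_b, template)
print(s_a, c_a)
print(s_b, c_b)
```

The simple-unit responses differ between the two inputs, while the complex-unit output is the same for both: selectivity comes from the tuning operation and invariance from the max.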
A quantitative theory of immediate visual recognition
The model is an attempt to push this idea as far as it will go.
Figure from [Serre et al., 2007]
How do they learn the parameters? [Serre et al., 2005]
This paper’s main claim is that the model reproduces many of the experimental results. How good are these experiments?
Speed of Processing in the human visual system
The main result (in humans) comes from [Thorpe et al., 1996]. First, I’ll explain the method.
Clearly the upper bound on processing time is 445 ms.
Figure from [Thorpe et al., 1996]
Figure from [Luck, 2005]
Fast Recognition
The ERPs are different after about 150 milliseconds, at a frontal electrode.
Figure from [Thorpe et al., 1996]
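The ERP logic can be sketched on synthetic data: average many stimulus-locked trials per condition, subtract the two averages, and look for the first time the difference stays away from zero. Everything below is an assumption made for illustration: the waveform shapes, the noise level, the 1 kHz sampling, and the simple threshold rule, which stands in for the statistical test used in [Thorpe et al., 1996].

```python
import numpy as np

def erp(trials):
    """Average time-locked single trials (n_trials, n_samples) into an ERP."""
    return np.asarray(trials, dtype=float).mean(axis=0)

def divergence_onset(erp_a, erp_b, times, threshold, run=10):
    """First time at which |erp_a - erp_b| stays above threshold for `run` samples."""
    above = np.abs(erp_a - erp_b) > threshold
    for i in range(len(above) - run + 1):
        if above[i:i + run].all():
            return times[i]
    return None

rng = np.random.default_rng(0)
times = np.arange(400)                                  # ms after stimulus onset (assumed 1 kHz)
shared = 5.0 * np.exp(-((times - 100) / 30.0) ** 2)     # deflection common to both conditions
extra = np.where(times >= 150, 4.0, 0.0)                # synthetic condition difference from 150 ms

target_trials = shared + rng.normal(0.0, 5.0, size=(200, times.size))
nontarget_trials = shared + extra + rng.normal(0.0, 5.0, size=(200, times.size))

onset = divergence_onset(erp(target_trials), erp(nontarget_trials), times, threshold=2.0)
print(onset)
```

Note what this does and does not show: the onset only says when the two averaged waveforms differ. It says nothing about whether that difference reflects the category decision or some other property of the two stimulus sets.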
Here’s the problem. We know some EEG activity is stimulus-driven: it could have nothing to do with the behavior.
The stimuli
Thorpe used images similar to those in experiment 1.
Figure from [Johnson and Olshausen, 2003]
Spatial Frequency
There is a huge difference in spatial frequency between these categories. Could it be driving the response?
Figure from [Johnson and Olshausen, 2003]
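One simple way to check whether two image categories differ in spatial frequency is to summarize each image by the power-weighted mean of its radial frequency. The sketch below does this with a 2-D FFT and tests it on two synthetic gratings. The summary statistic and the function name are assumptions for this example; [Johnson and Olshausen, 2003] may have characterized the difference differently.

```python
import numpy as np

def mean_spatial_frequency(image):
    """Power-weighted mean radial frequency (cycles/pixel) of a grayscale image."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()                     # remove the DC term
    power = np.abs(np.fft.fft2(image)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0                                   # constant image: no structure
    fy = np.fft.fftfreq(image.shape[0])[:, None]     # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]     # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float((radius * power).sum() / total)

x = np.arange(64)
coarse = np.tile(np.sin(2.0 * np.pi * 2 * x / 64), (64, 1))    # 2 cycles per image
fine = np.tile(np.sin(2.0 * np.pi * 16 * x / 64), (64, 1))     # 16 cycles per image

print(mean_spatial_frequency(coarse))
print(mean_spatial_frequency(fine))
```

Applied to every image in a stimulus set, a statistic like this gives one number per image, so the target and nontarget distributions can be compared before attributing an ERP difference to object category.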
What did they actually show?
Condition A: Animal, Target
Condition B: Natural, Nontarget
Condition C: Natural, Target
Condition D: Animal, Nontarget
Fast Recognition ?
Figure from [Johnson and Olshausen, 2003]
Fast Recognition ?
There are two components here: a fast, time-locked, stimulus-driven part, and a slower component which co-varies with reaction time.
Figure from [Johnson and Olshausen, 2003]
Caltech 101
Not just a psychology problem: in computer vision there can be dataset problems too.
(mean image of Caltech 101) [Ponce et al., 2006].
Another key paper is [Hung et al., 2005]: in inferior temporal cortex, they can use a linear classifier to read out object identity after 200 milliseconds.
Figure from [Hung et al., 2005]
Figure from [Hung et al., 2005]
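The readout idea can be sketched with a linear classifier trained on simulated population responses. The example below uses a least-squares classifier on synthetic data. The number of units, the tuning model, the noise level, and the choice of least squares are all assumptions for illustration; this is not the classifier or the data from [Hung et al., 2005].

```python
import numpy as np

def fit_linear_readout(X, y):
    """Least-squares linear classifier; y holds labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    """Classify by the sign of the linear readout."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)

rng = np.random.default_rng(0)
n_units, n_trials = 50, 200
tuning = rng.normal(0.0, 1.0, size=n_units)          # hypothetical per-unit category preference
labels = rng.choice([-1, 1], size=n_trials)          # two object categories
responses = 0.5 * np.outer(labels, tuning) + rng.normal(0.0, 1.0, size=(n_trials, n_units))

train, test = slice(0, 100), slice(100, 200)
w = fit_linear_readout(responses[train], labels[train])
accuracy = float(np.mean(predict(w, responses[test]) == labels[test]))
print(accuracy)
```

High held-out accuracy shows only that the category information is linearly available in the responses. It does not show which stimulus properties carry that information, or that the brain actually reads it out this way.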
The conclusions, which the model can actually account for:
- Linear classifiers can read out object identity from Inferior Temporal (C2b in the model).
- Invariance to position and scaling
Figure from [Serre et al., 2007]
Are these good stimuli?
Object images were not normalized for mean gray level, contrast or other basic image properties. It is possible to partially read out object category based on some of these simple image properties (1).
“Only some spatial patterns of fMRI response are read out in task performance.”
Figure from [Williams et al., 2007]
Behavioral Validation
To their credit, the model does also match human behavior using C2b units but not earlier ones.
Figure from [Serre et al., 2007]
How well can you do with V1-like features?
Figure from [Pinto et al., 2008]
Maybe you’re solving the wrong problem?
Figure from [Pinto et al., 2008]
What Serre et al. think the Visual System Does
In summary, the accumulated evidence points to four, mostly accepted, properties of the feedforward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.
Recommended reading on invariance:
- [DiCarlo and Cox, 2007]
- [Rust and Stocker, 2010]
- [Kravitz et al., 2008, Kravitz et al., 2010]
Recommended reading on hierarchical theories:
- Tai-Sing Lee (CMU) [Lee and Mumford, 2003, Lee and Yuille, 2006]
Other things you could read:
- [Friston, 2008, Friston, 2009]
- [George and Hawkins, 2005, Hawkins and Blakeslee, 2005]
If nothing else, please keep in mind that whatever the neuroscientists tell you is an inference, and they can often be wrong.
When you read (or write) a sparse coding paper, ask yourself, am I making quantitative claims?
There are some papers on the limits of classifiers compared to other techniques, [Serences and Saproo, 2011, Naselaris et al., 2010, Kriegeskorte, 2011], and I’m writing a better one and I’d love to talk about it.
Questions
- Feedforward object recognition?
- A linear classifier isn’t enough; the data have to be behaviorally useful.
- How can you rigorously say your sparse code looks like V1?
- Is object recognition invariant?
- What dataset should we use? (easy to criticize, hard to solve)
Bishop, C. (1995). Neural networks for pattern recognition.
Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Optical Society of America, Journal, A: Optics and Image Science, 2:1160–1169.
Dayan, P. and Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems.
DiCarlo, J. and Cox, D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333–341.
Felleman, D. and Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1.
Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11):e1000211.
Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.
George, D. and Hawkins, J. (2005). A hierarchical Bayesian model of invariant pattern recognition in the visual cortex. In Neural Networks, 2005. IJCNN’05. Proceedings. 2005 IEEE International Joint Conference on, volume 3, pages 1812–1817. IEEE.
Hawkins, J. and Blakeslee, S. (2005). On intelligence. Owl Books.
Hubel, D. and Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1):106.
Hung, C., Kreiman, G., Poggio, T., and DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863.
Johnson, J. and Olshausen, B. (2003). Timecourse of neural signatures of object recognition. Journal of Vision, 3(7).
Jones, J. and Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1233.
Kravitz, D., Kriegeskorte, N., and Baker, C. (2010). High-level visual object representations are constrained by position. Cerebral Cortex, 20(12):2916.
Kravitz, D., Vinson, L., and Baker, C. (2008). How position dependent is visual object recognition? Trends in Cognitive Sciences, 12(3):114–122.
Kriegeskorte, N. (2011). Pattern-information analysis: From stimulus decoding to computational-model testing. NeuroImage.
Lee, T. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. JOSA A, 20(7):1434–1448.
Lee, T. and Yuille, A. (2006). Efficient coding of visual scenes by grouping and segmentation. Bayesian Brain: Probabilistic Approaches to Neural Coding, MIT Press, Cambridge, MA, pages 145–188.
Li, F., VanRullen, R., Koch, C., and Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences, 99(14):9596.
Luck, S. (2005). An introduction to the event-related potential technique. MIT Press.
Naselaris, T., Kay, K., Nishimoto, S., and Gallant, J. (2010). Encoding and decoding in fMRI. NeuroImage.
Olshausen, B. and Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609.
Pinto, N., Cox, D., and DiCarlo, J. (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1):e27.
Ponce, J., Berg, T., Everingham, M., Forsyth, D., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B., Torralba, A., et al. (2006). Dataset issues in object recognition. Toward category-level object recognition, pages 29–48.
Rust, N. and Stocker, A. (2010). Ambiguity and invariance: two fundamental challenges for visual processing. Current Opinion in Neurobiology, 20(3):382–388.
Serences, J. and Saproo, S. (2011). Computational advances towards linking BOLD and behavior. Neuropsychologia.
Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., and Poggio, T. (2005). A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex. Technical Report CBCL Paper #259/AI Memo #2005-036, Massachusetts Institute of Technology, Cambridge, MA.
Serre, T., Kreiman, G., Kouh, M., Cadieu, C., Knoblich, U., and Poggio, T. (2007). A quantitative theory of immediate visual recognition. Progress in Brain Research, 165:33–56.
Thorpe, S. and Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science, 291(5502):260.
Thorpe, S., Fize, D., Marlot, C., et al. (1996). Speed of processing in the human visual system. Nature, 381(6582):520–522.
Williams, M., Dang, S., and Kanwisher, N. (2007). Only some spatial patterns of fMRI response are read out in task performance. Nature Neuroscience, 10(6):685–686.
Wu, M., David, S., and Gallant, J. (2006). Complete functional characterization of sensory neurons by system identification. Annu. Rev. Neurosci., 29:477–505.