
Optimal disparity estimation in natural stereo images

Johannes Burge, Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA

Wilson S. Geisler, Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA

Journal of Vision (2014) 14(2):1, 1–18. doi: 10.1167/14.2.1. Received May 28, 2013; published February 3, 2014.

A great challenge of systems neuroscience is to understand the computations that underlie perceptual constancies, the ability to represent behaviorally relevant stimulus properties as constant even when irrelevant stimulus properties vary. As signals proceed through the visual system, neural states become more selective for properties of the environment, and more invariant to irrelevant features of the retinal images. Here, we describe a method for determining the computations that perform these transformations optimally, and apply it to the specific computational task of estimating a powerful depth cue: binocular disparity. We simultaneously determine the optimal receptive field population for encoding natural stereo images of locally planar surfaces and the optimal nonlinear units for decoding the population responses into estimates of disparity. The optimal processing predicts well-established properties of neurons in cortex. Estimation performance parallels important aspects of human performance. Thus, by analyzing the photoreceptor responses to natural images, we provide a normative account of the neurophysiology and psychophysics of absolute disparity processing. Critically, the optimal processing rules are not arbitrarily chosen to match the properties of neurophysiological processing, nor are they fit to match behavioral performance. Rather, they are dictated by the task-relevant statistical properties of complex natural stimuli. Our approach reveals how selective, invariant tuning (especially for properties not trivially available in the retinal images) could be implemented in neural systems to maximize performance in particular tasks.

Introduction

Front-facing eyes evolved, at least in part, to support binocular depth perception (Figure 1a). Stereopsis, perceiving depth from disparity, is a perceptual constancy that pervades the animal kingdom. In the binocular zone, each eye's view yields a slightly different image of the scene. The local differences between the retinal images, the binocular disparities, are powerful signals for fixating the eyes and computing the depth structure of the scene. The critical step in enabling the use of disparity in service of these tasks is to estimate disparity itself. Once the disparities are estimated, metric depth can be computed by triangulation given the fixation of the eyes. There have been many computational studies of disparity estimation (Banks, Gepshtein, & Landy, 2004; Cormack, Stevenson, & Schor, 1991; Marr & T. Poggio, 1976; Qian, 1997; Qian & Zhu, 1997; Read & Cumming, 2007; Tyler & Julesz, 1978) and of the behavioral limits in humans (Banks et al., 2004; Cormack et al., 1991; Marr & T. Poggio, 1976; Ogle, 1952; Panum, 1858; Tyler & Julesz, 1978). The underlying neural mechanisms of disparity processing have also been extensively researched (Cumming & DeAngelis, 2001; DeAngelis, Ohzawa, & Freeman, 1991; Nienborg, Bridge, Parker, & Cumming, 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, there is no widely accepted ideal observer theory of disparity estimation in natural images. Such a theory would be a useful tool for evaluating performance in disparity-related tasks, providing principled hypotheses for neural mechanisms, and developing practical applications.

Deriving the ideal observer for disparity estimation is a hierarchical, multistep process (see Figure 2). The first step is to model the photoreceptor responses to stereo images of natural scenes. The second step is to learn the optimal set of binocular filters for disparity estimation from a large collection of natural images. The third step is to determine how the optimal filter responses should be combined to obtain units that are selective for particular disparities and maximally invariant to stimulus dimensions other than disparity (e.g., texture). The fourth and final step is to read out the population response to obtain the optimal disparity estimates.


In addition to carrying out these steps, we show how the optimal computations can be implemented with well-understood cortical mechanisms. Interestingly, the binocular receptive field properties of cortical neurons and human behavioral performance in disparity-related tasks parallel those of the optimal estimator. The results help explain the exquisite ability of the human visual system to recover the 3-D structure of natural scenes from binocular disparity.

The computational approach employed here has implications not just for understanding disparity estimation, but for the more general problem of understanding neural encoding and decoding in specific tasks. Our approach merges and extends the two major existing computational approaches for understanding the neural encoding-decoding problem. The first existing approach, efficient coding, focuses on how to efficiently (compactly) represent natural sensory signals in neural populations (Hoyer & Hyvarinen, 2000; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001). While this approach has provided general insights into retinal and cortical encoding, its goal is to represent the sensory signals without loss of information. Because it has a different goal, it does not elucidate the encoding or the decoding necessary for specific perceptual tasks. In a certain sense, it provides no more insight into the computations underlying specific perceptual abilities than the photoreceptor responses themselves. The second existing approach focuses on how to decode populations of neurons tuned to the stimulus variable associated with a specific perceptual task (Girshick, Landy, & Simoncelli, 2011; Jazayeri & Movshon, 2006; Ma, Beck, Latham, & Pouget, 2006; Read & Cumming, 2007). However, this approach focuses on response variability that is intrinsic to the neural system, rather than response variability that is due to the natural variation in the stimuli themselves. Specifically, this approach assumes neurons that have perfectly invariant tuning functions. These neurons give the same response (except for neural noise) to all stimuli having the same value of the variable of interest. In general, this cannot occur under natural viewing conditions: variation in natural signals along irrelevant dimensions inevitably causes variation in the tuning function. Thus, this approach does not address how encoding affects the neural encoding-decoding problem. It is critical to consider natural sensory signals and task-specific encoding and decoding simultaneously, because encoded signals relevant for one task may be irrelevant for another, and because task-specific decoding must discount irrelevant variation in the encoded signals.

Arguably, the most significant advances in behavioral and systems neuroscience have resulted from the study of neural populations associated with particular sensory and perceptual tasks. Here we describe an approach for determining the encoding and decoding of natural sensory stimuli required for optimal performance in specific tasks, and show that well-established cortical mechanisms can implement the optimal computations. Our results provide principled, testable hypotheses about the processing stages that give rise to the selective, invariant neural populations that are thought to underlie sensory and perceptual constancies.

Figure 1. Natural scene inputs, disparity geometry, and example left and right eye signals. (a) Animals with front-facing eyes. (b) Example natural images used in the analysis. (c) Stereo geometry. The eyes are fixated and focused at a point straight ahead at 40 cm. We considered retinal disparity patterns corresponding to fronto-parallel and slanted surfaces. Non-planar surfaces were also considered (see Discussion). (d) Photographs of natural scenes are texture mapped onto planar fronto-parallel or slanted surfaces. Here, the left and right eye retinal images are perspective projections (inset) of a fronto-parallel surface with 5 arcmin of uncrossed disparity. Left and right eye signals are obtained by vertically averaging each image; these are the signals available to neurons with vertically oriented receptive fields. The signals are not identical shifted copies of each other because of perspective projection, added noise, and cosine windowing (see text). We note that across image patches there is considerable signal variation due to stimulus properties (e.g., texture) unrelated to disparity. A selective, invariant neural population must be insensitive to this variation.



Methods and results

Natural disparity signals

The first step in deriving the ideal observer for disparity estimation is to simulate the photoreceptor responses to natural scenes (see Figure 2a). The vector of photoreceptor responses r (see Figure 2a) is determined by the luminance (Figure 1b) and depth structure of natural scenes, projective viewing geometry, and the optics, sensors, and noise of the visual system. We model each of these factors for the human visual system (see Methods details). We generate a large number of noisy, sampled stereo-images of small (1° × 1°) fronto-parallel surface patches for each of a large number of disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin in 1.875 arcmin steps). This range covers approximately 80% of the disparities that occur at or near the fovea in natural viewing (Liu, Bovik, & Cormack, 2008), and these image patches represent the information available to the visual system for processing. Although we focus first on fronto-parallel patches, we later show that the results are robust to surface slant and are likely to be robust to other depth variations occurring in natural scenes. We emphasize that in this paper we simulate retinal images by projecting natural image patches onto planar surfaces (Figure 1c), rather than simulating retinal images from stereo photographs. Later, we evaluate the effects of this simplification (see Discussion).

The binocular disparity in the images entering the two eyes is given by

\[ d = a_L - a_R \qquad (1) \]

where a_L and a_R are the angles between the retinal projections of a target and the fixation point in the left and right eyes. This is the definition of absolute retinal disparity. The specific pattern of binocular disparities depends on the distance between the eyes, the distance and direction that the eyes are fixated, and the distance and depth structure of the surfaces in the scene. We consider a viewing situation in which the eyes are separated by 6.5 cm (a typical interocular separation) and are fixated on a point 40 cm straight ahead (a typical arm's length) (Figure 1c).
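To make this viewing geometry concrete, here is a small sketch (an illustration of the geometry described above, not code from the paper; the function names and the sign convention for uncrossed disparity are our own) that computes the absolute disparity of a target straight ahead of the observer:

```python
import numpy as np

# Assumed viewing situation from the text: interocular separation 6.5 cm,
# fixation straight ahead at 40 cm. Disparity is taken as the difference
# between the vergence angle demanded by fixation and by the target
# (uncrossed disparities positive -- a convention adopted here for illustration).

IPD_CM = 6.5
FIXATION_CM = 40.0

def vergence_deg(distance_cm: float) -> float:
    """Full vergence angle (deg) for a point straight ahead at distance_cm."""
    return 2.0 * np.degrees(np.arctan(IPD_CM / (2.0 * distance_cm)))

def absolute_disparity_arcmin(target_cm: float) -> float:
    """Absolute disparity (arcmin) of a target straight ahead at target_cm."""
    return 60.0 * (vergence_deg(FIXATION_CM) - vergence_deg(target_cm))

if __name__ == "__main__":
    for z in (38.0, 40.0, 42.0, 44.0):
        print(f"target at {z:4.1f} cm -> {absolute_disparity_arcmin(z):+7.2f} arcmin")
```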

Figure 2. Hierarchical processing steps in optimal disparity estimation. (a) The photoreceptor responses are computed for each of many natural images, for each of many different disparities. (b) The optimal filters for disparity estimation are learned from this collection of photoreceptor responses to natural stimuli. These are the eight most useful vertically oriented filters (receptive fields) (see also Figure 3a). Left and right eye filter weights are shown in gray. Filter responses are given by the dot product between the photoreceptor responses and the filter weights. (c) The optimal selective, invariant units are constructed from the filter responses. Each unit in the population is tuned to a particular disparity. These units result from a unique combination of the optimal filter responses (see also Figure 7). (d) The optimal readout of the selective, invariant population response is determined. Each black dot shows the response of one of the disparity-tuned units in (c) to the particular image shown in (a). The peak of the population response is the optimal (MAP) estimate. Note that r, R, and R_LL are vectors representing the photoreceptor, filter, and disparity-tuned-unit population responses to particular stimuli. Red outlines represent the responses to the particular stereo-image patch in (a). The steps in this procedure are general and will be useful for developing ideal observers for other estimation tasks.


Optimal linear binocular filters

Estimating disparity (solving the correspondence problem) is difficult because there often are multiple points in one eye's image that match a single point in the other (Marr & T. Poggio, 1976). Therefore, accurate estimation requires using the local image region around each point to eliminate false matches. In mammals, neurons with binocular receptive fields that weight and sum over local image regions first appear in primary visual cortex. Much is known about these neurons. It is not known, however, why they have the specific receptive fields they do, nor is it known how their responses are combined for the task of estimating disparity.

To discover the linear binocular filters that are optimal for this task (see Figure 2b), we use a Bayesian statistical technique for dimensionality reduction called accuracy maximization analysis (AMA) (Geisler, Najemnik, & Ing, 2009). The method depends both on the statistical structure of the sensory signals and on the task to be solved. This feature of the method is critically important because information relevant for one task may be irrelevant for another. For any fixed number of filters (feature dimensions), AMA returns the linear filters that maximize accuracy for the specific computational task at hand (see Methods details). (Note that feature dimensions identified with other dimensionality-reduction techniques, like PCA and ICA, may be irrelevant for a given task.) Importantly, AMA makes no a priori assumptions about the shapes of the filters (e.g., there is no requirement that they be orthogonal). We applied AMA to the task of identifying the disparity level, from a discrete set of levels, in a random collection of retinal stereo images of natural scenes. The number of disparity levels was sufficient to allow continuous disparity estimation from −15 to 15 arcmin (see Methods details).

Before applying AMA, we perform a few additional transformations consistent with the image processing known to occur early in the primate visual system (Burge & Geisler, 2011). First, each eye's noisy, sampled image patch is converted from a luminance image to a windowed contrast image c(x) by subtracting off and dividing by the mean, and then multiplying by a raised cosine of 0.5° at half height. The window limits the maximum possible size of the binocular filters; it places no restriction on minimum size. The size of the cosine window approximately matches the largest V1 binocular receptive field sizes near the fovea (Nienborg et al., 2004). Next, each eye's sampled image patch is averaged vertically to obtain what are henceforth referred to as left and right eye signals. Vertical averaging is tantamount to considering only vertically oriented filters, for this is the operation that vertically oriented filters perform on images. All orientations can provide information about binocular disparity (Chen & Qian, 2004; DeAngelis et al., 1991); however, because canonical disparity receptive fields are vertically oriented, we focus our analysis on them. An example stereo pair and corresponding left and right eye signals are shown in Figure 1d. Finally, the signals are contrast normalized to a vector magnitude of 1.0: c_norm(x) = c(x)/||c(x)||. This normalization is a simplified version of the contrast normalization seen in cortical neurons (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992) (see Supplement). Seventy-six hundred normalized left and right eye signals (400 natural inputs × 19 disparity levels) constituted the training set for AMA.
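The preprocessing just described can be summarized in a few lines. The sketch below is our paraphrase, not the authors' code; the window construction, the choice to window only along the horizontal dimension, and applying the normalization per eye are simplifying assumptions:

```python
import numpy as np

def raised_cosine_window(n: int) -> np.ndarray:
    """1-D raised-cosine window (1 at the center, 0 at the edges)."""
    x = np.linspace(-0.5, 0.5, n)
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * x))

def eye_signal(luminance_patch: np.ndarray) -> np.ndarray:
    """Luminance patch (rows x cols) -> windowed, vertically averaged, normalized contrast signal."""
    mean_lum = luminance_patch.mean()
    contrast = (luminance_patch - mean_lum) / mean_lum             # Weber contrast image
    contrast = contrast * raised_cosine_window(contrast.shape[1])  # window (horizontal only here)
    signal = contrast.mean(axis=0)                                 # vertical averaging
    return signal / np.linalg.norm(signal)                         # vector magnitude 1.0

def filter_response(binocular_filter: np.ndarray,
                    left_signal: np.ndarray, right_signal: np.ndarray) -> float:
    """A filter's response is the dot product with the concatenated left/right eye signals."""
    return float(binocular_filter @ np.concatenate([left_signal, right_signal]))
```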

The eight most useful linear binocular filters (in rank order) are shown in Figure 3a. These filters specify the subspace that a population of neurophysiological receptive fields should cover for maximally accurate disparity estimation. Some filters are excited by nonzero disparities, some are excited by zero (or near-zero) disparities, and still others are suppressed by zero disparity. The left and right eye filter components are approximately log-Gabor (Gaussian on a log spatial frequency axis). The spatial frequency selectivities of each filter's left and right eye components are similar, but differ between filters (Figure 3b). Filter tuning and bandwidth range between 1.2–4.7 c/° and 0.9–4.2 c/°, respectively, with an average bandwidth of approximately 1.5 octaves. Thus, the spatial extent of each filter (see Figure 3a) is inversely related to its spatial frequency tuning: as the tuned frequency increases, the spatial extent of the filter decreases. The filters also exhibit a mixture of phase and position coding (Figure 3c), suggesting that a mixture of phase and position coding is optimal. Similar filters result (Figure S1a–c) when the training set contains surfaces having a distribution of different slants (see Discussion). (Note that the cosine windows bias the filters more toward phase than position encoding. Additional analyses have nevertheless shown that windows having a nonzero position offset, i.e., position disparity, do not qualitatively change the filters.)
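For readers who want a concrete picture of a log-Gabor component, the snippet below builds a generic 1-D filter whose amplitude spectrum is Gaussian on a log-frequency axis; this is our illustration of the functional form, not one of the learned AMA filters. Giving the left- and right-eye components the same amplitude spectrum but different `phase_deg` values mimics phase coding:

```python
import numpy as np

def log_gabor_1d(n_samples: int, samples_per_deg: float, peak_freq_cpd: float,
                 bandwidth_octaves: float = 1.5, phase_deg: float = 0.0) -> np.ndarray:
    """Generic 1-D log-Gabor: Gaussian amplitude spectrum on a log spatial frequency axis."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / samples_per_deg)     # cycles per degree
    sigma = bandwidth_octaves * np.log(2.0) / 2.355                 # FWHM (octaves) -> sigma (log units)
    amplitude = np.zeros_like(freqs)
    positive = freqs > 0
    amplitude[positive] = np.exp(-0.5 * (np.log(freqs[positive] / peak_freq_cpd) / sigma) ** 2)
    spectrum = amplitude * np.exp(1j * np.radians(phase_deg))       # constant-phase spectrum
    receptive_field = np.fft.fftshift(np.fft.irfft(spectrum, n=n_samples))
    return receptive_field / np.linalg.norm(receptive_field)
```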

Interestingly, some filters provide the most information about disparity when they are not responding. For example, the disparity is overwhelmingly likely to be zero when filter f5 (see Figure 3, Supplementary Figure S5) does not respond, because anticorrelated intensity patterns in the left- and right-eye images are very unlikely in natural images. A complex cell with this binocular receptive field would produce a disparity tuning curve similar to the tuning curve produced by a classic tuned-inhibitory cell (Figure S5; Poggio, Gonzalez, & Krause, 1988).

The properties of our linear filters are similar to those of binocular simple cells in early visual cortex (Cumming & DeAngelis, 2001; De Valois, Albrecht, & Thorell, 1982; DeAngelis et al., 1991; Poggio et al., 1988). Specifically, binocular simple cells have a similar range of shapes, a similar range of spatial frequency tuning, similar octave bandwidths of ~1.5, and similar distributions of phase and position coding (Figure 3c). Despite the similarities between the optimal filters and receptive fields in primary visual cortex, we emphasize that our explicit aim is not to account for the properties of neurophysiological receptive fields (Olshausen & Field, 1996). Rather, our aim is to determine the filters that encode the retinal information most useful for estimating disparity in natural images. Thus, neurophysiological receptive fields with the properties described here would be ideally suited for disparity estimation.

The fact that many neurophysiological properties of binocular neurons are predicted by a task-driven analysis of natural signals, with appropriate biological constraints and zero free parameters, suggests that similar ideal observer analyses may be useful for understanding and evaluating the neural underpinnings of other fundamental sensory and perceptual abilities.

Optimal disparity estimation

The optimal linear binocular filters encode the most relevant information in the retinal images for disparity estimation. However, optimal estimation performance can only be reached if the joint responses of the filters are appropriately combined and decoded (Figure 2c, d). Here, we determine the Bayes optimal (nonlinear) decoder by analyzing the responses of the filters in Figure 3 to natural images with a wide range of disparities. Later, we show how this decoder could be implemented with linear and nonlinear mechanisms commonly observed in early visual cortex. (We note that the decoder that was used to learn the AMA filters from the training stimuli cannot be used for arbitrary stimuli; see Methods details.)

First, we examine the joint distribution of filter responses to the training stimuli, conditioned on each disparity level d_k. (The dot product between each filter and a contrast normalized left- and right-eye signal gives each filter's response. The filter responses are represented by the vector R; see Figure 2b.) The conditional response distributions p(R|d_k) are approximately Gaussian (83% of the marginal distributions conditioned on disparity are indistinguishable from Gaussian; K-S test, p > 0.01; see Figure 4a). (The approximately Gaussian form is largely due to the contrast normalization of the training stimuli; see Discussion.) Each of these distributions was fit with a multidimensional Gaussian, gauss(R; u_k, C_k), estimated from the sample mean vector u_k and the sample covariance matrix C_k. Figure 4a shows sample filter response distributions (conditioned on several disparity levels) for the first two AMA filters. Much of the information about disparity is contained in the covariance between the filter responses, indicating that a nonlinear decoder is required for optimal performance.

The posterior probability of each disparity, given an observed vector of AMA filter responses R, can be obtained via Bayes' rule:

\[
p(d_k \mid \mathbf{R}) = \frac{\operatorname{gauss}(\mathbf{R};\, \mathbf{u}_k, \mathbf{C}_k)\, p(d_k)}{\sum_{l=1}^{N} \operatorname{gauss}(\mathbf{R};\, \mathbf{u}_l, \mathbf{C}_l)\, p(d_l)} \qquad (2)
\]

Here, we assume that all disparities are equally likely; hence, the prior probabilities p(d) cancel out. (This assumption yields a lower bound on performance; performance will increase somewhat in natural conditions, where the prior probabilities are not flat, a scenario that is considered later.)

The solid curves in Figure 4b show posterior probability distributions averaged across all natural stimuli having the same disparity. Each posterior can be conceptualized as the expected response of a population of (exponentiated) log-likelihood neurons (see below), ordered by their preferred disparities. Colored areas show variation in the posterior probabilities due to irrelevant image variation. When the goal is to obtain the most probable disparity, the optimal estimate is given by the maximum a posteriori (MAP) read-out rule:

\[
\hat{d} = \arg\max_{d}\; p(d \mid \mathbf{R}) \qquad (3)
\]

The posterior distributions given by Equation 2 could, of course, also be read out with other decoding rules, which would be optimal given different goals (cost functions).
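A compact sketch of this decoder (our illustration of Equations 2 and 3; the array shapes, names, and use of scipy are assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_conditionals(responses_by_level):
    """Fit gauss(R; u_k, C_k) for each disparity level.

    responses_by_level: list of (num_stimuli x num_filters) response arrays, one per level.
    """
    return [(r.mean(axis=0), np.cov(r, rowvar=False)) for r in responses_by_level]

def posterior_and_map(R, conditionals, disparity_levels, prior=None):
    """Equation 2 (posterior over disparity levels) and Equation 3 (MAP estimate)."""
    K = len(conditionals)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior)
    likelihoods = np.array([multivariate_normal.pdf(R, mean=u, cov=C)
                            for (u, C) in conditionals])
    posterior = likelihoods * prior
    posterior /= posterior.sum()
    return posterior, disparity_levels[int(np.argmax(posterior))]
```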

The accuracy of disparity estimates for a large collection of natural stereo-image test patches (29,600 test patches: 800 natural inputs × 37 disparity levels) is shown in Figure 5a. (Note: None of the test patches were in the training set, and only half the test disparity levels were in the training set.) Disparity estimates are unbiased over a wide range. At zero disparity, estimate precision corresponds to a detection threshold of 6 arcsec. As disparity increases (i.e., as the surface patch is displaced from the point of fixation), precision decreases approximately exponentially with disparity (Figure 5b). Sign confusions are rare (3.7%; Figure 5c). Similar, but somewhat poorer, performance is obtained with surfaces having a cosine distribution of slants (Figure 5d, e; Figure S1f; see Discussion). In summary, optimal encoding (filters in Figure 3) and decoding (Equations 2 and 3) of task-relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds increase exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data have been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a–e). Fortunately, this finding suggests that, under many circumstances, RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriately weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood (or, equivalently, the log likelihood) of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

\[
\ln\!\left[\operatorname{gauss}(\mathbf{R} \mid d_k)\right] = -0.5\,(\mathbf{R} - \mathbf{u}_k)^{\mathsf{T}} \mathbf{C}_k^{-1} (\mathbf{R} - \mathbf{u}_k) + \mathrm{const} \qquad (4)
\]

where u_k and C_k are the mean and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity d_k (see Figure 4a). By multiplying through and collecting terms, Equation 4 can be expressed as

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left eye filter components. Dashed lines with open symbols indicate right eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


\[
\ln\!\left[\operatorname{gauss}(\mathbf{R} \mid d_k)\right] = \sum_{i=1}^{n} w_{ik} R_i \;+\; \sum_{i=1}^{n} w_{iik} R_i^{2} \;+\; \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ijk} \left(R_i + R_j\right)^{2} \;+\; \mathrm{const}' \qquad (5)
\]

where R_i is the response of the ith AMA filter and w_ik, w_iik, and w_ijk are the weights for disparity d_k. The weights are simple functions of the mean and covariance matrices from Equation 4 (see Supplement). Thus, a neuron which responds according to the log likelihood of a given disparity (an LL neuron), that is, R_k^LL = ln[gauss(R | d_k)], can be obtained by weighted summation of the linear (first term), squared (second term), and pairwise-sum-squared (third term) AMA filter responses (Equation 5). The implication is that a large collection of LL neurons, each with a different preferred disparity, can be constructed from a small, fixed set of linear filters simply by changing the weights on the linear filter responses, squared filter responses, and pairwise-sum-squared filter responses.
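To make Equation 5 concrete, the following sketch expands the Gaussian log likelihood of Equation 4 into weights on the linear, squared, and pairwise-sum-squared filter responses. This is our own algebra (the paper's supplement, Equations S1–S4, may organize the weights differently), and the names are ours:

```python
import numpy as np

def ll_neuron_weights(u_k: np.ndarray, C_k: np.ndarray):
    """Weights for an LL neuron with preferred disparity d_k (derived from Equations 4-5)."""
    A = np.linalg.inv(C_k)                                           # precision matrix
    w_lin = A @ u_k                                                  # weights on R_i
    w_pair = -0.5 * A                                                # weights on (R_i + R_j)^2, i < j
    w_sq = -0.5 * np.diag(A) + 0.5 * (A.sum(axis=1) - np.diag(A))    # weights on R_i^2
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C_k)
    const = -0.5 * (u_k @ A @ u_k) - 0.5 * logdet                    # disparity-dependent constant
    return w_lin, w_sq, w_pair, const

def ll_response(R: np.ndarray, w_lin, w_sq, w_pair, const) -> float:
    """Equation 5: the LL neuron's response equals ln[gauss(R | d_k)]."""
    resp = w_lin @ R + w_sq @ (R ** 2) + const
    n = len(R)
    for i in range(n - 1):
        for j in range(i + 1, n):
            resp += w_pair[i, j] * (R[i] + R[j]) ** 2
    return float(resp)
```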

A potential concern is that these computations could not be implemented in cortex. AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are "on" and "off" versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our "complex cell" differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, and a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled "complex" because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity (Figure 6a).

Figure 4. Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions averaged across all stimuli at each disparity level if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their responses should decrease as a function of their preferred disparity.


Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that the data are for surfaces with a cosine distribution of slants.


Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992).

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity d_k is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


That is, they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.
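A brief aside on the sixth point above: a short derivation (our own algebra, using a generic log-Gaussian tuning curve for positive preferred disparities, not a calculation from the paper) shows why a constant standard deviation on a log-disparity axis implies bandwidths that grow linearly with preferred disparity:

\[
r(d) = \exp\!\left[-\tfrac{1}{2}\left(\frac{\ln(d / d_{\mathrm{pref}})}{\sigma}\right)^{2}\right]
\quad\Rightarrow\quad
\mathrm{FWHM} = d_{\mathrm{pref}}\left(e^{\sigma\sqrt{2\ln 2}} - e^{-\sigma\sqrt{2\ln 2}}\right) \propto d_{\mathrm{pref}}.
\]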

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector R_LL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
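In code, this readout is a one-liner over the LL population response. The sketch below (our illustration; the preferred-disparity grid and prior are placeholders) shows how a nonuniform prior enters as an additive log-prior constant per neuron:

```python
import numpy as np

def map_readout(R_LL: np.ndarray, preferred_disparities: np.ndarray, log_prior=None) -> float:
    """Return the preferred disparity of the peak (log-posterior) LL neuron response."""
    log_posterior = R_LL if log_prior is None else R_LL + log_prior
    return float(preferred_disparities[int(np.argmax(log_posterior))])
```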


Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7); thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.


In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions, but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content (that is, variation in image features irrelevant to the task) constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of the neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.
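For readers unfamiliar with the model being compared against, here is a textbook-style sketch of a standard disparity energy unit (a generic phase-shift version with Gabor subunits; the parameters and names are our placeholders, not the paper's implementation):

```python
import numpy as np

def gabor(x_deg: np.ndarray, freq_cpd: float, sigma_deg: float, phase_rad: float) -> np.ndarray:
    """1-D Gabor receptive field sampled at positions x_deg."""
    return np.exp(-0.5 * (x_deg / sigma_deg) ** 2) * np.cos(2 * np.pi * freq_cpd * x_deg + phase_rad)

def disparity_energy(left: np.ndarray, right: np.ndarray, x_deg: np.ndarray,
                     freq_cpd: float = 2.0, sigma_deg: float = 0.1, dphase: float = 0.0) -> float:
    """Sum of squared responses of two binocular subunits in quadrature.

    dphase is the interocular phase shift that sets the unit's preferred disparity.
    """
    energy = 0.0
    for subunit_phase in (0.0, np.pi / 2.0):                         # quadrature pair
        rf_left = gabor(x_deg, freq_cpd, sigma_deg, subunit_phase)
        rf_right = gabor(x_deg, freq_cpd, sigma_deg, subunit_phase + dphase)
        energy += (rf_left @ left + rf_right @ right) ** 2           # binocular subunit, then square
    return float(energy)
```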

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
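A generic version of this class of model is easy to state in code. The sketch below is our illustration of windowed, normalized cross-correlation over candidate shifts; it is not the paper's modified algorithm, and it assumes the window plus shift stays inside the signals:

```python
import numpy as np

def local_xcorr_disparity(left: np.ndarray, right: np.ndarray,
                          candidate_shifts, window: np.ndarray) -> int:
    """Estimated disparity = integer shift maximizing windowed normalized correlation."""
    center = len(left) // 2
    half = len(window) // 2

    def windowed_unit(signal: np.ndarray, start: int) -> np.ndarray:
        seg = signal[start:start + len(window)] * window
        seg = seg - seg.mean()
        return seg / (np.linalg.norm(seg) + 1e-12)

    reference = windowed_unit(left, center - half)
    correlations = [(float(reference @ windowed_unit(right, center - half + s)), s)
                    for s in candidate_shifts]
    return max(correlations)[1]
```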

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/°) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/°, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/°) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (~6 c/°) have widths of ~8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/° is nicely predicted by the ~8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).
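As a rough consistency check (our back-of-the-envelope arithmetic, not a calculation reported in the paper), a receptive field of width w ≈ 8 arcmin ≈ 0.13° cannot signal disparity variation whose half-period is finer than w, so the highest visible disparity-modulation frequency should be roughly

\[
f_{\max} \approx \frac{1}{2w} \approx \frac{1}{2 \times 0.13^{\circ}} \approx 3.8\ \mathrm{c}/^{\circ},
\]

in line with the ~4 c/° psychophysical cutoff.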

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here, we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure that is present in natural scenes: within-patch depth variation and depth discontinuities (occlusions) (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that the locations of fine depth structure and monocular occlusion zones (~9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses of an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50; the conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it did throughout the paper), the distributions are well approximated by Gaussians, but are somewhat lighter-tailed. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): Contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.
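A minimal numerical sketch of the normalization rule just given, assuming a vectorized image patch of sensor responses (the function and variable names are ours, not from the published code):

```python
import numpy as np

def contrast_normalize(patch, c50=0.1):
    """Normalize a local patch of sensor responses by its contrast energy.

    patch : 1-D array of sensor responses r(x) from a local region
    c50   : half-saturation constant; c50 = 0.0 reproduces the simple
            zero-mean, unit-magnitude normalization used in the main analysis
    """
    r = np.asarray(patch, dtype=float)
    c = (r - r.mean()) / r.mean()                 # Weber contrast signal c(x)
    n = c.size                                    # dimensionality of the vector
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)
```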

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) the computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 Patches × 80 Images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}+\Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}}\right)\right] \qquad (6)$$

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).
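A short sketch of Equation 6, useful for checking the mapping between surface depth and disparity under this viewing geometry; the 65-mm interpupillary distance is an illustrative value, not a parameter stated in the text.

```python
import numpy as np

def disparity_arcmin(depth_m, d_fixation_m=0.40, ipd_m=0.065):
    """Disparity (Equation 6) of a surface displaced in depth from fixation.

    depth_m      : depth Delta = d_surface - d_fixation (meters); positive = farther
    d_fixation_m : fixation distance (meters); 0.40 m as assumed in the paper
    ipd_m        : interpupillary distance (meters); illustrative value
    Returns disparity in arcmin (negative for surfaces beyond fixation
    under this sign convention).
    """
    half_ipd = ipd_m / 2.0
    delta_rad = 2.0 * (np.arctan(half_ipd / (d_fixation_m + depth_m))
                       - np.arctan(half_ipd / d_fixation_m))
    return np.degrees(delta_rad) * 60.0

# Example: a surface 1 cm beyond a 40-cm fixation distance
# disparity_arcmin(0.01)  -> on the order of -13.5 arcmin (uncrossed disparity)
```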

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2-mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses:

$$r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * \mathrm{psf}_e(\mathbf{x})\right]\,\mathrm{samp}(\mathbf{x}) + \gamma \qquad (7)$$

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
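A schematic implementation of Equation 7 for one eye. The PSF, sampling grid, and noise level are placeholders; the actual analysis used the wave-optics PSFs and the noise criterion described in the text.

```python
import numpy as np
from scipy.ndimage import convolve

def sensor_responses(ideal_image, psf, samples_per_deg=128, image_deg=1.0, noise_sd=0.01):
    """Noisy sensor responses for one eye (schematic version of Equation 7).

    ideal_image : 2-D luminance image I_e(x) with no optical degradation
    psf         : 2-D polychromatic point-spread function psf_e(x)
    Returns the blurred, spatially sampled, noise-corrupted responses r_e(x).
    """
    blurred = convolve(ideal_image, psf, mode='nearest')          # I_e(x) * psf_e(x)
    n_samples = int(round(samples_per_deg * image_deg))           # e.g., 128 samples/deg over 1 deg
    idx = np.linspace(0, blurred.shape[0] - 1, n_samples).astype(int)
    sampled = blurred[np.ix_(idx, idx)]                           # samp(x): spatial sampling
    return sampled + np.random.normal(0.0, noise_sd, sampled.shape)  # additive sensor noise
```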

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that the factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, whereas AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
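The filter-response model just described can be sketched as follows; Rmax and the noise standard deviation are illustrative values, and the function names are ours rather than the published Matlab implementation's.

```python
import numpy as np

def ama_filter_responses(stim_norm, filters, r_max=1.0, noise_sd=0.05, rng=None):
    """Noisy AMA filter responses to one contrast-normalized stimulus.

    stim_norm : 1-D contrast-normalized stimulus vector (zero mean, unit magnitude)
    filters   : (n_filters, n_pixels) array of linear binocular filters
    Returns a length-n_filters vector of noisy responses: the scaled dot products
    plus a small amount of Gaussian response noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean_responses = r_max * (filters @ stim_norm)        # deterministic part of the response
    return mean_responses + rng.normal(0.0, noise_sd, mean_responses.shape)
```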

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
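A simplified sketch of this decoding step: Gaussian response statistics are interpolated on a fine disparity grid, the log likelihood of an observed filter-response vector is evaluated at each grid point (these values play the role of LL neuron responses), and the estimate is the disparity with the highest posterior (with a flat prior, the maximum-likelihood grid point). The particular spline routine is an implementation assumption.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def map_disparity_estimate(R, train_disp, train_means, train_covs, fine_disp):
    """MAP disparity estimate from a filter response vector R (flat prior).

    train_disp  : (K,) disparities of the training levels
    train_means : (K, n) conditional mean response vectors
    train_covs  : (K, n, n) conditional response covariance matrices
    fine_disp   : (M,) finely spaced disparities at which to interpolate
    """
    # Cubic-spline interpolation of the means and covariances along disparity
    means = CubicSpline(train_disp, train_means, axis=0)(fine_disp)
    covs = CubicSpline(train_disp, train_covs, axis=0)(fine_disp)

    log_like = np.empty(len(fine_disp))
    for k in range(len(fine_disp)):
        d = R - means[k]
        cov_inv = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        log_like[k] = -0.5 * (d @ cov_inv @ d) - 0.5 * logdet   # Gaussian log likelihood
    return fine_disp[np.argmax(log_like)]                       # MAP estimate (flat prior)
```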

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = C_k^{-1}\mathbf{u}_k \qquad (S1a)$$
$$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(C_k^{-1}\right) + 0.5\,C_k^{-1}\mathbf{1} \qquad (S1b)$$
$$\mathbf{w}_{ij,k} = -0.5\,C_k^{-1} \quad \forall\, i,j,\ j>i \qquad (S1c)$$
$$const'_k = -0.5\,\mathbf{u}_k^{T}C_k^{-1}\mathbf{u}_k + const_k \qquad (S1d)$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) maps the matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).
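A small numerical sketch of Eqs. S1a–d, assuming the class-conditional mean vector and covariance matrix of the filter responses for a given disparity are available (the function and variable names are ours):

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights for one log-likelihood (LL) neuron from Gaussian response statistics.

    u_k : (n,) mean filter-response vector for disparity delta_k
    C_k : (n, n) filter-response covariance matrix for disparity delta_k
    Returns weights on the linear, squared, and pairwise sum-squared responses,
    plus the constant term (Eqs. S1a-d); only entries with j > i of w_sumsq are used.
    """
    A = np.linalg.inv(C_k)                                   # A_k = C_k^{-1}
    w_lin = A @ u_k                                          # S1a: linear-response weights
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u_k)))       # S1b: squared-response weights
    w_sumsq = -0.5 * A                                       # S1c: sum-squared-response weights
    const = -0.5 * (u_k @ A @ u_k)                           # S1d: constant, up to const_k
    return w_lin, w_sq, w_sumsq, const
```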


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \qquad (S2)$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{T}A_k\left(\mathbf{R}-\mathbf{u}_k\right) + const \qquad (S3)$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \qquad (S4a)$$
$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad (S4b)$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect terms containing R_i: w_iR_i from Eq. S2, and a_iiR_iu_i and a_ijR_iu_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad (S5a)$$

Note that the weights on the linear filter responses w_i are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2, and −a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad (S5b)$$


Next, consider w_ii. In this case, the total weight on R_i^2 in Eq. S2 must be −0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad (S5c)$$

Last, by grouping terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\,\forall j\ne i} a_{ij}u_iu_j\right) + const \qquad (S5d)$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R_max^2, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by

$$R_{ij\,\max}^{2} = R_{\max}^{2}\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$$

where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R_max^2 and R_ij max^2), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R_ij max^2 / R_max) w_ij. These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d, e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)} \qquad (S6)$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.
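A brief numerical sketch of Eq. S6; the 1/f amplitude spectrum and the disparity values are illustrative assumptions, and the optical attenuation included in the full analysis is omitted here.

```python
import numpy as np

def binocular_difference_amplitude(freq_cpd, disparity_arcmin, amp_left=None, amp_right=None):
    """Amplitude of the binocular contrast difference signal (Eq. S6).

    freq_cpd         : array of spatial frequencies (cycles/deg)
    disparity_arcmin : disparity delta_k in arcmin
    amp_left/right   : retinal amplitude spectra; default to a 1/f falloff (no optics)
    """
    f = np.asarray(freq_cpd, dtype=float)
    a_l = 1.0 / f if amp_left is None else np.asarray(amp_left, dtype=float)
    a_r = 1.0 / f if amp_right is None else np.asarray(amp_right, dtype=float)
    delta_deg = disparity_arcmin / 60.0
    return np.sqrt(a_l**2 + a_r**2 - 2 * a_l * a_r * np.cos(2 * np.pi * f * delta_deg))

# Example: evaluate the difference-signal amplitude for two disparities
freqs = np.linspace(0.5, 12.0, 24)
# binocular_difference_amplitude(freqs, 5.0)
# binocular_difference_amplitude(freqs, 10.0)
```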

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} \qquad (S7)$$

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$\mathrm{psf}_{photopic}\!\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\!\left(\mathbf{x};\lambda,\Delta D\right)\,sc\!\left(\lambda\right)\,D65\!\left(\lambda\right) \qquad (S8)$$

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$$\mathbf{c}\left(\mathbf{x}\right) = \left(\mathbf{r}\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$$

where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

$$\mathbf{c}_{norm}\left(\mathbf{x}\right) = \frac{\mathbf{c}\left(\mathbf{x}\right)}{\sqrt{\|\mathbf{c}\left(\mathbf{x}\right)\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector, ‖c(x)‖² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor \mathbf{c}_{norm}\left(\mathbf{x}\right)\cdot \mathbf{f}_i\left(\mathbf{x}\right)\right\rfloor$$
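The sketch below strings these operations together in the manner of Fig. S3: 'on' and 'off' simple cells are half-wave rectified linear responses to the contrast-normalized stimulus, and a model complex cell is obtained by summing and then squaring them. The scaling and names are ours; passing the pairwise-summed filter f_i + f_j yields the sum-squared responses used in Eq. 5.

```python
import numpy as np

def simple_cell_responses(c_norm, f, r_max=1.0):
    """'On' and 'off' simple cell responses for one binocular filter f (Fig. S3a)."""
    drive = float(r_max * (f @ c_norm))          # linear filter response
    on = max(drive, 0.0)                         # half-wave rectified 'on' response
    off = max(-drive, 0.0)                       # half-wave rectified 'off' response
    return on, off

def complex_cell_response(c_norm, f, r_max=1.0):
    """Model complex cell (Fig. S3b): sum the simple cell outputs, then square."""
    on, off = simple_cell_responses(c_norm, f, r_max)
    return (on + off) ** 2                       # equals the squared linear response
```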



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating the disparity of surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD·tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4-m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as a, b, c, e, except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells; we label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields; these filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36); irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement, Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R = R_{max}\left(\mathbf{f}_i^{\mathsf T}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: the difference drastically reduces the inter-ocular contrast signal. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e) but with a 4 mm pupil. (h) Same as (f) but with a 4 mm pupil.
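For readers who wish to reproduce the filtering step described in panels (b), (d), (f), and (h), the sketch below builds a bank of log-Gabor filters with 1.5-octave bandwidth in the frequency domain and applies it to a one-dimensional signal. Only the 1.5-octave bandwidth is taken from the caption; the center frequencies, sampling rate, and input signal are placeholders of our own choosing.

```python
import numpy as np

def log_gabor_bank(freqs_cpd, centers_cpd, bandwidth_oct=1.5):
    """Amplitude spectra of log-Gabor filters: Gaussian in log2-frequency.
    FWHM in octaves = 2*sqrt(2*ln 2)*sigma, so sigma is set from the bandwidth."""
    sigma_oct = bandwidth_oct / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    f = np.asarray(freqs_cpd)[np.newaxis, :]       # (1, nFreq)
    f0 = np.asarray(centers_cpd)[:, np.newaxis]    # (nFilt, 1)
    log2_ratio = np.log2(np.where(f > 0, f / f0, np.nan))
    amp = np.exp(-0.5 * (log2_ratio / sigma_oct) ** 2)
    return np.nan_to_num(amp)                      # zero response at f = 0

# Example: filter a random 1-D contrast signal sampled at 60 samples/deg.
samples_per_deg, n = 60, 256
signal = np.random.randn(n)
freqs = np.fft.rfftfreq(n, d=1.0 / samples_per_deg)        # cycles/deg
bank = log_gabor_bank(freqs, centers_cpd=[1, 2, 4, 8])     # placeholder centers
filtered = np.fft.irfft(bank * np.fft.rfft(signal), n=n)   # (nFilt, n) band-passed signals
print(filtered.shape)
```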



Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\lVert\mathbf{c}(\mathbf{x})\rVert^{2} + n\,c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper, we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $p(\mathbf{R}|\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
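A minimal sketch of the normalization rule above, with an ad hoc kurtosis check standing in for the generalized-Gaussian fits described in the caption. The variable names are ours, and we assume n is the number of samples in the contrast signal.

```python
import numpy as np

def contrast_normalize(c, c50=0.0):
    """c_norm(x) = c(x) / sqrt(||c(x)||^2 + n * c50^2), with n the number of samples (assumption)."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def excess_kurtosis(samples):
    """Positive -> heavier tails than Gaussian; negative -> lighter tails."""
    z = (samples - samples.mean()) / samples.std()
    return np.mean(z ** 4) - 3.0

# Toy check: filter responses to many normalized 'stimuli' for several c50 values.
rng = np.random.default_rng(0)
stimuli = rng.standard_normal((5000, 64))            # placeholder contrast signals
filt = rng.standard_normal(64); filt /= np.linalg.norm(filt)
for c50 in (0.0, 0.1, 0.25):
    responses = np.array([contrast_normalize(s, c50) @ filt for s in stimuli])
    print(f"c50 = {c50:.2f}: excess kurtosis of filter responses = {excess_kurtosis(responses):+.3f}")
```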


show how the optimal computations can be implemented with well-understood cortical mechanisms. Interestingly, the binocular receptive field properties of cortical neurons and human behavioral performance in disparity-related tasks parallel those of the optimal estimator. The results help explain the exquisite ability of the human visual system to recover the 3-D structure of natural scenes from binocular disparity.

The computational approach employed here has implications not just for understanding disparity estimation but for the more general problem of understanding neural encoding and decoding in specific tasks. Our approach merges and extends the two major existing computational approaches for understanding the neural encoding-decoding problem. The first existing approach, efficient coding, focuses on how to efficiently (compactly) represent natural sensory signals in neural populations (Hoyer & Hyvarinen, 2000; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001). While this approach has provided general insights into retinal and cortical encoding, its goal is to represent the sensory signals without loss of information. Because it has a different goal, it does not elucidate the encoding or the decoding necessary for specific perceptual tasks. In a certain sense, it provides no more insight into the computations underlying specific perceptual abilities than the photoreceptor responses themselves. The second existing approach focuses on how to decode populations of neurons tuned to the stimulus variable associated with a specific perceptual task (Girshick, Landy, & Simoncelli, 2011; Jazayeri & Movshon, 2006; Ma, Beck, Latham, & Pouget, 2006; Read & Cumming, 2007). However, this approach focuses on response variability that is intrinsic to the neural system, rather than response variability that is due to the natural variation in the stimuli themselves. Specifically, this approach assumes neurons that have perfectly invariant tuning functions. These neurons give the same response (except for neural noise) to all stimuli having the same value of the variable of interest. In general, this cannot occur under natural viewing conditions. Variation in natural signals along irrelevant dimensions inevitably causes variation in the tuning function. Thus, this approach does not address how encoding affects the neural encoding-decoding problem. It is critical to consider natural sensory signals and task-specific encoding and decoding simultaneously, because encoded signals relevant for one task may be irrelevant for another, and because task-specific decoding must discount irrelevant variation in the encoded signals.

Arguably, the most significant advances in behavioral and systems neuroscience have resulted from the study of neural populations associated with particular sensory and perceptual tasks. Here, we describe an approach for determining the encoding and decoding of natural sensory stimuli required for optimal performance in specific tasks, and show that well-established cortical mechanisms can implement the optimal computations. Our results provide principled, testable hypotheses about the processing stages that give rise to the selective, invariant neural populations that are thought to underlie sensory and perceptual constancies.

Figure 1. Natural scene inputs, disparity geometry, and example left- and right-eye signals. (a) Animals with front-facing eyes. (b) Example natural images used in the analysis. (c) Stereo geometry. The eyes are fixated and focused at a point straight ahead at 40 cm. We considered retinal disparity patterns corresponding to fronto-parallel and slanted surfaces. Non-planar surfaces were also considered (see Discussion). (d) Photographs of natural scenes are texture mapped onto planar fronto-parallel or slanted surfaces. Here, the left- and right-eye retinal images are perspective projections (inset) of a fronto-parallel surface with 5 arcmin of uncrossed disparity. Left- and right-eye signals are obtained by vertically averaging each image; these are the signals available to neurons with vertically oriented receptive fields. The signals are not identical shifted copies of each other because of perspective projection, added noise, and cosine windowing (see text). We note that across image patches there is considerable signal variation due to stimulus properties (e.g., texture) unrelated to disparity. A selective, invariant neural population must be insensitive to this variation.

Methods and results

Natural disparity signals

The first step in deriving the ideal observer for disparity estimation is to simulate the photoreceptor responses to natural scenes (see Figure 2a). The vector of photoreceptor responses r (see Figure 2a) is determined by the luminance (Figure 1b) and depth structure of natural scenes, projective viewing geometry, and the optics, sensors, and noise in the vision system. We model each of these factors for the human visual system (see Methods details). We generate a large number of noisy, sampled stereo-images of small (1° × 1°) fronto-parallel surface patches for each of a large number of disparity levels within Panum's fusional range (Panum, 1858) (-16.875 to 16.875 arcmin in 1.875 arcmin steps). This range covers approximately 80% of disparities that occur at or near the fovea in natural viewing (Liu, Bovik, & Cormack, 2008), and these image patches represent the information available to the vision system for processing. Although we focus first on fronto-parallel patches, we later show that the results are robust to surface slant and are likely to be robust to other depth variations occurring in natural scenes. We emphasize that in this paper we simulate retinal images by projecting natural image patches onto planar surfaces (Figure 1c), rather than simulating retinal images from stereo photographs. Later we evaluate the effects of this simplification (see Discussion).

The binocular disparity in the images entering the two eyes is given by

$$ d = \alpha_L - \alpha_R \qquad (1) $$

where $\alpha_L$ and $\alpha_R$ are the angles between the retinal projections of a target and fixation point in the left and right eyes. This is the definition of absolute retinal disparity. The specific pattern of binocular disparities depends on the distance between the eyes, the distance and direction that the eyes are fixated, and the distance and depth structure of the surfaces in the scene. We consider a viewing situation in which the eyes are separated by 6.5 cm (a typical interocular separation) and are fixated on a point 40 cm straight ahead (a typical arm's length) (Figure 1c).
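As a concrete check on Equation 1, the sketch below computes the absolute disparity of a point lying straight ahead of the observer at distance z, for the viewing situation just described (6.5 cm interocular separation, fixation at 40 cm). It uses only standard vergence-angle geometry; the target distances are illustrative, chosen so that the resulting disparities fall within Panum's fusional range.

```python
import numpy as np

IPD   = 0.065   # interocular separation (m)
z_fix = 0.40    # fixation distance (m), straight ahead

def vergence_angle(z):
    """Vergence angle (radians) for a point straight ahead at distance z."""
    return 2.0 * np.arctan(IPD / (2.0 * z))

def disparity_arcmin(z_target):
    """Absolute disparity of a mid-line target; crossed (near) disparities are positive
    under this sign choice, uncrossed (far) disparities negative."""
    delta = vergence_angle(z_target) - vergence_angle(z_fix)
    return np.rad2deg(delta) * 60.0

for z in (0.39, 0.395, 0.40, 0.405, 0.41):    # illustrative target distances (m)
    print(f"target at {z:.3f} m -> disparity = {disparity_arcmin(z):+7.2f} arcmin")
```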

Figure 2. Hierarchical processing steps in optimal disparity estimation. (a) The photoreceptor responses are computed for each of many natural images, for each of many different disparities. (b) The optimal filters for disparity estimation are learned from this collection of photoreceptor responses to natural stimuli. These are the eight most useful vertically oriented filters (receptive fields) (see also Figure 3a). Left- and right-eye filter weights are shown in gray. Filter responses are given by the dot product between the photoreceptor responses and filter weights. (c) The optimal selective, invariant units are constructed from the filter responses. Each unit in the population is tuned to a particular disparity. These units result from a unique combination of the optimal filter responses (see also Figure 7). (d) The optimal readout of the selective, invariant population response is determined. Each black dot shows the response of one of the disparity-tuned units in (c) to the particular image shown in (a). The peak of the population response is the optimal (MAP) estimate. Note that r, R, and RLL are vectors representing the photoreceptor, filter, and disparity-tuned-unit population responses to particular stimuli. Red outlines represent the responses to the particular stereo-image patch in (a). The steps in this procedure are general and will be useful for developing ideal observers for other estimation tasks.


Optimal linear binocular filters

Estimating disparity (solving the correspondence problem) is difficult because there often are multiple points in one eye's image that match a single point in the other (Marr & T. Poggio, 1976). Therefore, accurate estimation requires using the local image region around each point to eliminate false matches. In mammals, neurons with binocular receptive fields that weight and sum over local image regions first appear in primary visual cortex. Much is known about these neurons. It is not known, however, why they have the specific receptive fields they do, nor is it known how their responses are combined for the task of estimating disparity.

To discover the linear binocular filters that are optimal for this task (see Figure 2b), we use a Bayesian statistical technique for dimensionality reduction called accuracy maximization analysis (AMA) (Geisler, Najemnik, & Ing, 2009). The method depends both on the statistical structure of the sensory signals and on the task to be solved. This feature of the method is critically important because information relevant for one task may be irrelevant for another. For any fixed number of filters (feature dimensions), AMA returns the linear filters that maximize accuracy for the specific computational task at hand (see Methods details). (Note that feature dimensions identified with other dimensionality-reduction techniques, like PCA and ICA, may be irrelevant for a given task.) Importantly, AMA makes no a priori assumptions about the shapes of the filters (e.g., there is no requirement that they be orthogonal). We applied AMA to the task of identifying the disparity level from a discrete set of levels in a random collection of retinal stereo images of natural scenes. The number of disparity levels was sufficient to allow continuous disparity estimation from -15 to 15 arcmin (see Methods details).

Before applying AMA, we perform a few additional transformations consistent with the image processing known to occur early in the primate visual system (Burge & Geisler, 2011). First, each eye's noisy, sampled image patch is converted from a luminance image to a windowed contrast image c(x) by subtracting off and dividing by the mean, and then multiplying by a raised cosine of 0.5° at half height. The window limits the maximum possible size of the binocular filters; it places no restriction on minimum size. The size of the cosine window approximately matches the largest V1 binocular receptive field sizes near the fovea (Nienborg et al., 2004). Next, each eye's sampled image patch is averaged vertically to obtain what are henceforth referred to as left- and right-eye signals. Vertical averaging is tantamount to considering only vertically oriented filters, for this is the operation that vertically oriented filters perform on images. All orientations can provide information about binocular disparity (Chen & Qian, 2004; DeAngelis et al., 1991); however, because canonical disparity receptive fields are vertically oriented, we focus our analysis on them. An example stereo pair and corresponding left- and right-eye signals are shown in Figure 1d. Finally, the signals are contrast normalized to a vector magnitude of 1.0: $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})/\lVert\mathbf{c}(\mathbf{x})\rVert$. This normalization is a simplified version of the contrast normalization seen in cortical neurons (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992) (see Supplement). Seventy-six hundred normalized left- and right-eye signals (400 Natural Inputs × 19 Disparity Levels) constituted the training set for AMA.
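The preprocessing just described can be restated as a short script. The sketch below is a simplified version under our own naming and sampling assumptions (the 60 samples/deg spacing and the random input patch are placeholders; the raised-cosine window is our own implementation of the 0.5° half-height window).

```python
import numpy as np

def preprocess_eye_image(lum_patch, samples_per_deg=60, window_deg=1.0):
    """Luminance patch -> windowed contrast -> vertically averaged, unit-magnitude signal."""
    # 1. Convert to Weber contrast: subtract and divide by the mean luminance.
    contrast = (lum_patch - lum_patch.mean()) / lum_patch.mean()
    # 2. Multiply by a raised-cosine window (0.5 deg width at half height when window_deg = 1.0).
    n = lum_patch.shape[1]
    x = (np.arange(n) - (n - 1) / 2.0) / samples_per_deg        # position in deg
    w = np.where(np.abs(x) < window_deg / 2.0,
                 0.5 * (1.0 + np.cos(2.0 * np.pi * x / window_deg)), 0.0)
    windowed = contrast * w[np.newaxis, :]
    # 3. Average vertically (equivalent to restricting to vertically oriented filters).
    signal = windowed.mean(axis=0)
    # 4. Normalize to unit vector magnitude.
    return signal / np.linalg.norm(signal)

# Placeholder luminance patch: 60 x 60 samples covering ~1 deg.
left_signal = preprocess_eye_image(np.random.rand(60, 60) + 0.5)
print(left_signal.shape, np.linalg.norm(left_signal))   # (60,) 1.0
```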

The eight most useful linear binocular filters (in rank order) are shown in Figure 3a. These filters specify the subspace that a population of neurophysiological receptive fields should cover for maximally accurate disparity estimation. Some filters are excited by nonzero disparities, some are excited by zero (or near-zero) disparities, and still others are suppressed by zero disparity. The left- and right-eye filter components are approximately log-Gabor (Gaussian on a log spatial frequency axis). The spatial frequency selectivities of each filter's left- and right-eye components are similar but differ between filters (Figure 3b). Filter tuning and bandwidth range between 1.2-4.7 c/deg and 0.9-4.2 c/deg, respectively, with an average bandwidth of approximately 1.5 octaves. Thus, the spatial extent of each filter (see Figure 3a) is inversely related to its spatial frequency tuning; as the tuned frequency increases, the spatial extent of the filter decreases. The filters also exhibit a mixture of phase and position coding (Figure 3c), suggesting that a mixture of phase and position coding is optimal. Similar filters result (Figure S1a-c) when the training set contains surfaces having a distribution of different slants (see Discussion). (Note that the cosine windows bias the filters more toward phase than position encoding. Additional analyses have nevertheless shown that windows having a nonzero position offset, i.e., position disparity, do not qualitatively change the filters.)

Interestingly, some filters provide the most information about disparity when they are not responding. For example, the disparity is overwhelmingly likely to be zero when filter F5 (see Figure 3, Supplementary Figure S5) does not respond, because anticorrelated intensity patterns in the left- and right-eye images are very unlikely in natural images. A complex cell with this binocular receptive field would produce a disparity tuning curve similar to the tuning curve produced by a classic tuned-inhibitory cell (Figure S5; Poggio, Gonzalez, & Krause, 1988).

The properties of our linear filters are similar to those of binocular simple cells in early visual cortex (Cumming & DeAngelis, 2001; De Valois, Albrecht, & Thorell, 1982; DeAngelis et al., 1991; Poggio et al., 1988). Specifically, binocular simple cells have a similar range of shapes, a similar range of spatial frequency tuning, similar octave bandwidths of ~1.5, and similar distributions of phase and position coding (Figure 3c). Despite the similarities between the optimal filters and receptive fields in primary visual cortex, we emphasize that our explicit aim is not to account for the properties of neurophysiological receptive fields (Olshausen & Field, 1996). Rather, our aim is to determine the filters that encode the retinal information most useful for estimating disparity in natural images. Thus, neurophysiological receptive fields with the properties described here would be ideally suited for disparity estimation.

The fact that many neurophysiological properties of binocular neurons are predicted by a task-driven analysis of natural signals, with appropriate biological constraints and zero free parameters, suggests that similar ideal observer analyses may be useful for understanding and evaluating the neural underpinnings of other fundamental sensory and perceptual abilities.

Optimal disparity estimation

The optimal linear binocular filters encode the most relevant information in the retinal images for disparity estimation. However, optimal estimation performance can only be reached if the joint responses of the filters are appropriately combined and decoded (Figure 2c, d). Here, we determine the Bayes-optimal (nonlinear) decoder by analyzing the responses of the filters in Figure 3 to natural images with a wide range of disparities. Later, we show how this decoder could be implemented with linear and nonlinear mechanisms commonly observed in early visual cortex. (We note that the decoder that was used to learn the AMA filters from the training stimuli cannot be used for arbitrary stimuli; see Methods details.)

First, we examine the joint distribution of filter responses to the training stimuli, conditioned on each disparity level $d_k$. (The dot product between each filter and a contrast-normalized left- and right-eye signal gives each filter's response. The filter responses are represented by the vector R; see Figure 2b.) The conditional response distributions $p(\mathbf{R}|d_k)$ are approximately Gaussian (83% of the marginal distributions conditioned on disparity are indistinguishable from Gaussian; K-S test, p > 0.01; see Figure 4a). (The approximately Gaussian form is largely due to the contrast normalization of the training stimuli; see Discussion.) Each of these distributions was fit with a multidimensional Gaussian, $\mathrm{gauss}(\mathbf{R};\mathbf{u}_k,\mathbf{C}_k)$, estimated from the sample mean vector $\mathbf{u}_k$ and the sample covariance matrix $\mathbf{C}_k$. Figure 4a shows sample filter response distributions (conditioned on several disparity levels) for the first two AMA filters. Much of the information about disparity is contained in the covariance between the filter responses, indicating that a nonlinear decoder is required for optimal performance.
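In code, fitting these class-conditional Gaussians amounts to computing a sample mean vector and covariance matrix of the filter responses at each disparity level. The sketch below assumes a response matrix and disparity labels with names of our own choosing; the random responses are placeholders.

```python
import numpy as np

def fit_conditional_gaussians(R, labels):
    """R: (nStimuli, nFilters) filter responses; labels: (nStimuli,) disparity level of each stimulus.
    Returns {disparity: (mean vector u_k, covariance matrix C_k)}."""
    params = {}
    for dk in np.unique(labels):
        Rk = R[labels == dk]                   # responses to stimuli with disparity dk
        u_k = Rk.mean(axis=0)
        C_k = np.cov(Rk, rowvar=False)         # sample covariance across stimuli
        params[dk] = (u_k, C_k)
    return params

# Toy usage with random stand-ins for 8 filter responses at 19 disparity levels.
rng = np.random.default_rng(1)
labels = np.repeat(np.linspace(-16.875, 16.875, 19), 400)
R = rng.standard_normal((labels.size, 8))
params = fit_conditional_gaussians(R, labels)
print(len(params), params[0.0][0].shape, params[0.0][1].shape)   # 19 (8,) (8, 8)
```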

The posterior probability of each disparity, given an observed vector of AMA filter responses R, can be obtained via Bayes' rule:

$$ p(d_k \mid \mathbf{R}) = \frac{\mathrm{gauss}(\mathbf{R};\,\mathbf{u}_k,\mathbf{C}_k)\, p(d_k)}{\sum_{l=1}^{N} \mathrm{gauss}(\mathbf{R};\,\mathbf{u}_l,\mathbf{C}_l)\, p(d_l)} \qquad (2) $$

Here, we assume that all disparities are equally likely; hence, the prior probabilities p(d) cancel out. (This assumption yields a lower bound on performance; performance will increase somewhat in natural conditions when the prior probabilities are not flat, a scenario that is considered later.)

The solid curves in Figure 4b show posterior probability distributions averaged across all natural stimuli having the same disparity. Each posterior can be conceptualized as the expected response of a population of (exponentiated) log-likelihood neurons (see below), ordered by their preferred disparities. Colored areas show variation in the posterior probabilities due to irrelevant image variation. When the goal is to obtain the most probable disparity, the optimal estimate is given by the maximum a posteriori (MAP) read-out rule:

$$ \hat{d} = \arg\max_{d}\; p(d \mid \mathbf{R}) \qquad (3) $$

The posterior distributions given by Equation 2 could, of course, also be read out with other decoding rules, which would be optimal given different goals (cost functions).
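Equations 2 and 3 translate directly into code. The sketch below evaluates the class-conditional Gaussians (as fit in the previous sketch) in the log domain for numerical stability, applies an optional prior, and reads out the MAP disparity; function and variable names are ours.

```python
import numpy as np

def log_gauss(R, u, C):
    """Log of a multivariate Gaussian density evaluated at response vector R."""
    diff = R - u
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (diff @ np.linalg.solve(C, diff) + logdet + diff.size * np.log(2 * np.pi))

def posterior_and_map(R, params, prior=None):
    """params: {disparity: (u_k, C_k)} as fit from training responses (Equation 2).
    prior: optional array of prior probabilities ordered by sorted disparity.
    Returns the disparity levels, the posterior, and the MAP estimate (Equation 3)."""
    disparities = np.array(sorted(params.keys()))
    log_like = np.array([log_gauss(R, *params[d]) for d in disparities])
    log_post = log_like if prior is None else log_like + np.log(prior)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return disparities, post, disparities[np.argmax(post)]

# Usage with the toy fit from the previous sketch:
# disparities, post, d_hat = posterior_and_map(R[0], params)
```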

The accuracy of disparity estimates from a large collection of natural stereo-image test patches (29,600 Test Patches = 800 Natural Inputs × 37 Disparity Levels) is shown in Figure 5a. (Note: None of the test patches were in the training set, and only half of the test disparity levels were in the training set.) Disparity estimates are unbiased over a wide range. At zero disparity, estimate precision corresponds to a detection threshold of 6 arcsec. As disparity increases (i.e., as the surface patch is displaced from the point of fixation), precision decreases approximately exponentially with disparity (Figure 5b). Sign confusions are rare (3.7%; Figure 5c). Similar but somewhat poorer performance is obtained with surfaces having a cosine distribution of slants (Figures 5d, e; Figure S1f; see Discussion). In summary, optimal encoding (filters in Figure 3) and decoding (Equations 2 and 3) of task-relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds increase exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data have been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a-e). Fortunately, this finding suggests that under many circumstances RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriate weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood, or equivalently the log likelihood, of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

$$ \ln\!\left[\mathrm{gauss}(\mathbf{R} \mid d_k)\right] = -0.5\,(\mathbf{R}-\mathbf{u}_k)^{\mathsf T}\mathbf{C}_k^{-1}(\mathbf{R}-\mathbf{u}_k) + \mathrm{const} \qquad (4) $$

where $\mathbf{u}_k$ and $\mathbf{C}_k$ are the mean and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity $d_k$ (see Figure 4a). By multiplying through and collecting terms, Equation 4 can be expressed as the weighted-sum form shown in Equation 5 below.

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left-eye filter components. Dashed lines with open symbols indicate right-eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


$$ \ln\!\left[\mathrm{gauss}(\mathbf{R} \mid d_k)\right] = \sum_{i=1}^{n} w_{ik}R_i + \sum_{i=1}^{n} w_{iik}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ijk}\left(R_i + R_j\right)^2 + \mathrm{const}' \qquad (5) $$

where $R_i$ is the response of the ith AMA filter, and $w_{ik}$, $w_{iik}$, and $w_{ijk}$ are the weights for disparity $d_k$. The weights are simple functions of the mean and covariance matrices from Equation 4 (see Supplement). Thus, a neuron that responds according to the log likelihood of a given disparity (an LL neuron), that is, $R^{LL}_k = \ln[\mathrm{gauss}(\mathbf{R} \mid d_k)]$, can be obtained by weighted summation of the linear (first term), squared (second term), and pairwise-sum-squared (third term) AMA filter responses (Equation 5). The implication is that a large collection of LL neurons, each with a different preferred disparity, can be constructed from a small fixed set of linear filters simply by changing the weights on the linear filter responses, squared filter responses, and pairwise-sum-squared filter responses.
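The exact weight formulas are given in the Supplement; the sketch below shows one consistent way to recover them from u_k and C_k (our own reconstruction from Equations 4 and 5) and numerically verifies that the weighted sum of linear, squared, and pairwise-sum-squared responses reproduces the quadratic form of Equation 4 up to the response-independent constant.

```python
import numpy as np

def ll_weights(u_k, C_k):
    """One consistent set of weights implementing Equation 5 from the Gaussian parameters
    of Equation 4 (w_ik on R_i, w_iik on R_i^2, w_ijk on (R_i + R_j)^2)."""
    A = -0.5 * np.linalg.inv(C_k)                      # quadratic-form matrix
    w_lin = np.linalg.solve(C_k, u_k)                  # w_ik: weights on linear responses
    w_pair = np.triu(A, k=1)                           # w_ijk: weights on pairwise-sum-squared terms (i < j)
    w_sq = np.diag(A) - (A.sum(axis=1) - np.diag(A))   # w_iik: weights on squared responses
    const = -0.5 * u_k @ np.linalg.solve(C_k, u_k)     # response-independent term (normalizer omitted)
    return w_lin, w_sq, w_pair, const

# Numerical check against the quadratic form of Equation 4 (random stand-ins for u_k, C_k, R).
rng = np.random.default_rng(2)
n = 8
u_k = rng.standard_normal(n)
M = rng.standard_normal((n, n)); C_k = M @ M.T + n * np.eye(n)
R = rng.standard_normal(n)

w_lin, w_sq, w_pair, const = ll_weights(u_k, C_k)
eq5 = w_lin @ R + w_sq @ R**2 + sum(w_pair[i, j] * (R[i] + R[j])**2
                                    for i in range(n) for j in range(i + 1, n)) + const
eq4 = -0.5 * (R - u_k) @ np.linalg.solve(C_k, R - u_k)
print(np.isclose(eq4, eq5))   # True
```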

A potential concern is that these computations could not be implemented in cortex. AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are 'on' and 'off' versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our 'complex cell' differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.
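A compact sketch of those operations, paralleling Figure S3 (toy code of our own): the signed AMA filter response is recovered as the difference of 'on' and 'off' half-wave rectified responses, and the squared (model complex cell) response is obtained from the same rectified pair.

```python
import numpy as np

def relu(x):
    """Half-wave rectification."""
    return np.maximum(x, 0.0)

def ama_response_from_simple_cells(f, c_norm):
    """Signed linear filter response as the difference of 'on' and 'off' rectified simple cells."""
    drive = f @ c_norm
    on, off = relu(drive), relu(-drive)
    return on - off                        # equals f @ c_norm

def complex_response(f, c_norm):
    """Model 'complex cell': squared linear response, built from the rectified pair."""
    drive = f @ c_norm
    on, off = relu(drive), relu(-drive)
    return on**2 + off**2                  # equals (f @ c_norm)**2 because one of the two is zero

# Toy check with a random filter and stimulus.
rng = np.random.default_rng(3)
f, c = rng.standard_normal(120), rng.standard_normal(120)
print(np.isclose(ama_response_from_simple_cells(f, c), f @ c),
      np.isclose(complex_response(f, c), (f @ c)**2))
```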

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, and a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled 'complex' because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity (Figure 6a).

Figure 4. Joint filter response distributions conditioned on disparity, for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions averaged across all stimuli at each disparity level, if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independently of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their responses should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that data is for surfaces with a cosine distribution of slants.

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is,

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity dk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1-S4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random-dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector RLL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
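A small sketch of that readout (names are ours): add a disparity-dependent log-prior constant to each LL neuron response and take the preferred disparity of the peak.

```python
import numpy as np

def read_out_population(R_LL, preferred_disparities, prior=None):
    """MAP readout of a population of LL neuron responses (log likelihoods).
    A nonuniform prior enters as an additive log-prior constant per neuron."""
    R_LL = np.asarray(R_LL, dtype=float)
    if prior is not None:
        R_LL = R_LL + np.log(prior)
    return preferred_disparities[np.argmax(R_LL)]

# Toy usage: 37 LL neurons with preferred disparities spanning +/-15 arcmin.
prefs = np.linspace(-15.0, 15.0, 37)
responses = -0.5 * ((prefs - 3.2) / 2.0) ** 2       # placeholder log-likelihood profile peaked near 3.2
print(read_out_population(responses, prefs))         # ~3.3 (nearest preferred disparity)
```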

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement; Equation 5; Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3-S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full width at half height of the disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optics, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 and 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of the neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
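For concreteness, here is a minimal sketch of the local cross-correlation idea just described: a generic windowed implementation (not the modified algorithm used in the Supplement), in which the estimate is the candidate shift that maximizes the normalized correlation between the two eyes' signals.

```python
import numpy as np

def xcorr_disparity(left, right, max_shift=20):
    """Return the integer shift (in samples) of the right-eye signal relative to
    the left-eye signal that maximizes their normalized cross-correlation."""
    def unit(v):
        v = v - v.mean()
        return v / (np.linalg.norm(v) + 1e-12)
    l = unit(left)
    best_d, best_corr = 0, -np.inf
    for d in range(-max_shift, max_shift + 1):
        corr = l @ unit(np.roll(right, -d))    # undo candidate disparity d
        if corr > best_corr:
            best_d, best_corr = d, corr
    return best_d   # convert to arcmin using the sampling rate (e.g., 128 samples/deg)

# Example: the right-eye signal is the left-eye signal displaced by 3 samples
left = np.random.default_rng(0).standard_normal(128)
print(xcorr_disparity(left, np.roll(left, 3)))   # -> 3
```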

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10-12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2-4.7 c/deg) and not for higher frequencies (see Figure 3; Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above ~6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (~6 c/deg) have widths of ~8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/deg is nicely predicted by the ~8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the Supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure, that is, the within-patch depth variation and depth discontinuities (occlusions), that is present in natural scenes (Figure S8a-c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that the locations of fine depth structure and monocular occlusion zones (~9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to the differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pairwise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n c_{50}^2}}

where n is the dimensionality of the vector and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c_50.)
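A minimal numerical sketch of this normalization (function and variable names are ours): with c50 = 0 it reduces to the zero-mean, unit-magnitude normalization used in the main analysis, and c50 > 0 gives the standard half-saturation form.

```python
import numpy as np

def contrast_normalize(signal, c50=0.0):
    """Contrast-normalize a local signal vector: convert to Weber contrast about
    the local mean, then divide by sqrt(contrast energy + n * c50**2)."""
    c = signal / signal.mean() - 1.0            # c(x) = (r(x) - rbar) / rbar
    n = c.size                                  # dimensionality of the vector
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

patch = np.random.default_rng(0).uniform(0.5, 1.5, 256)    # toy luminance signal
print(np.linalg.norm(contrast_normalize(patch)))            # 1.0 when c50 = 0
print(np.linalg.norm(contrast_normalize(patch, c50=0.1)))   # < 1.0 when c50 > 0
```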

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c_50; the conditional response distributions are not. For large values of c_50, the response distributions have tails much heavier than Gaussians. When c_50 = 0.0 (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c_50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined the optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A 4-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) the computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e-h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from the images captured by the photosensors of any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 x 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 x 256 pixel patches were randomly selected from 80 photographs (15 patches x 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

\delta = 2\left[\tan^{-1}\!\left(\frac{IPD/2}{d_{fixation} + \Delta}\right) - \tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}}\right)\right] \quad (6)

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and \Delta is the depth. Depth is given by \Delta = d_surface - d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
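A minimal numerical sketch of Equation 6 (variable names ours). The 40 cm fixation distance matches the value used in the paper; the 6.5 cm interpupillary distance is an illustrative assumption.

```python
import numpy as np

def disparity_arcmin(depth_m, d_fixation_m=0.40, ipd_m=0.065):
    """Equation 6: binocular disparity (arcmin) of a surface point at distance
    d_fixation + depth when the eyes fixate at d_fixation. With this sign
    convention, points farther than fixation give negative (uncrossed) disparities."""
    delta = 2.0 * (np.arctan((ipd_m / 2) / (d_fixation_m + depth_m))
                   - np.arctan((ipd_m / 2) / d_fixation_m))
    return np.degrees(delta) * 60.0

# Example: disparities produced by depth offsets of a few millimeters to centimeters
for depth_cm in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"depth {depth_cm:+.1f} cm -> {disparity_arcmin(depth_cm / 100):+.2f} arcmin")
```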

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (-16.875 to 16.875 arcmin in equally spaced steps). Each surface patch subtended 2 deg of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1 deg from each eye (plus or minus 0.5 deg about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.
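A rough sketch (not the paper's implementation) of the wave-optics step described above: a defocused generalized pupil function is built on a grid, and the PSF and OTF follow from the Fraunhofer approximation. The grid size, defocus value, and single 555 nm wavelength are illustrative assumptions; the paper uses polychromatic PSFs (see Supplement).

```python
import numpy as np

def defocus_otf(pupil_diameter_mm=2.0, defocus_diopters=0.25, wavelength_nm=555,
                grid=256, plane_width_mm=6.0):
    """Sketch: OTF of a defocused eye from the generalized pupil function.
    PSF = |FT(pupil)|^2 ; OTF = normalized FT of the PSF (equivalently, the
    autocorrelation of the pupil function). Parameter values are illustrative."""
    x = np.linspace(-plane_width_mm / 2, plane_width_mm / 2, grid)
    xx, yy = np.meshgrid(x, x)
    r2_mm = xx**2 + yy**2
    aperture = (r2_mm <= (pupil_diameter_mm / 2) ** 2).astype(float)
    # paraxial defocus wavefront error W(r) = (dD/2) * r^2, r in meters -> W in meters
    w = 0.5 * defocus_diopters * r2_mm * 1e-6
    pupil = aperture * np.exp(1j * 2 * np.pi * w / (wavelength_nm * 1e-9))
    psf = np.abs(np.fft.fftshift(np.fft.fft2(pupil))) ** 2
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    return otf / otf[0, 0]

otf = defocus_otf()
print(np.round(np.abs(otf[0, :6]), 3))   # low-frequency samples of the MTF
```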

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, the projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * psf_e(\mathbf{x})\right] \cdot samp(\mathbf{x}) + \gamma \quad (7)

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise \gamma; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
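A minimal one-dimensional sketch of Equation 7 (function and parameter names ours; the Gaussian PSF and noise level are illustrative stand-ins for the wave-optics PSF and noise model described above).

```python
import numpy as np

def sensor_responses(ideal_image, psf, samples_per_deg=128, noise_sd=0.01, rng=None):
    """Equation 7 sketch for one eye: blur the idealized luminance image with the
    eye's point-spread function, sample at the cone sampling rate, and add noise."""
    rng = rng or np.random.default_rng(0)
    blurred = np.convolve(ideal_image, psf, mode="same")        # I_e(x) * psf_e(x)
    step = max(1, ideal_image.size // samples_per_deg)          # spatial sampling, samp(x)
    sampled = blurred[::step]
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)   # + noise

# Example: a 1-deg luminance signal blurred by a small Gaussian PSF
x = np.linspace(-0.5, 0.5, 512)
image = 1.0 + 0.5 * np.sign(np.sin(2 * np.pi * 3 * x))          # square-wave luminance
psf = np.exp(-np.linspace(-3, 3, 31) ** 2)
psf /= psf.sum()
print(sensor_responses(image, psf)[:5])
```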

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that the factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images, yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters for those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
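A minimal sketch of the encoding step just described (not the AMA learning algorithm itself, which searches over filters): each filter's mean response is the dot product of the contrast-normalized stimulus with the filter, scaled by a maximum response, plus a small amount of Gaussian noise. The placeholder filter set, r_max, and noise level are illustrative assumptions.

```python
import numpy as np

def ama_filter_responses(filters, stimuli, r_max=1.0, noise_sd=0.05, rng=None):
    """Noisy responses of a small filter population to a set of stimuli.
    filters: (n_filters, n_samples); stimuli: (n_stimuli, n_samples)."""
    rng = rng or np.random.default_rng(0)
    c = stimuli - stimuli.mean(axis=1, keepdims=True)            # zero-mean contrast signal
    c = c / np.linalg.norm(c, axis=1, keepdims=True)             # unit vector magnitude
    mean_resp = r_max * c @ filters.T                            # noiseless filter responses
    return mean_resp + rng.normal(0.0, noise_sd, mean_resp.shape)

# Example: 8 orthonormal placeholder 'filters' applied to 100 random stimuli
rng = np.random.default_rng(1)
filters = np.linalg.qr(rng.standard_normal((256, 8)))[0].T
stimuli = rng.uniform(0.5, 1.5, (100, 256))
print(ama_filter_responses(filters, stimuli).shape)              # (100, 8)
```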

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained by using 1.875 arcmin steps for training, followed by interpolation to 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every ~3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
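A minimal sketch of this read-out (array names ours): cubic splines interpolate the class-conditional response-distribution means and covariances between training disparities, the Gaussian likelihood of an observed filter-response vector is evaluated on the fine grid, and the MAP estimate is taken under a flat prior.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def map_disparity(response, train_disp, means, covs, n_interp=32):
    """MAP disparity from spline-interpolated Gaussian response distributions.
    means: (k, n) mean response vectors; covs: (k, n, n) covariance matrices."""
    fine = np.linspace(train_disp[0], train_disp[-1],
                       (len(train_disp) - 1) * n_interp + 1)
    mean_f = CubicSpline(train_disp, means, axis=0)(fine)   # interpolated means
    cov_f = CubicSpline(train_disp, covs, axis=0)(fine)     # interpolated covariances
    loglik = [multivariate_normal(m, c, allow_singular=True).logpdf(response)
              for m, c in zip(mean_f, cov_f)]
    return fine[int(np.argmax(loglik))]                     # flat prior -> MAP = ML

# Toy example with 2 filters and 5 training disparities (arcmin)
train_disp = np.linspace(-16.875, 16.875, 5)
means = np.stack([np.array([np.sin(d / 10), np.cos(d / 10)]) for d in train_disp])
covs = np.stack([np.eye(2) * 0.05 for _ in train_disp])
print(map_disparity(np.array([0.5, 0.9]), train_disp, means, covs))
```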

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17.
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

w_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \quad (S1a)

w_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \quad (S1b)

w_{ij,k} = -0.5\left[\mathbf{C}_k^{-1}\right]_{ij}, \quad \forall\, i,j:\ j > i \quad (S1c)

const'_k = -0.5\,\mathbf{u}_k^T\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \quad (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).
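A minimal numerical sketch of Eqs. S1a-d (function and variable names ours), with a check that the resulting weighted sum of linear, squared, and pairwise-sum-squared responses reproduces the Gaussian log likelihood up to its normalization constant.

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights on linear, squared, and pairwise-sum-squared filter responses
    (Eqs. S1a-d). u_k: mean response vector (n,); C_k: response covariance (n, n)."""
    A = np.linalg.inv(C_k)                              # inverse covariance
    w_lin = A @ u_k                                     # S1a
    w_sq = -np.diag(A) + 0.5 * A @ np.ones(len(u_k))    # S1b
    w_pair = -0.5 * A                                   # S1c (use entries with j > i)
    const = -0.5 * u_k @ A @ u_k                        # S1d, up to const_k
    return w_lin, w_sq, w_pair, const

def log_likelihood(R, w_lin, w_sq, w_pair, const):
    """Evaluate Eq. S2 with the weights above (pairwise term uses j > i only)."""
    n = len(R)
    pair = sum(w_pair[i, j] * (R[i] + R[j]) ** 2
               for i in range(n - 1) for j in range(i + 1, n))
    return w_lin @ R + w_sq @ R**2 + pair + const

# Consistency check against the quadratic form -0.5 (R - u)^T C^{-1} (R - u)
rng = np.random.default_rng(1)
u, M = rng.normal(size=3), rng.normal(size=(3, 3))
C = M @ M.T + np.eye(3)
R = rng.normal(size=3)
print(np.isclose(log_likelihood(R, *ll_neuron_weights(u, C)),
                 -0.5 * (R - u) @ np.linalg.inv(C) @ (R - u)))   # True
```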


First, we rewrite Eq. 5 in the main text by expanding its third term:

\ln p(\mathbf{R}\,|\,\delta_k) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \quad (S2)

Second, we rewrite Eq. 4 in the main text:

\ln p(\mathbf{R}\,|\,\delta_k) = -0.5\,(\mathbf{R}-\mathbf{u}_k)^T\mathbf{A}_k(\mathbf{R}-\mathbf{u}_k) + const \quad (S3)

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity \delta_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

-0.5\,a_{ii}(R_i - u_i)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \quad (S4a)

-a_{ij}(R_i - u_i)(R_j - u_j) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \quad (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case we want to collect the terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

w_i = \sum_{j=1}^{n} a_{ij}u_j \quad (S5a)

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses, u_i, are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a; Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case we want to collect the terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2, and -a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_{ij} = -0.5\,a_{ij} \quad (S5b)


Next, consider w_ii. In this case, the total weight on R_i^2 in Eq. S2 must be -0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = -0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \quad (S5c)

Last, by grouping the terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j \ne i} a_{ij}u_iu_j\right) + const \quad (S5d)

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R_max^2, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R_{ij,max}^2 = R_{max}^2 (2 + 2\,\mathbf{f}_i \cdot \mathbf{f}_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex, that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R_max^2 and R_{ij,max}^2, then the weights on the complex cell responses must be scaled. The optimal weights on the complex cells are then given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R_{ij,max}^2 / R_max) w_ij. These scale factors are important because in a neurophysiological experiment one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c) and the LL neurons constructed from them (Fig. S1d, e) are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, \sigma_{non-planar}, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, \sigma_{planar}. The ratio of these standard deviations, \sigma_{planar}/\sigma_{non-planar}, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, the ratio will have a value of 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all of the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), the stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), the stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.


Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

A_B(f, \delta_k) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos(2\pi f\delta_k)} \quad (S6)

where f is the spatial frequency, \delta_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, \delta_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity.

The image of a surface displaced from fixation is defocused by

\Delta D \cong \delta / IPD \quad (S7)

where \delta is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and \Delta D is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in the refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


psf_{photopic}(\mathbf{x};\,\Delta D) = \frac{1}{K}\sum_{\lambda} psf(\mathbf{x};\,\lambda,\Delta D)\,sc(\lambda)\,D65(\lambda) \quad (S8)

where sc(\lambda) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

\mathbf{c}(\mathbf{x}) = (\mathbf{r}(\mathbf{x}) - \bar{r})/\bar{r}, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and \bar{r} is the local mean. This contrast signal is then normalized by the local contrast,

\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\Big/\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n c_{50}^2}

where n is the dimensionality of the vector, \|\mathbf{c}(\mathbf{x})\|^2 is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

R_i = R_{max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor
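A minimal sketch of this binocular simple-cell model (receptive field, stimulus, and parameter values are illustrative assumptions): the two eyes' signals are concatenated, converted to contrast, normalized, projected onto the binocular receptive field, and half-wave rectified.

```python
import numpy as np

def binocular_simple_cell(left, right, rf_left, rf_right, r_max=30.0, c50=0.1):
    """Simple-cell response: contrast-normalize the concatenated binocular signal,
    take the dot product with the binocular receptive field, half-wave rectify."""
    r = np.concatenate([left, right])                      # binocular sensor responses
    c = r / r.mean() - 1.0                                 # Weber contrast
    c_norm = c / np.sqrt(np.sum(c**2) + c.size * c50**2)   # contrast normalization
    rf = np.concatenate([rf_left, rf_right])               # binocular receptive field
    return r_max * max(c_norm @ rf, 0.0)                   # half-wave rectified drive

# Example: 'on' and 'off' simple cells built from one binocular filter
x = np.linspace(-0.5, 0.5, 128)
rf = np.exp(-x**2 / 0.02) * np.sin(2 * np.pi * 3 * x)
rf /= np.linalg.norm(rf)
left = 1.0 + 0.3 * np.sin(2 * np.pi * 3 * x)
right = 1.0 + 0.3 * np.sin(2 * np.pi * 3 * (x - 0.02))
print(binocular_simple_cell(left, right, rf, rf),       # 'on' cell
      binocular_simple_cell(left, right, -rf, -rf))     # 'off' cell
```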



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Figure 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Figure 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Figure 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Figure 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Figures 6 and 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Figure 3 (on the diagonal) and their pairwise sums (see Figure S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Figure S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Figure S5). The inset shows the largest absolute normalized weight (see Supplement, Equations S1a–d) given to each filter in constructing the set of LL neurons (Figure 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring: R_max (f_i^T c_norm)^2 (see Figure S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant natural stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Figure 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Figure 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Equation S6). The optics in the two eyes were identical; i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.



Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x)/sqrt(||c(x)||² + n c_50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


… sensory and perceptual tasks. Here, we describe an approach for determining the encoding and decoding of natural sensory stimuli required for optimal performance in specific tasks, and show that well-established cortical mechanisms can implement the optimal computations. Our results provide principled, testable hypotheses about the processing stages that give rise to the selective, invariant neural populations that are thought to underlie sensory and perceptual constancies.

Methods and results

Natural disparity signals

The first step in deriving the ideal observer for disparity estimation is to simulate the photoreceptor responses to natural scenes (see Figure 2a). The vector of photoreceptor responses r (see Figure 2a) is determined by the luminance (Figure 1b) and depth structure of natural scenes, projective viewing geometry, and the optics, sensors, and noise in the vision system. We model each of these factors for the human visual system (see Methods details). We generate a large number of noisy, sampled stereo-images of small (1° × 1°) fronto-parallel surface patches for each of a large number of disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin in 1.875 arcmin steps). This range covers approximately 80% of the disparities that occur at or near the fovea in natural viewing (Liu, Bovik, & Cormack, 2008), and these image patches represent the information available to the vision system for processing. Although we focus first on fronto-parallel patches, we later show that the results are robust to surface slant and are likely to be robust to other depth variations occurring in natural scenes. We emphasize that in this paper we simulate retinal images by projecting natural image patches onto planar surfaces (Figure 1c), rather than simulating retinal images from stereo photographs. Later, we evaluate the effects of this simplification (see Discussion).

The binocular disparity in the images entering the two eyes is given by

\[
\delta \;=\; \alpha_{L} - \alpha_{R} \qquad (1)
\]

where α_L and α_R are the angles between the retinal projections of a target and fixation point in the left and right eyes. This is the definition of absolute retinal disparity. The specific pattern of binocular disparities depends on the distance between the eyes, the distance and direction that the eyes are fixated, and the distance and depth structure of the surfaces in the scene. We consider a viewing situation in which the eyes are separated by 6.5 cm (a typical interocular separation) and are fixated on a point 40 cm straight ahead (a typical arm's length) (Figure 1c).
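To make the viewing geometry concrete, here is a small sketch that computes the absolute disparity of a target located straight ahead of the cyclopean eye, under the stated fixation distance and interocular separation. The function name and the sign convention (nearer-than-fixation disparities taken as positive) are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

def absolute_disparity_arcmin(target_dist_m, fixation_dist_m=0.40, ipd_m=0.065):
    """Absolute disparity (cf. Eq. 1) of a target straight ahead at target_dist_m,
    with the eyes fixating a point straight ahead at fixation_dist_m."""
    vergence_fix = 2.0 * np.arctan(ipd_m / (2.0 * fixation_dist_m))  # angle subtended at fixation
    vergence_tgt = 2.0 * np.arctan(ipd_m / (2.0 * target_dist_m))    # angle subtended at target
    # Assumed sign convention: targets nearer than fixation give positive (crossed) disparity.
    return np.degrees(vergence_tgt - vergence_fix) * 60.0            # radians -> arcmin

# Example: a surface 1 cm nearer than fixation yields roughly 14 arcmin of disparity.
print(absolute_disparity_arcmin(0.39))
```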

Figure 2. Hierarchical processing steps in optimal disparity estimation. (a) The photoreceptor responses are computed for each of many natural images, for each of many different disparities. (b) The optimal filters for disparity estimation are learned from this collection of photoreceptor responses to natural stimuli. These are the eight most useful vertically oriented filters (receptive fields) (see also Figure 3a). Left- and right-eye filter weights are shown in gray. Filter responses are given by the dot product between the photoreceptor responses and the filter weights. (c) The optimal selective, invariant units are constructed from the filter responses. Each unit in the population is tuned to a particular disparity. These units result from a unique combination of the optimal filter responses (see also Figure 7). (d) The optimal readout of the selective, invariant population response is determined. Each black dot shows the response of one of the disparity-tuned units in (c) to the particular image shown in (a). The peak of the population response is the optimal (MAP) estimate. Note that r, R, and R_LL are vectors representing the photoreceptor, filter, and disparity-tuned-unit population responses to particular stimuli. Red outlines represent the responses to the particular stereo-image patch in (a). The steps in this procedure are general and will be useful for developing ideal observers for other estimation tasks.


Optimal linear binocular filters

Estimating disparity (solving the correspondence problem) is difficult because there often are multiple points in one eye's image that match a single point in the other (Marr & T. Poggio, 1976). Therefore, accurate estimation requires using the local image region around each point to eliminate false matches. In mammals, neurons with binocular receptive fields that weight and sum over local image regions first appear in primary visual cortex. Much is known about these neurons. It is not known, however, why they have the specific receptive fields they do, nor is it known how their responses are combined for the task of estimating disparity.

To discover the linear binocular filters that are optimal for this task (see Figure 2b), we use a Bayesian statistical technique for dimensionality reduction called accuracy maximization analysis (AMA) (Geisler, Najemnik, & Ing, 2009). The method depends both on the statistical structure of the sensory signals and on the task to be solved. This feature of the method is critically important because information relevant for one task may be irrelevant for another. For any fixed number of filters (feature dimensions), AMA returns the linear filters that maximize accuracy for the specific computational task at hand (see Methods details). (Note that feature dimensions identified with other dimensionality-reduction techniques like PCA and ICA may be irrelevant for a given task.) Importantly, AMA makes no a priori assumptions about the shapes of the filters (e.g., there is no requirement that they be orthogonal). We applied AMA to the task of identifying the disparity level, from a discrete set of levels, in a random collection of retinal stereo images of natural scenes. The number of disparity levels was sufficient to allow continuous disparity estimation from −15 to 15 arcmin (see Methods details).

Before applying AMA, we perform a few additional transformations consistent with the image processing known to occur early in the primate visual system (Burge & Geisler, 2011). First, each eye's noisy, sampled image patch is converted from a luminance image to a windowed contrast image c(x) by subtracting off and dividing by the mean, and then multiplying by a raised cosine of 0.5° at half height. The window limits the maximum possible size of the binocular filters; it places no restriction on minimum size. The size of the cosine window approximately matches the largest V1 binocular receptive field sizes near the fovea (Nienborg et al., 2004). Next, each eye's sampled image patch is averaged vertically to obtain what are henceforth referred to as left- and right-eye signals. Vertical averaging is tantamount to considering only vertically oriented filters, for this is the operation that vertically oriented filters perform on images. All orientations can provide information about binocular disparity (Chen & Qian, 2004; DeAngelis et al., 1991); however, because canonical disparity receptive fields are vertically oriented, we focus our analysis on them. An example stereo pair and corresponding left- and right-eye signals are shown in Figure 1d. Finally, the signals are contrast normalized to a vector magnitude of 1.0: c_norm(x) = c(x)/||c(x)||. This normalization is a simplified version of the contrast normalization seen in cortical neurons (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992) (see Supplement). Seventy-six hundred normalized left- and right-eye signals (400 Natural Inputs × 19 Disparity Levels) constituted the training set for AMA.
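A minimal sketch of this preprocessing applied to one eye's patch, assuming the patch is a square NumPy array of luminance values; the particular window construction is our simplification of the raised cosine described above:

```python
import numpy as np

def raised_cosine_window(n):
    """Raised cosine spanning the patch; it falls to 0.5 at half the patch half-width,
    so a 1-deg patch gives a window ~0.5 deg wide at half height."""
    x = np.linspace(-0.5, 0.5, n)                 # position in units of patch width
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * x))

def preprocess_eye_patch(luminance_patch):
    """Luminance patch -> windowed contrast image -> vertical average -> unit-norm signal."""
    mean_lum = luminance_patch.mean()
    contrast = (luminance_patch - mean_lum) / mean_lum            # contrast image c(x)
    ny, nx = contrast.shape
    window = np.outer(raised_cosine_window(ny), raised_cosine_window(nx))
    contrast = contrast * window                                   # raised-cosine spatial window
    signal = contrast.mean(axis=0)                                 # vertical averaging
    return signal / np.linalg.norm(signal)                         # c_norm = c / ||c||
```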

The eight most useful linear binocular filters (in rank order) are shown in Figure 3a. These filters specify the subspace that a population of neurophysiological receptive fields should cover for maximally accurate disparity estimation. Some filters are excited by nonzero disparities, some are excited by zero (or near-zero) disparities, and still others are suppressed by zero disparity. The left- and right-eye filter components are approximately log-Gabor (Gaussian on a log spatial frequency axis). The spatial frequency selectivity of each filter's left- and right-eye components is similar, but differs between filters (Figure 3b). Filter tuning and bandwidth range between 1.2–4.7 c/° and 0.9–4.2 c/°, respectively, with an average bandwidth of approximately 1.5 octaves. Thus, the spatial extent of each filter (see Figure 3a) is inversely related to its spatial frequency tuning: as the tuned frequency increases, the spatial extent of the filter decreases. The filters also exhibit a mixture of phase and position coding (Figure 3c), suggesting that a mixture of phase and position coding is optimal. Similar filters result (Figure S1a–c) when the training set contains surfaces having a distribution of different slants (see Discussion). (Note that the cosine windows bias the filters more toward phase than position encoding. Additional analyses have nevertheless shown that windows having a nonzero position offset, i.e., position disparity, do not qualitatively change the filters.)
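For reference, octave bandwidth relates a filter's high- and low-frequency half-height cutoffs as log2(f_hi/f_lo); a one-line check with illustrative cutoff values (not values reported in the paper):

```python
import numpy as np

def octave_bandwidth(f_lo, f_hi):
    """Octave bandwidth of a bandpass filter with half-height cutoffs f_lo and f_hi (c/deg)."""
    return np.log2(f_hi / f_lo)

# e.g., a filter passing roughly 1.7-4.8 c/deg at half height has ~1.5 octave bandwidth
print(octave_bandwidth(1.7, 4.8))
```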

Interestingly, some filters provide the most information about disparity when they are not responding. For example, the disparity is overwhelmingly likely to be zero when filter F5 (see Figure 3, Supplementary Figure S5) does not respond, because anticorrelated intensity patterns in the left- and right-eye images are very unlikely in natural images. A complex cell with this binocular receptive field would produce a disparity tuning curve similar to the tuning curve produced by a classic tuned-inhibitory cell (Figure S5; Poggio, Gonzalez, & Krause, 1988).

The properties of our linear filters are similar to those of binocular simple cells in early visual cortex (Cumming & DeAngelis, 2001; De Valois, Albrecht, & Thorell, 1982; DeAngelis et al., 1991; Poggio et al., 1988). Specifically, binocular simple cells have a similar range of shapes, a similar range of spatial frequency tuning, similar octave bandwidths of ~1.5, and similar distributions of phase and position coding (Figure 3c). Despite the similarities between the optimal filters and receptive fields in primary visual cortex, we emphasize that our explicit aim is not to account for the properties of neurophysiological receptive fields (Olshausen & Field, 1996). Rather, our aim is to determine the filters that encode the retinal information most useful for estimating disparity in natural images. Thus, neurophysiological receptive fields with the properties described here would be ideally suited for disparity estimation.

The fact that many neurophysiological properties of binocular neurons are predicted by a task-driven analysis of natural signals, with appropriate biological constraints and zero free parameters, suggests that similar ideal observer analyses may be useful for understanding and evaluating the neural underpinnings of other fundamental sensory and perceptual abilities.

Optimal disparity estimation

The optimal linear binocular filters encode the most relevant information in the retinal images for disparity estimation. However, optimal estimation performance can only be reached if the joint responses of the filters are appropriately combined and decoded (Figure 2c, d). Here, we determine the Bayes-optimal (nonlinear) decoder by analyzing the responses of the filters in Figure 3 to natural images with a wide range of disparities. Later, we show how this decoder could be implemented with linear and nonlinear mechanisms commonly observed in early visual cortex. (We note that the decoder that was used to learn the AMA filters from the training stimuli cannot be used for arbitrary stimuli; see Methods details.)

First, we examine the joint distribution of filter responses to the training stimuli, conditioned on each disparity level δ_k. (The dot product between each filter and a contrast-normalized left- and right-eye signal gives each filter's response. The filter responses are represented by the vector R; see Figure 2b.) The conditional response distributions p(R|δ_k) are approximately Gaussian (83% of the marginal distributions conditioned on disparity are indistinguishable from Gaussian; K-S test, p > 0.01; see Figure 4a). (The approximately Gaussian form is largely due to the contrast normalization of the training stimuli; see Discussion.) Each of these distributions was fit with a multidimensional Gaussian, gauss(R; u_k, C_k), estimated from the sample mean vector u_k and the sample covariance matrix C_k. Figure 4a shows sample filter response distributions (conditioned on several disparity levels) for the first two AMA filters. Much of the information about disparity is contained in the covariance between the filter responses, indicating that a nonlinear decoder is required for optimal performance.

The posterior probability of each disparity, given an observed vector of AMA filter responses R, can be obtained via Bayes' rule:

\[
p(\delta_k \mid \mathbf{R}) \;=\; \frac{\operatorname{gauss}(\mathbf{R};\,\mathbf{u}_k,\mathbf{C}_k)\, p(\delta_k)}{\sum_{l=1}^{N} \operatorname{gauss}(\mathbf{R};\,\mathbf{u}_l,\mathbf{C}_l)\, p(\delta_l)} \qquad (2)
\]

Here, we assume that all disparities are equally likely; hence, the prior probabilities p(δ) cancel out. (This assumption yields a lower bound on performance; performance will increase somewhat in natural conditions, when the prior probabilities are not flat, a scenario that is considered later.)

The solid curves in Figure 4b show posterior probability distributions averaged across all natural stimuli having the same disparity. Each posterior can be conceptualized as the expected response of a population of (exponentiated) log-likelihood neurons (see below), ordered by their preferred disparities. Colored areas show variation in the posterior probabilities due to irrelevant image variation. When the goal is to obtain the most probable disparity, the optimal estimate is given by the maximum a posteriori (MAP) read-out rule:

\[
\hat{\delta} \;=\; \arg\max_{\delta}\; p(\delta \mid \mathbf{R}) \qquad (3)
\]

The posterior distributions given by Equation 2 could, of course, also be read out with other decoding rules, which would be optimal given different goals (cost functions).
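A minimal sketch of Equations 2 and 3, assuming the per-disparity sample means u_k and covariances C_k have already been estimated from filter responses to training stimuli (all array and function names are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_disparity(R, disparities, means, covs, prior=None):
    """Posterior over disparity (Eq. 2) and the MAP estimate (Eq. 3) from filter responses R.

    R           : (n_filters,) filter response vector for one stereo patch
    disparities : (N,) candidate disparity levels delta_k
    means, covs : per-level sample means (N, n_filters) and covariances (N, n_filters, n_filters)
    prior       : (N,) prior probabilities; uniform if None
    """
    N = len(disparities)
    prior = np.full(N, 1.0 / N) if prior is None else prior
    likelihood = np.array([multivariate_normal.pdf(R, mean=means[k], cov=covs[k])
                           for k in range(N)])
    posterior = likelihood * prior
    posterior /= posterior.sum()                      # normalize (denominator of Eq. 2)
    return disparities[np.argmax(posterior)], posterior
```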

The accuracy of disparity estimates from a large collection of natural stereo-image test patches (29,600 Test Patches: 800 Natural Inputs × 37 Disparity Levels) is shown in Figure 5a. (Note: none of the test patches were in the training set, and only half the test disparity levels were in the training set.) Disparity estimates are unbiased over a wide range. At zero disparity, estimate precision corresponds to a detection threshold of 6 arcsec. As disparity increases (i.e., as the surface patch is displaced from the point of fixation), precision decreases approximately exponentially with disparity (Figure 5b). Sign confusions are rare (3.7%; Figure 5c). Similar, but somewhat poorer, performance is obtained with surfaces having a cosine distribution of slants (Figure 5d, e; Figure S1f; see Discussion). In summary, optimal encoding (filters in Figure 3) and decoding (Equations 2 and 3) of task-relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds increase exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data have been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a–e). Fortunately, this finding suggests that under many circumstances RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriate weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood, or equivalently the log likelihood, of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

\[
\ln\!\left[\operatorname{gauss}(\mathbf{R} \mid \delta_k)\right] \;=\; -0.5\,(\mathbf{R} - \mathbf{u}_k)^{\mathsf T}\, \mathbf{C}_k^{-1}\,(\mathbf{R} - \mathbf{u}_k) \;+\; \mathrm{const} \qquad (4)
\]

where u_k and C_k are the mean and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity δ_k (see Figure 4a). By multiplying through and collecting terms, Equation 4 can be expressed as

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left-eye filter components. Dashed lines with open symbols indicate right-eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


\[
\ln\!\left[\operatorname{gauss}(\mathbf{R} \mid \delta_k)\right] \;=\; \sum_{i=1}^{n} w_{ik} R_i \;+\; \sum_{i=1}^{n} w_{iik} R_i^{2} \;+\; \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ijk}\,(R_i + R_j)^{2} \;+\; \mathrm{const}' \qquad (5)
\]

where R_i is the response of the ith AMA filter and w_ik, w_iik, and w_ijk are the weights for disparity δ_k. The weights are simple functions of the mean and covariance matrices from Equation 4 (see Supplement). Thus, a neuron that responds according to the log likelihood of a given disparity (an LL neuron), that is, R_k^LL = ln[gauss(R | δ_k)], can be obtained by weighted summation of the linear (first term), squared (second term), and pairwise-sum-squared (third term) AMA filter responses (Equation 5). The implication is that a large collection of LL neurons, each with a different preferred disparity, can be constructed from a small, fixed set of linear filters simply by changing the weights on the linear filter responses, squared filter responses, and pairwise-sum-squared filter responses.
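To make the weight construction concrete, here is a sketch that derives one LL neuron's weights from the response mean u_k and covariance C_k, and evaluates the weighted sum of linear, squared, and pairwise-sum-squared responses in Equation 5. The variable names are ours, and the constant term here includes the Gaussian normalizer (which depends on C_k):

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights of Equation 5 for an LL neuron with preferred disparity delta_k."""
    A = np.linalg.inv(C_k)                                   # inverse covariance
    w_lin = A @ u_k                                          # weights on linear responses R_i
    w_pair = -0.5 * A                                        # weights on (R_i + R_j)^2, i < j
    w_sq = -0.5 * np.diag(A) + 0.5 * (A.sum(axis=1) - np.diag(A))   # weights on R_i^2
    const = -0.5 * u_k @ A @ u_k - 0.5 * np.log(np.linalg.det(2 * np.pi * C_k))
    return w_lin, w_sq, w_pair, const

def ll_neuron_response(R, w_lin, w_sq, w_pair, const):
    """Weighted sum of linear, squared, and pairwise-sum-squared filter responses (Eq. 5)."""
    resp = w_lin @ R + w_sq @ (R ** 2) + const
    n = len(R)
    for i in range(n - 1):
        for j in range(i + 1, n):
            resp += w_pair[i, j] * (R[i] + R[j]) ** 2
    return resp
```

With these definitions, the returned value should equal the Gaussian log likelihood of R under (u_k, C_k), which can be checked against scipy.stats.multivariate_normal.logpdf.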

A potential concern is that these computations could not be implemented in cortex. AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are 'on' and 'off' versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our 'complex cell' differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled 'complex' because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity

Figure 4. Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions averaged across all stimuli at each disparity level if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


(Figure 6a). The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their responses should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of the posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that the data are for surfaces with a cosine distribution of slants.


classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is,

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity δ_k is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–S4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random-dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector R_LL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
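A minimal sketch of this readout step, assuming the LL neuron population response R_LL and the neurons' preferred disparities are given as arrays (names are ours); a nonuniform prior is handled by adding its logarithm as a disparity-dependent constant before taking the peak:

```python
import numpy as np

def read_out_population(R_LL, preferred_disparities, prior=None):
    """MAP readout of an LL-neuron population: preferred disparity of the peak response."""
    if prior is not None:
        R_LL = R_LL + np.log(prior)        # disparity-dependent additive constant
    return preferred_disparities[np.argmax(R_LL)]
```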

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full width at half height of the disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 and 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of the neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
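For readers who want a concrete picture of this comparison model, the following is a minimal sketch of local cross-correlation disparity estimation on one-dimensional luminance signals. The function name, the windowing scheme, and the integer-shift search are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def local_xcorr_disparity(left, right, candidate_shifts, window):
    """Return the candidate shift (in samples) that maximizes the normalized
    local cross-correlation between the left- and right-eye signals."""
    best_rho, best_shift = -np.inf, None
    l = (left - left.mean()) * window                      # windowed, mean-subtracted left signal
    for shift in candidate_shifts:
        r = np.roll(right, shift)                          # shift right-eye signal by candidate disparity
        r = (r - r.mean()) * window
        rho = l @ r / (np.linalg.norm(l) * np.linalg.norm(r) + 1e-12)
        if rho > best_rho:
            best_rho, best_shift = rho, shift
    return best_shift
```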

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/deg) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above ~6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.
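The frequency analysis behind this claim (Equation S6 in the Supplement) can be sketched numerically. The example amplitudes, disparities, and the assumption of identical optics in the two eyes below are illustrative.

```python
import numpy as np

def binocular_diff_amplitude(f_cpd, disparity_arcmin, amp):
    """Amplitude of the left-minus-right contrast signal for a grating of
    frequency f_cpd (c/deg) viewed with the given disparity (Supplement, Eq. S6),
    assuming equal left- and right-eye amplitudes."""
    phase = 2 * np.pi * f_cpd * (disparity_arcmin / 60.0)   # interocular phase difference
    return np.sqrt(2 * amp**2 * (1 - np.cos(phase)))

f = np.linspace(0.5, 16, 400)          # spatial frequency in c/deg
amp = 1.0 / f                          # 1/f contrast falloff typical of natural scenes
signals = {d: binocular_diff_amplitude(f, d, amp) for d in (1.25, 2.5, 5.0, 10.0)}
# Above roughly 6 c/deg the curves for different disparities become nearly
# indistinguishable, so those frequencies carry little disparity information.
```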

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (~6 c/deg) have widths of ~8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/deg is nicely predicted by the ~8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the Supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e, and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (~9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.
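The per-patch computation in this range-image analysis (detailed in the Supplement) compares the depth variation captured by the best-fitting slanted plane with the total deviation from a fronto-parallel plane. A minimal sketch is below; the least-squares plane fit and all variable names are assumptions for illustration.

```python
import numpy as np

def slant_capture_ratio(Z, X, Y):
    """sigma_planar / sigma_non-planar for one range patch: 1.0 means all depth
    variation is captured by surface slant, 0.0 means none of it is.
    Z: range values; X, Y: visual-field coordinates of each sample."""
    A = np.column_stack([X.ravel(), Y.ravel(), np.ones(Z.size)])
    coef, *_ = np.linalg.lstsq(A, Z.ravel(), rcond=None)    # best-fitting slanted plane
    slanted = A @ coef
    fronto = np.full(Z.size, Z.ravel().mean())              # best-fitting fronto-parallel plane
    sigma_planar = np.std(slanted - fronto)                 # variation explained by slant
    sigma_nonplanar = np.std(Z.ravel() - fronto)            # total deviation from fronto-parallel
    return sigma_planar / (sigma_nonplanar + 1e-12)
```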

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$c_{norm}(\mathbf{x}) = \frac{c(\mathbf{x})}{\sqrt{\left\|c(\mathbf{x})\right\|^{2} + n c_{50}^{2}}}$$

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)
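As a concrete illustration, here is a minimal numerical sketch of this normalization applied to a local luminance vector; the function name and the Weber-contrast preprocessing step (taken from the Supplement) are assumptions for illustration.

```python
import numpy as np

def contrast_normalize(r, c50=0.0):
    """Convert a local luminance vector r to a contrast-normalized signal:
    c(x) = (r - mean) / mean, then divide by sqrt(||c||^2 + n * c50^2)."""
    c = (r - r.mean()) / r.mean()                   # Weber contrast signal
    n = c.size                                      # dimensionality of the vector
    return c / np.sqrt(np.sum(c**2) + n * c50**2)   # c50 = 0 gives unit vector magnitude
```

With c50 = 0.0 the output has unit vector magnitude, matching the simplified normalization used throughout the paper; larger values of c50 leave low-contrast patches less amplified.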

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50. The conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation} + \Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}}\right)\right] \qquad (6)$$

where dfixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = dsurface − dfixation, where dsurface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
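A small numerical sketch of this geometry is below; the 65 mm interpupillary distance and the example depth are illustrative assumptions, and the sign convention (negative = uncrossed) simply follows Equation 6 as written above.

```python
import numpy as np

def vergence_angle(distance_m, ipd_m=0.065):
    """Angle (radians) subtended at the two eyes by a point at the given distance."""
    return 2.0 * np.arctan(0.5 * ipd_m / distance_m)

def disparity_arcmin(d_fixation_m, depth_m, ipd_m=0.065):
    """Absolute disparity (Equation 6) of a surface point at d_fixation + depth
    when the eyes fixate at d_fixation; negative values are uncrossed (far)."""
    delta_rad = vergence_angle(d_fixation_m + depth_m, ipd_m) - vergence_angle(d_fixation_m, ipd_m)
    return np.degrees(delta_rad) * 60.0

# Example: at a 40 cm fixation distance, a surface 2.5 mm behind fixation
# yields roughly -3.4 arcmin of (uncrossed) disparity.
print(disparity_arcmin(0.40, 0.0025))
```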

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$r_{e}(\mathbf{x}) = \left[I_{e}(\mathbf{x}) * psf_{e}(\mathbf{x})\right] samp(\mathbf{x}) + \gamma \qquad (7)$$

for each eye e. Ie(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psfe(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
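The following is a minimal one-dimensional sketch of Equation 7 under the simplifications just described; the function and variable names, the PSF normalization, and the Gaussian noise model are illustrative assumptions rather than the paper's simulation code.

```python
import numpy as np

def sensor_responses(I_e, psf_e, sample_idx, noise_sd, rng=None):
    """Noisy sensor responses for one eye (cf. Equation 7): blur the idealized
    luminance signal with the eye's point-spread function, sample it at the
    cone positions, and add sensor noise."""
    rng = rng or np.random.default_rng(0)
    blurred = np.convolve(I_e, psf_e / psf_e.sum(), mode="same")  # optical degradation
    sampled = blurred[sample_idx]                                 # spatial sampling (e.g., 128 samples/deg)
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)     # additive noise
```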

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA, and a short discussion of how to apply it, are available at http://jburge.cps.utexas.edu/research/Code.html.
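The response model in this description can be sketched as follows; the matrix shapes, response scale, and noise level are illustrative assumptions (the published Matlab implementation referenced above is the authoritative version).

```python
import numpy as np

def ama_filter_responses(F, S, r_max=1.0, noise_sd=0.05, rng=None):
    """Noisy AMA filter responses to a set of contrast-normalized stimuli.
    F: (n_filters, n_pixels) filter weights; S: (n_stimuli, n_pixels) stimuli.
    The mean response to each stimulus is r_max * (s . f); the noise is Gaussian."""
    rng = rng or np.random.default_rng(0)
    mean_R = r_max * S @ F.T                         # (n_stimuli, n_filters) mean responses
    return mean_R + rng.normal(0.0, noise_sd, mean_R.shape)
```

Because the response mean and variance are known for every training stimulus, the expected accuracy of the Bayes decoder can be written in closed form and used as the objective when searching for the filters F (Geisler et al., 2009).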

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which in turn yielded posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
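A compact sketch of this interpolate-and-decode step is below; the SciPy spline, the symmetrizing ridge on the interpolated covariances, and the flat prior are assumptions for illustration, and the grid size reproduces the 577-point spacing described above.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def map_disparity(R, train_disp, means, covs, n_interp=31):
    """Interpolate the Gaussian response distributions between training disparities
    and return the MAP disparity for an observed filter response vector R.
    means: (n_train, n_filters); covs: (n_train, n_filters, n_filters)."""
    fine = np.linspace(train_disp[0], train_disp[-1],
                       (len(train_disp) - 1) * (n_interp + 1) + 1)   # e.g. 19 levels -> 577 points
    mean_of = CubicSpline(train_disp, means, axis=0)
    cov_of = CubicSpline(train_disp, covs, axis=0)
    log_like = []
    for d in fine:
        C = cov_of(d)
        C = 0.5 * (C + C.T) + 1e-9 * np.eye(len(R))        # guard against spline-induced asymmetry
        log_like.append(multivariate_normal(mean_of(d), C, allow_singular=True).logpdf(R))
    return fine[int(np.argmax(log_like))]                  # flat prior: MAP = maximum likelihood
```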

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = C_k^{-1}\mathbf{u}_k \qquad (S1a)$$

$$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(C_k^{-1}\right) + 0.5\,C_k^{-1}\mathbf{1} \qquad (S1b)$$

$$w_{ij,k} = -0.5\,\left[C_k^{-1}\right]_{ij} \quad \forall\, i,j:\ j > i \qquad (S1c)$$

$$const'_k = -0.5\,\mathbf{u}_k^{T} C_k^{-1}\mathbf{u}_k + const_k \qquad (S1d)$$

where I is the identity matrix, 1 is the 'ones' vector, and diag() sets a matrix diagonal to a vector. Here, we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k} R_i + \sum_{i=1}^{n} w_{ii,k} R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \qquad (S2)$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R} - \mathbf{u}_k\right)^{T} A_k \left(\mathbf{R} - \mathbf{u}_k\right) + const \qquad (S3)$$

where A_k equals the inverse covariance matrix C_k^(-1). (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \qquad (S4a)$$

$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad (S4b)$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect terms containing R_i: w_iR_i from Eq. S2, and a_iiR_iu_i and a_ijR_iu_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad (S5a)$$

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2 and −a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad (S5b)$$


Next, consider w_ii. In this case, the total weight on R_i² in Eq. S2 must be −0.5a_ii. The weight from the third term of Eq. S2 can be found by substituting w_ij = −0.5a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad (S5c)$$

Last, by grouping terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j\,\forall j\neq i} a_{ij}u_iu_j\right) + const \qquad (S5d)$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R²_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R²_ijmax = R²_max(2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R²_max and R²_ijmax), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by w′_i = w_i, w′_ii = R_max w_ii, and w′_ij = (R²_ijmax / R_max) w_ij.

These scale factors are important because, in a neurophysiological experiment, one would measure w′, not w. Consequently, w′_ii and w′_ij are the normalized weights plotted in Fig. 7b.
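A compact numerical sketch of Eqs. S1a–d, and of the complex-cell rescaling just described, is given below; the function names and the unit-norm filter assumption used for the maximum pairwise response are illustrative.

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights that make a quadratic combination of filter responses equal (up to a
    constant) the Gaussian log likelihood for disparity k (Eqs. S1a-d)."""
    A = np.linalg.inv(C_k)                                  # inverse covariance, A_k
    w_lin = A @ u_k                                         # S1a: weights on linear responses
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u_k)))      # S1b: weights on squared responses
    w_pair = -0.5 * A                                       # S1c: weights on sum-squared responses (use j > i)
    const = -0.5 * u_k @ A @ u_k                            # S1d (up to the additive constant)
    return w_lin, w_sq, w_pair, const

def rescale_for_complex_cells(w_sq, w_pair, F, r_max=1.0):
    """Rescale the quadratic weights when complex-cell responses saturate at r_max
    (rather than at r_max**2), assuming unit-norm filters F: (n_filters, n_pixels)."""
    r2_pair_max = r_max**2 * (2 + 2 * F @ F.T)              # maximum pairwise sum-squared responses
    return r_max * w_sq, (r2_pair_max / r_max) * w_pair
```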

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d, e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch, 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed high-contrast cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of


phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f \delta_k\right)} \qquad (S6)$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \delta\,/\,\mathrm{IPD} \qquad (S7)$$

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$psf_{photopic}\left(\mathbf{x},\Delta D\right) = \frac{1}{K}\sum_{\lambda} psf\left(\mathbf{x},\lambda,\Delta D\right)\, sc\left(\lambda\right)\, D65\left(\lambda\right) \qquad (S8)$$

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$$c\left(\mathbf{x}\right) = \left(r\left(\mathbf{x}\right) - \bar{r}\right)/\,\bar{r}$$

where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

$$c_{norm}\left(\mathbf{x}\right) = \frac{c\left(\mathbf{x}\right)}{\sqrt{\left\|c\left(\mathbf{x}\right)\right\|^{2} + n c_{50}^{2}}}$$

where n is the dimensionality of the vector, ‖c(x)‖² is the contrast energy (or equivalently, n times the squared RMS contrast), and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor c_{norm}\left(\mathbf{x}\right) \cdot \mathbf{f}_i\left(\mathbf{x}\right)\right\rfloor$$
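A minimal sketch of these model neurons, with the half-wave rectification written as max(0, ·), is shown below; the function names and the particular on/off construction of the complex cell follow the description above but are otherwise illustrative.

```python
import numpy as np

def simple_cell_response(c_norm, f, r_max=1.0):
    """Binocular 'simple cell': dot product of the contrast-normalized signal with
    the receptive field f, half-wave rectified and scaled by the maximum response."""
    return r_max * max(0.0, float(np.dot(c_norm, f)))

def complex_cell_response(c_norm, f, r_max=1.0):
    """'Complex cell': squared linear response, built by squaring and summing the
    outputs of 'on' and 'off' simple cells that share the same receptive field."""
    on = simple_cell_response(c_norm, f, r_max)
    off = simple_cell_response(c_norm, -f, r_max)
    return on**2 + off**2
```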


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e) except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement, Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring: R_max (f_i^T c_norm)² (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant image variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c) except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes, which drastically reduces the inter-ocular contrast signal. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / sqrt(||c(x)||^2 + n c_50^2) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


Optimal linear binocular filters

Estimating disparity (solving the correspondence problem) is difficult because there often are multiple points in one eye's image that match a single point in the other (Marr & T. Poggio, 1976). Therefore, accurate estimation requires using the local image region around each point to eliminate false matches. In mammals, neurons with binocular receptive fields that weight and sum over local image regions first appear in primary visual cortex. Much is known about these neurons. It is not known, however, why they have the specific receptive fields they do, nor is it known how their responses are combined for the task of estimating disparity.

To discover the linear binocular filters that are optimal for this task (see Figure 2b), we use a Bayesian statistical technique for dimensionality reduction called accuracy maximization analysis (AMA) (Geisler, Najemnik, & Ing, 2009). The method depends both on the statistical structure of the sensory signals and on the task to be solved. This feature of the method is critically important because information relevant for one task may be irrelevant for another. For any fixed number of filters (feature dimensions), AMA returns the linear filters that maximize accuracy for the specific computational task at hand (see Methods details). (Note that feature dimensions identified with other dimensionality-reduction techniques, like PCA and ICA, may be irrelevant for a given task.) Importantly, AMA makes no a priori assumptions about the shapes of the filters (e.g., there is no requirement that they be orthogonal). We applied AMA to the task of identifying the disparity level, from a discrete set of levels, in a random collection of retinal stereo images of natural scenes. The number of disparity levels was sufficient to allow continuous disparity estimation from −15 to 15 arcmin (see Methods details).

Before applying AMA, we perform a few additional transformations consistent with the image processing known to occur early in the primate visual system (Burge & Geisler, 2011). First, each eye's noisy sampled image patch is converted from a luminance image to a windowed contrast image c(x) by subtracting off and dividing by the mean, and then multiplying by a raised cosine of 0.5° at half height. The window limits the maximum possible size of the binocular filters; it places no restriction on minimum size. The size of the cosine window approximately matches the largest V1 binocular receptive field sizes near the fovea (Nienborg et al., 2004). Next, each eye's sampled image patch is averaged vertically to obtain what are henceforth referred to as left- and right-eye signals. Vertical averaging is tantamount to considering only vertically oriented filters, for this is the operation that vertically oriented filters perform on images. All orientations can provide information about binocular disparity (Chen & Qian, 2004; DeAngelis et al., 1991); however, because canonical disparity receptive fields are vertically oriented, we focus our analysis on them. An example stereo pair and corresponding left- and right-eye signals are shown in Figure 1d. Finally, the signals are contrast normalized to a vector magnitude of 1.0: c_norm(x) = c(x)/||c(x)||. This normalization is a simplified version of the contrast normalization seen in cortical neurons (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992) (see Supplement). Seventy-six hundred normalized left- and right-eye signals (400 natural inputs × 19 disparity levels) constituted the training set for AMA.
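The preprocessing pipeline just described can be summarized in a few lines (Python). The patch size, the 1-D raised-cosine window, and the choice to normalize the concatenated binocular signal are illustrative assumptions; the paper's exact windowing and sampling parameters are given in the Methods.

```python
import numpy as np

def eye_signal(luminance_patch, window):
    """Luminance patch -> windowed Weber-contrast image -> vertical average (1-D signal)."""
    mean_lum = luminance_patch.mean()
    contrast = (luminance_patch - mean_lum) / mean_lum
    contrast = contrast * window[np.newaxis, :]          # raised cosine applied along x
    return contrast.mean(axis=0)                         # vertical averaging

def binocular_signal(left_patch, right_patch, window):
    """Concatenated left/right eye signals, contrast normalized to unit magnitude."""
    s = np.concatenate([eye_signal(left_patch, window), eye_signal(right_patch, window)])
    return s / np.linalg.norm(s)                         # c_norm(x) = c(x) / ||c(x)||

n = 64                                                   # pixels across the patch (illustrative)
window = 0.5 * (1.0 + np.cos(np.linspace(-np.pi, np.pi, n)))
rng = np.random.default_rng(1)
left = 100.0 + 10.0 * rng.standard_normal((n, n))
right = 100.0 + 10.0 * rng.standard_normal((n, n))
print(np.linalg.norm(binocular_signal(left, right, window)))   # 1.0
```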

The eight most useful linear binocular filters (in rank order) are shown in Figure 3a. These filters specify the subspace that a population of neurophysiological receptive fields should cover for maximally accurate disparity estimation. Some filters are excited by nonzero disparities, some are excited by zero (or near-zero) disparities, and still others are suppressed by zero disparity. The left- and right-eye filter components are approximately log-Gabor (Gaussian on a log spatial frequency axis). The spatial frequency selectivities of each filter's left- and right-eye components are similar but differ between filters (Figure 3b). Filter tuning and bandwidth range between 1.2–4.7 c/° and 0.9–4.2 c/°, respectively, with an average bandwidth of approximately 1.5 octaves. Thus, the spatial extent of each filter (see Figure 3a) is inversely related to its spatial frequency tuning: as the tuned frequency increases, the spatial extent of the filter decreases. The filters also exhibit a mixture of phase and position coding (Figure 3c), suggesting that a mixture of phase and position coding is optimal. Similar filters result (Figure S1a–c) when the training set contains surfaces having a distribution of different slants (see Discussion). (Note that the cosine windows bias the filters more toward phase than position encoding. Additional analyses have nevertheless shown that windows having a nonzero position offset, i.e., position disparity, do not qualitatively change the filters.)

Interestingly, some filters provide the most information about disparity when they are not responding. For example, the disparity is overwhelmingly likely to be zero when filter F5 (see Figure 3, Supplementary Figure S5) does not respond, because anticorrelated intensity patterns in the left- and right-eye images are very unlikely in natural images. A complex cell with this binocular receptive field would produce a disparity tuning curve similar to the tuning curve produced by a classic tuned-inhibitory cell (Figure S5; Poggio, Gonzalez, & Krause, 1988).

The properties of our linear filters are similar to those of binocular simple cells in early visual cortex (Cumming & DeAngelis, 2001; De Valois, Albrecht, & Thorell, 1982; DeAngelis et al., 1991; Poggio et al., 1988). Specifically, binocular simple cells have a similar range of shapes, a similar range of spatial frequency tuning, similar octave bandwidths of 1.5, and similar distributions of phase and position coding (Figure 3c). Despite the similarities between the optimal filters and receptive fields in primary visual cortex, we emphasize that our explicit aim is not to account for the properties of neurophysiological receptive fields (Olshausen & Field, 1996). Rather, our aim is to determine the filters that encode the retinal information most useful for estimating disparity in natural images. Thus, neurophysiological receptive fields with the properties described here would be ideally suited for disparity estimation.

The fact that many neurophysiological properties of binocular neurons are predicted by a task-driven analysis of natural signals, with appropriate biological constraints and zero free parameters, suggests that similar ideal observer analyses may be useful for understanding and evaluating the neural underpinnings of other fundamental sensory and perceptual abilities.

Optimal disparity estimation

The optimal linear binocular filters encode the most relevant information in the retinal images for disparity estimation. However, optimal estimation performance can only be reached if the joint responses of the filters are appropriately combined and decoded (Figure 2c, d). Here we determine the Bayes optimal (nonlinear) decoder by analyzing the responses of the filters in Figure 3 to natural images with a wide range of disparities. Later, we show how this decoder could be implemented with linear and nonlinear mechanisms commonly observed in early visual cortex. (We note that the decoder that was used to learn the AMA filters from the training stimuli cannot be used for arbitrary stimuli; see Methods details.)

First, we examine the joint distribution of filter responses to the training stimuli conditioned on each disparity level d_k. (The dot product between each filter and a contrast-normalized left- and right-eye signal gives each filter's response. The filter responses are represented by the vector R; see Figure 2b.) The conditional response distributions p(R|d_k) are approximately Gaussian (83% of the marginal distributions conditioned on disparity are indistinguishable from Gaussian by a K-S test at the 0.01 level; see Figure 4a). (The approximately Gaussian form is largely due to the contrast normalization of the training stimuli; see Discussion.) Each of these distributions was fit with a multidimensional Gaussian, gauss(R; u_k, C_k), estimated from the sample mean vector u_k and the sample covariance matrix C_k. Figure 4a shows sample filter response distributions (conditioned on several disparity levels) for the first two AMA filters. Much of the information about disparity is contained in the covariance between the filter responses, indicating that a nonlinear decoder is required for optimal performance.

The posterior probability of each disparity, given an observed vector of AMA filter responses R, can be obtained via Bayes' rule:

p(d_k \,|\, \mathbf{R}) = \frac{\mathrm{gauss}(\mathbf{R}; \mathbf{u}_k, \mathbf{C}_k)\, p(d_k)}{\sum_{l=1}^{N} \mathrm{gauss}(\mathbf{R}; \mathbf{u}_l, \mathbf{C}_l)\, p(d_l)} \qquad (2)

Here we assume that all disparities are equally likely; hence, the prior probabilities p(d) cancel out. (This assumption yields a lower bound on performance; performance will increase somewhat in natural conditions when the prior probabilities are not flat, a scenario that is considered later.)

The solid curves in Figure 4b show posterior probability distributions averaged across all natural stimuli having the same disparity. Each posterior can be conceptualized as the expected response of a population of (exponentiated) log-likelihood neurons (see below), ordered by their preferred disparities. Colored areas show variation in the posterior probabilities due to irrelevant image variation. When the goal is to obtain the most probable disparity, the optimal estimate is given by the maximum a posteriori (MAP) read-out rule:

\hat{d} = \underset{d}{\arg\max}\; p(d \,|\, \mathbf{R}) \qquad (3)

The posterior distributions given by Equation 2 could, of course, also be read out with other decoding rules, which would be optimal given different goals (cost functions).
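Equations 2 and 3 amount to classification with Gaussian class-conditional likelihoods over the filter-response space. A minimal sketch (Python), assuming the per-disparity sample means and covariances have already been estimated from training responses as described above; the disparity levels and toy statistics below are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def disparity_posterior(R, means, covs, prior=None):
    """Eq. 2: posterior over discrete disparity levels given filter responses R."""
    n_levels = len(means)
    prior = np.full(n_levels, 1.0 / n_levels) if prior is None else np.asarray(prior)
    likes = np.array([multivariate_normal.pdf(R, mean=m, cov=C) for m, C in zip(means, covs)])
    post = likes * prior
    return post / post.sum()

def map_disparity(R, means, covs, levels, prior=None):
    """Eq. 3: MAP readout -- the disparity level with the highest posterior probability."""
    return levels[np.argmax(disparity_posterior(R, means, covs, prior))]

# Toy example: 3 disparity levels, 8 filter responses.
rng = np.random.default_rng(2)
levels = np.array([-7.5, 0.0, 7.5])                      # arcmin (illustrative)
means = [rng.standard_normal(8) for _ in levels]
covs = [np.eye(8) * 0.1 for _ in levels]
R = means[1] + 0.05 * rng.standard_normal(8)             # a response near the zero-disparity mean
print(map_disparity(R, means, covs, levels))              # -> 0.0
```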

The accuracy of disparity estimates from a large collection of natural stereo-image test patches (29,600 test patches: 800 natural inputs × 37 disparity levels) is shown in Figure 5a. (Note: none of the test patches were in the training set, and only half the test disparity levels were in the training set.) Disparity estimates are unbiased over a wide range. At zero disparity, estimate precision corresponds to a detection threshold of 6 arcsec. As disparity increases (i.e., as the surface patch is displaced from the point of fixation), precision decreases approximately exponentially with disparity (Figure 5b). Sign confusions are rare (3.7%; Figure 5c). Similar, but somewhat poorer, performance is obtained with surfaces having a cosine distribution of slants (Figures 5d, e; Figure S1f; see Discussion). In summary, optimal encoding (filters in Figure 3) and decoding (Equations 2 and 3) of task-relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds increase exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data has been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a–e). Fortunately, this finding suggests that under many circumstances RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriate weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood, or equivalently the log likelihood, of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

\ln\left[\mathrm{gauss}(\mathbf{R} \,|\, d_k)\right] = -0.5\,(\mathbf{R}-\mathbf{u}_k)^{\mathrm{T}} \mathbf{C}_k^{-1} (\mathbf{R}-\mathbf{u}_k) + \mathrm{const} \qquad (4)

where u_k and C_k are the mean vector and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity d_k (see Figure 4a).

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left-eye filter components. Dashed lines with open symbols indicate right-eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


By multiplying through and collecting terms, Equation 4 can be expressed as

\ln\left[\mathrm{gauss}(\mathbf{R}\,|\,d_k)\right] = \sum_{i=1}^{n} w_{ik} R_i + \sum_{i=1}^{n} w_{iik} R_i^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ijk} \left(R_i+R_j\right)^{2} + \mathrm{const}' \qquad (5)

where R_i is the response of the ith AMA filter and w_ik, w_iik, and w_ijk are the weights for disparity d_k. The weights are simple functions of the mean and covariance matrices from Equation 4 (see Supplement). Thus, a neuron which responds according to the log likelihood of a given disparity (an LL neuron), that is, R_k^{LL} = ln[gauss(R|d_k)], can be obtained by weighted summation of the linear (first term), squared (second term), and pairwise-sum-squared (third term) AMA filter responses (Equation 5). The implication is that a large collection of LL neurons, each with a different preferred disparity, can be constructed from a small fixed set of linear filters simply by changing the weights on the linear filter responses, squared-filter responses, and pairwise-sum-squared filter responses.
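The exact weight expressions are given in the Supplement (Eqs. S1a–d). The sketch below (Python) derives one consistent set of weights by straightforward algebraic expansion of Equation 4 and checks numerically that the weighted sum of linear, squared, and pairwise-sum-squared filter responses reproduces the quadratic log likelihood; it is an illustration of Equation 5, not a transcription of the supplement's formulas.

```python
import numpy as np

def ll_neuron_weights(u, C):
    """Weights on R_i, R_i^2, and (R_i+R_j)^2 that reproduce Eq. 4 (one valid choice)."""
    A = np.linalg.inv(C)                       # inverse covariance for disparity d_k
    n = u.size
    w_lin = A @ u                              # weights on linear filter responses R_i
    w_sq = -0.5 * np.diag(A).copy()            # start from the diagonal quadratic terms
    w_pair = {}
    for i in range(n):
        for j in range(i + 1, n):
            w_pair[(i, j)] = -0.5 * A[i, j]    # weights on (R_i + R_j)^2
            w_sq[i] += 0.5 * A[i, j]           # compensate the extra R_i^2 and R_j^2 that
            w_sq[j] += 0.5 * A[i, j]           # the pairwise-sum-squared terms introduce
    const = -0.5 * u @ A @ u - 0.5 * np.log(np.linalg.det(2 * np.pi * C))
    return w_lin, w_sq, w_pair, const

def ll_response(R, w_lin, w_sq, w_pair, const):
    """Eq. 5: weighted sum of linear, squared, and pairwise-sum-squared responses."""
    out = w_lin @ R + w_sq @ (R ** 2) + const
    for (i, j), w in w_pair.items():
        out += w * (R[i] + R[j]) ** 2
    return out

# Numerical check against the quadratic form of Eq. 4.
rng = np.random.default_rng(3)
n = 8
u = rng.standard_normal(n)
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)                    # a valid covariance matrix
R = rng.standard_normal(n)
direct = -0.5 * (R - u) @ np.linalg.inv(C) @ (R - u) \
         - 0.5 * np.log(np.linalg.det(2 * np.pi * C))
print(np.isclose(ll_response(R, *ll_neuron_weights(u, C)), direct))   # True
```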

A potential concern is that these computations could not be implemented in cortex: AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are 'on' and 'off' versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our 'complex cell' differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.
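A small sketch (Python) of these substitutions: the linear AMA filter response is recovered by subtracting an 'off' from an 'on' half-wave rectified simple cell, and the model complex cell response is the square of the linear response. The inputs below are random stand-ins.

```python
import numpy as np

def on_off_simple_cells(c_norm, f):
    """'On' and 'off' half-wave rectified responses to the same binocular filter f."""
    drive = np.dot(f, c_norm)
    return max(drive, 0.0), max(-drive, 0.0)

def ama_filter_response(c_norm, f):
    """Linear filter response recovered as on minus off (Fig. S3a)."""
    on, off = on_off_simple_cells(c_norm, f)
    return on - off

def model_complex_cell(c_norm, f):
    """Model complex cell: squared linear filter response (Fig. 6b)."""
    return ama_filter_response(c_norm, f) ** 2

rng = np.random.default_rng(4)
c_norm = rng.standard_normal(128)
c_norm /= np.linalg.norm(c_norm)
f = rng.standard_normal(128)
print(ama_filter_response(c_norm, f), model_complex_cell(c_norm, f))
```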

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled 'complex' because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity (Figure 6a).

Figure 4. Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions averaged across all stimuli at each disparity level if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


The model complex cell is better tuned, but it is not unimodal and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that some disparity-tuned V1 cells are modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their response should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c) except that data are for surfaces with a cosine distribution of slants.



Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is, they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity d_k is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a, Equations 4, 5, S1–4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).



V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector R_LL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
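A sketch of this population readout (Python); the LL responses and the prior are assumed to be given, and the toy responses below are illustrative.

```python
import numpy as np

def read_out_ll_population(ll_responses, preferred_disparities, prior=None):
    """MAP readout of a population of log-likelihood (LL) neurons: with a uniform prior,
    return the preferred disparity of the maximally responding neuron; a nonuniform prior
    adds a disparity-dependent constant (its log) to each response before taking the peak."""
    scores = np.asarray(ll_responses, dtype=float)
    if prior is not None:
        scores = scores + np.log(np.asarray(prior))
    return preferred_disparities[np.argmax(scores)]

preferred = np.linspace(-15.0, 15.0, 31)                   # arcmin
ll = -0.5 * ((preferred - 3.0) / 2.0) ** 2                 # toy LL responses peaking at 3 arcmin
print(read_out_ll_population(ll, preferred))               # -> 3.0
```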

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Figure 7. Constructing selective, invariant, disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.



Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of the neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.
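For concreteness, a minimal sketch of a local cross-correlation estimator (Python): the disparity estimate is the shift that maximizes the windowed, normalized correlation between the two eyes' signals. This simplified version circularly shifts one full signal rather than matching many windowed patches, and the window and candidate range are illustrative; it is not the modified algorithm used in the Supplement.

```python
import numpy as np

def cross_correlation_disparity(left, right, max_shift_px, window=None):
    """Disparity (pixels) as the shift of the right-eye signal that maximizes
    normalized cross-correlation with the left-eye signal."""
    window = np.ones(left.size) if window is None else window
    best_shift, best_rho = 0, -np.inf
    for shift in range(-max_shift_px, max_shift_px + 1):
        shifted = np.roll(right, shift)                      # simplified: circular shift
        a = window * (left - left.mean())
        b = window * (shifted - shifted.mean())
        rho = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if rho > best_rho:
            best_rho, best_shift = rho, shift
    return best_shift

rng = np.random.default_rng(5)
left = rng.standard_normal(256)
right = np.roll(left, -7) + 0.05 * rng.standard_normal(256)  # right-eye signal offset by 7 px
print(cross_correlation_disparity(left, right, max_shift_px=15))  # recovers the 7 px offset
```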

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/deg) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above ~6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

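The binocular contrast difference signal underlying this analysis is given by Equation S6 in the Supplement. The short sketch below simply evaluates that expression; the function name is illustrative, and the left- and right-eye amplitudes (which fold in the optics and the 1/f falloff) are assumed to be supplied by the caller.

```python
import numpy as np

def binocular_difference_amplitude(f, delta, amp_left, amp_right):
    """Amplitude of the left-minus-right contrast signal at spatial frequency f
    (c/deg) for disparity delta (deg), per Supplement Eq. S6.
    amp_left, amp_right : retinal amplitudes at frequency f for each eye."""
    return np.sqrt(amp_left**2 + amp_right**2
                   - 2.0 * amp_left * amp_right * np.cos(2.0 * np.pi * f * delta))
```
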
This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (~6 c/deg) have widths of ~8 arcmin, assuming the typical 1.5 octave bandwidth of cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/deg is nicely predicted by the ~8 arcmin width suggested by the present analysis.

This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the Supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients that is comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure present in natural scenes: within-patch depth variation and depth discontinuities (occlusions) (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with


a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.

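The Supplement quantifies this with a per-patch ratio of the depth variation captured by a slanted-plane fit to the total non-fronto-parallel variation. A minimal one-dimensional sketch of that computation follows; the paper's analysis fits planes to 2-D range patches, so this is an illustrative simplification under that assumption.

```python
import numpy as np

def slant_captured_ratio(x, z):
    """sigma_planar / sigma_nonplanar for one range patch (1-D simplification).

    x : positions across the patch; z : corresponding range (depth) values.
    The best-fitting fronto-parallel 'plane' is the mean depth; the best-fitting
    slanted 'plane' is the least-squares line (a plane in the full 2-D analysis)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    fronto = z.mean()                              # fronto-parallel fit
    slope, intercept = np.polyfit(x, z, 1)         # slanted fit
    slanted = slope * x + intercept
    sigma_planar = np.std(slanted - fronto)        # variation captured by slant
    sigma_nonplanar = np.std(z - fronto)           # total non-fronto-parallel variation
    return sigma_planar / sigma_nonplanar
```
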
Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

c_{\mathrm{norm}}(\mathbf{x}) = \frac{c(\mathbf{x})}{\sqrt{\|c(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}

where n is the dimensionality of the vector and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c_50.)

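A minimal sketch of this normalization is shown below (Python with NumPy). The default c_50 value and the function name are illustrative; with c_50 = 0 the output has zero mean and unit vector magnitude, matching the simplified normalization used in the main analysis.

```python
import numpy as np

def contrast_normalize(signal, c50=0.1):
    """Contrast-normalize a binocular signal vector (standard model above).
    With c50 = 0 the result has zero mean and unit vector magnitude,
    matching the simplified normalization used for the analyses in the paper."""
    c = signal - np.mean(signal)                 # zero-mean contrast signal
    n = c.size
    return c / np.sqrt(np.sum(c**2) + n * c50**2)
```
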
We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c_50; the conditional response distributions are not. For large values of c_50, the response distributions have tails much heavier than Gaussians. When c_50 = 0.0 (as it was throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c_50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes.


However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50-mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all

imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}+\Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}}\right)\right] \qquad (6)

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.

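The sketch below evaluates Equation 6 numerically. The interpupillary distance value is an assumption for illustration (it is not specified in this section), and the sign convention simply follows the sign of the depth difference.

```python
import numpy as np

def disparity_arcmin(d_surface, d_fixation=0.40, ipd=0.065):
    """Disparity of a surface point at distance d_surface (m), per Equation 6.
    d_fixation is the fixation distance (m); ipd (m) is an assumed typical value.
    The sign follows the depth difference (d_surface - d_fixation)."""
    delta = d_surface - d_fixation
    disparity_rad = 2.0 * (np.arctan((ipd / 2.0) / (d_fixation + delta))
                           - np.arctan((ipd / 2.0) / d_fixation))
    return np.degrees(disparity_rad) * 60.0      # radians -> arcmin
```
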
Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, the left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° in each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2-mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting


surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) \ast \mathrm{psf}_e(\mathbf{x})\right]\,\mathrm{samp}(\mathbf{x}) + \gamma \qquad (7)

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).

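A minimal one-dimensional sketch of Equation 7 is given below. The noise level, the sampling scheme, and the function name are illustrative assumptions rather than the exact values used in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(luminance, psf, n_samples=128, noise_sd=0.01, rng=None):
    """Noisy sensor responses for one eye (Equation 7), in one dimension:
    blur the idealized luminance signal with the eye's PSF, sample it at the
    cone sampling rate, and add noise. noise_sd and n_samples are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = fftconvolve(luminance, psf, mode='same')          # optical degradation
    idx = np.linspace(0, len(blurred) - 1, n_samples).astype(int)
    sampled = blurred[idx]                                       # spatial sampling
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)    # sensor noise
```
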
Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account

for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.

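The encoding step of this logic (noisy filter responses to contrast-normalized stimuli) can be sketched as follows. Array shapes, the noise level, and the names are illustrative assumptions, and the filter-learning search itself is not shown.

```python
import numpy as np

def ama_filter_responses(stimuli, filters, r_max=1.0, noise_sd=0.01, rng=None):
    """Noisy filter responses in the AMA encoding model described above: dot
    products of contrast-normalized stimuli with the linear filters, scaled by
    the maximum response rate, plus a small amount of Gaussian response noise.

    stimuli : (n_stimuli, n_dims) rows, already contrast-normalized.
    filters : (n_filters, n_dims) rows."""
    rng = np.random.default_rng() if rng is None else rng
    mean_resp = r_max * (stimuli @ filters.T)
    return mean_resp + rng.normal(0.0, noise_sd, mean_resp.shape)
```
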
It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response


distributions between each training step, corresponding to one interpolated distribution approximately every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which yielded posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.

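A sketch of this interpolation and MAP readout is shown below (assuming a flat prior, so the MAP estimate coincides with the maximum-likelihood disparity). Function names are illustrative; interpolated covariance matrices are assumed to remain valid, with allow_singular guarding against borderline cases.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_response_statistics(train_disp, means, covs, n_between=31):
    """Cubic-spline interpolation of response-distribution means and covariances
    between the trained disparity levels (one distribution per fine step)."""
    n_fine = (len(train_disp) - 1) * (n_between + 1) + 1   # e.g., 19 levels -> 577
    fine_disp = np.linspace(train_disp[0], train_disp[-1], n_fine)
    fine_means = CubicSpline(train_disp, means, axis=0)(fine_disp)
    fine_covs = CubicSpline(train_disp, covs, axis=0)(fine_disp)
    return fine_disp, fine_means, fine_covs

def map_disparity(response, fine_disp, fine_means, fine_covs):
    """MAP readout under a flat prior: the disparity whose Gaussian response
    distribution assigns the highest likelihood to the observed responses."""
    log_like = [multivariate_normal.logpdf(response, m, c, allow_singular=True)
                for m, c in zip(fine_means, fine_covs)]
    return fine_disp[int(np.argmax(log_like))]
```
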
Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none
Corresponding author: Johannes Burge
Email: jburge@mail.cps.utexas.edu
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.

Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring nonlinearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

\mathbf{w}_{i,k} = C_k^{-1}\mathbf{u}_k \qquad (S1a)

\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(C_k^{-1}\right) + 0.5\,C_k^{-1}\mathbf{1} \qquad (S1b)

w_{ij,k} = -0.5\,C^{-1}_{ij,k} \quad \forall\, i,j : j > i \qquad (S1c)

\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{\mathsf T}C_k^{-1}\mathbf{u}_k + \mathrm{const}_k \qquad (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

\ln p\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + \mathrm{const}' \qquad (S2)

Second, we rewrite Eq. 4 in the main text:

\ln p\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R} - \mathbf{u}_k\right)^{\mathsf T} A_k \left(\mathbf{R} - \mathbf{u}_k\right) + \mathrm{const} \qquad (S3)

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \qquad (S4a)

-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect the terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eq. S4a. Setting the R_i terms equal and canceling gives

w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad (S5a)

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect the terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2 and −a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_{ij} = -0.5\,a_{ij} \qquad (S5b)


Next, consider w_ii. In this case, the total weight on R_i^2 in Eq. S2 must be −0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad (S5c)

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

\mathrm{const}' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\,\forall j \neq i} a_{ij}u_iu_j\right) + \mathrm{const} \qquad (S5d)

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R_max^2, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R_{ij,max}^2 = R_{max}^2\left(2 + 2\,\mathbf{f}_i \cdot \mathbf{f}_j\right), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R_max^2 and R_{ij,max}^2), the weights on the complex cell responses must be scaled. The optimal weights on complex cells are then given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R_{ij,max}^2 / R_max) w_ij.

These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.

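As a numerical illustration, the unscaled LL-neuron weights of Eqs. S1a–d can be computed directly from the class-conditional response statistics. The sketch below makes the assumed inputs explicit (the mean vector u and covariance C for one preferred disparity); function and variable names are illustrative, and the cortical rescaling described above can then be applied to the returned weights.

```python
import numpy as np

def ll_neuron_weights(u, C, const=0.0):
    """Unscaled weights for one LL neuron (Eqs. S1a-d), from the mean vector u
    and covariance matrix C of the filter responses to natural stimuli having
    the neuron's preferred disparity."""
    A = np.linalg.inv(C)                                  # A = C^{-1}
    w_lin = A @ u                                         # S1a: weights on linear responses
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u)))      # S1b: weights on squared responses
    w_cross = -0.5 * A                                    # S1c: weights on sum-squared responses (use j > i)
    baseline = -0.5 * (u @ A @ u) + const                 # S1d: constant (baseline response)
    return w_lin, w_sq, w_cross, baseline
```
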
Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d, e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_nonplanar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar / σ_nonplanar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar / σ_nonplanar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of


phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

A_B\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f\delta_k\right)} \qquad (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

\Delta D \cong \frac{\delta}{\mathrm{IPD}} \qquad (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


\mathrm{psf}_{\mathrm{photopic}}\left(\mathbf{x},\Delta D\right) = \frac{1}{K}\sum_{\lambda}\mathrm{psf}\left(\mathbf{x},\lambda,\Delta D\right)\,sc\left(\lambda\right)\,D65\left(\lambda\right) \qquad (S8)

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal c(x) = (r(x) − r̄)/r̄, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

c_{\mathrm{norm}}\left(\mathbf{x}\right) = \frac{c\left(\mathbf{x}\right)}{\sqrt{\left\|c\left(\mathbf{x}\right)\right\|^{2} + n\,c_{50}^{2}}}

where n is the dimensionality of the vector, ||c(x)||² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static nonlinearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

R_i = R_{\max}\left\lfloor c_{\mathrm{norm}}\left(\mathbf{x}\right) \cdot \mathbf{f}_i\left(\mathbf{x}\right)\right\rfloor

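A compact sketch of this model binocular simple cell (contrast signal, local normalization, projection onto the receptive field, half-wave rectification) follows; the R_max and c_50 values and the function name are illustrative assumptions, and one-dimensional signal vectors are used for simplicity.

```python
import numpy as np

def binocular_simple_cell_response(r_left, r_right, receptive_field,
                                   r_max=1.0, c50=0.1):
    """Model binocular simple cell: Weber contrast signal, local contrast
    normalization, projection onto the receptive field, half-wave rectification."""
    r = np.concatenate([r_left, r_right])                 # binocular sensor responses
    c = (r - r.mean()) / r.mean()                         # contrast signal c(x)
    n = c.size
    c_norm = c / np.sqrt(np.sum(c**2) + n * c50**2)       # contrast normalization
    return r_max * max(float(c_norm @ receptive_field), 0.0)  # half-wave rectify
```
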


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating the disparity of surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters have somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\,\tan(\theta)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
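As a rough numerical illustration (our own example values, not numbers from the supplement): assuming an interpupillary distance of 0.065 m, the 0.4 m viewing distance used in the paper, and a surface slant of 45°, the formula gives

$$g \approx \frac{\mathrm{IPD}\,\tan\theta}{d} = \frac{0.065 \times \tan 45^{\circ}}{0.4} \approx 0.16.$$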


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the "expected" pattern of response for particular disparities; the disparities are decoded improperly by a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting "on" and "off" simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them "complex" because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields; these filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36); irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R \propto (\mathbf{f}^{\mathsf{T}}\mathbf{c}_{norm})^2$ (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals; error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans; vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) A four-diopter difference in refractive power between the two eyes drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals; for example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant; values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})/\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n c_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50}$ = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50}$ = 0.1, the conditional response distributions are most Gaussian on average. For $c_{50}$ > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian; powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


Thorell, 1982; DeAngelis et al., 1991; Poggio et al., 1988). Specifically, binocular simple cells have a similar range of shapes, a similar range of spatial frequency tuning, similar octave bandwidths of approximately 1.5, and similar distributions of phase and position coding (Figure 3c). Despite the similarities between the optimal filters and receptive fields in primary visual cortex, we emphasize that our explicit aim is not to account for the properties of neurophysiological receptive fields (Olshausen & Field, 1996). Rather, our aim is to determine the filters that encode the retinal information most useful for estimating disparity in natural images. Thus, neurophysiological receptive fields with the properties described here would be ideally suited for disparity estimation.

The fact that many neurophysiological properties of binocular neurons are predicted by a task-driven analysis of natural signals, with appropriate biological constraints and zero free parameters, suggests that similar ideal observer analyses may be useful for understanding and evaluating the neural underpinnings of other fundamental sensory and perceptual abilities.

Optimal disparity estimation

The optimal linear binocular filters encode the most relevant information in the retinal images for disparity estimation. However, optimal estimation performance can only be reached if the joint responses of the filters are appropriately combined and decoded (Figure 2c, d). Here, we determine the Bayes-optimal (nonlinear) decoder by analyzing the responses of the filters in Figure 3 to natural images with a wide range of disparities. Later, we show how this decoder could be implemented with linear and nonlinear mechanisms commonly observed in early visual cortex. (We note that the decoder that was used to learn the AMA filters from the training stimuli cannot be used for arbitrary stimuli; see Methods details.)

First, we examine the joint distribution of filter responses to the training stimuli, conditioned on each disparity level δk. (The dot product between each filter and a contrast-normalized left- and right-eye signal gives each filter's response; the filter responses are represented by the vector R; see Figure 2b.) The conditional response distributions p(R|δk) are approximately Gaussian (83% of the marginal distributions conditioned on disparity are indistinguishable from Gaussian; K-S test, p > 0.01; see Figure 4a). (The approximately Gaussian form is largely due to the contrast normalization of the training stimuli; see Discussion.) Each of these distributions was fit with a multidimensional Gaussian, gauss(R; u_k, C_k), estimated from the sample mean vector u_k and the sample covariance matrix C_k. Figure 4a shows sample filter response distributions (conditioned on several disparity levels) for the first two AMA filters. Much of the information about disparity is contained in the covariance between the filter responses, indicating that a nonlinear decoder is required for optimal performance.
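A minimal sketch of this fitting step (our own Python illustration; the array names and shapes are assumptions, not the paper's code):

```python
import numpy as np

def fit_conditional_gaussians(R, labels, disparity_levels):
    """Fit gauss(R; u_k, C_k) to the AMA filter responses at each disparity level.

    R                : (num_stimuli, num_filters) filter responses to training stimuli
    labels           : (num_stimuli,) disparity of each training stimulus
    disparity_levels : (K,) discrete disparity levels delta_k
    Returns lists of sample mean vectors u_k and sample covariance matrices C_k.
    """
    means, covs = [], []
    for dk in disparity_levels:
        Rk = R[labels == dk]                   # responses conditioned on disparity dk
        means.append(Rk.mean(axis=0))          # sample mean vector u_k
        covs.append(np.cov(Rk, rowvar=False))  # sample covariance matrix C_k
    return means, covs
```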

The posterior probability of each disparity, given an observed vector of AMA filter responses R, can be obtained via Bayes' rule:

$$p(\delta_k \mid \mathbf{R}) = \frac{\mathrm{gauss}(\mathbf{R};\,\mathbf{u}_k,\mathbf{C}_k)\,p(\delta_k)}{\sum_{l=1}^{N}\mathrm{gauss}(\mathbf{R};\,\mathbf{u}_l,\mathbf{C}_l)\,p(\delta_l)} \qquad (2)$$

Here we assume that all disparities are equally likely; hence the prior probabilities p(δ) cancel out. (This assumption yields a lower bound on performance; performance will increase somewhat in natural conditions, when the prior probabilities are not flat, a scenario that is considered later.)

The solid curves in Figure 4b show posterior probability distributions averaged across all natural stimuli having the same disparity. Each posterior can be conceptualized as the expected response of a population of (exponentiated) log-likelihood neurons (see below), ordered by their preferred disparities. Colored areas show variation in the posterior probabilities due to irrelevant image variation. When the goal is to obtain the most probable disparity, the optimal estimate is given by the maximum a posteriori (MAP) read-out rule:

$$\hat{\delta} = \underset{\delta}{\arg\max}\; p(\delta \mid \mathbf{R}) \qquad (3)$$

The posterior distributions given by Equation 2 could, of course, also be read out with other decoding rules, which would be optimal given different goals (cost functions).
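Equations 2 and 3 translate directly into a few lines of code; the function below is our own sketch (a flat prior is assumed unless one is supplied):

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_disparity(r, means, covs, disparity_levels, prior=None):
    """Posterior over disparity (Equation 2) and its MAP readout (Equation 3).

    r      : (num_filters,) observed AMA filter response vector R
    means  : list of mean vectors u_k, one per disparity level
    covs   : list of covariance matrices C_k, one per disparity level
    prior  : optional (K,) prior probabilities p(delta_k); flat if None
    """
    K = len(disparity_levels)
    p_delta = np.full(K, 1.0 / K) if prior is None else np.asarray(prior)
    likelihood = np.array([multivariate_normal.pdf(r, mean=u, cov=C)
                           for u, C in zip(means, covs)])
    posterior = likelihood * p_delta
    posterior /= posterior.sum()              # denominator of Equation 2
    return disparity_levels[np.argmax(posterior)], posterior
```

With the Gaussian fits from the previous sketch, calling this function on one test response vector returns the MAP disparity estimate and the full posterior.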

The accuracy of disparity estimates from a large collection of natural stereo-image test patches (29,600 test patches = 800 natural inputs × 37 disparity levels) is shown in Figure 5a. (Note: none of the test patches were in the training set, and only half the test disparity levels were in the training set.) Disparity estimates are unbiased over a wide range. At zero disparity, estimate precision corresponds to a detection threshold of 6 arcsec. As disparity increases (i.e., as the surface patch is displaced from the point of fixation), precision decreases approximately exponentially with disparity (Figure 5b). Sign confusions are rare (3.7%; Figure 5c). Similar, but somewhat poorer, performance is obtained with surfaces having a cosine distribution of slants (Figure 5d, e; Figure S1f; see Discussion). In summary, optimal encoding (filters in Figure 3) and decoding (Equations 2 and 3) of task-relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds increase exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data has been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a–e). Fortunately, this finding suggests that, under many circumstances, RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriate weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood, or equivalently the log likelihood, of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

$$\ln\!\left[\mathrm{gauss}(\mathbf{R}\mid\delta_k)\right] = -0.5\,(\mathbf{R}-\mathbf{u}_k)^{\mathsf{T}}\mathbf{C}_k^{-1}(\mathbf{R}-\mathbf{u}_k) + \mathrm{const} \qquad (4)$$

where u_k and C_k are the mean vector and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity δk (see Figure 4a). By multiplying through and collecting terms, Equation 4 can be expressed as

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left-eye filter components; dashed lines with open symbols indicate right-eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


$$\ln\!\left[\mathrm{gauss}(\mathbf{R}\mid\delta_k)\right] = \sum_{i=1}^{n} w_{ik} R_i + \sum_{i=1}^{n} w_{iik} R_i^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ijk}\,(R_i + R_j)^{2} + \mathrm{const}' \qquad (5)$$

where R_i is the response of the ith AMA filter, and w_ik, w_iik, and w_ijk are the weights for disparity δk. The weights are simple functions of the mean and covariance matrices from Equation 4 (see Supplement). Thus, a neuron that responds according to the log likelihood of a given disparity (an LL neuron), that is, R_k^LL = ln[gauss(R|δk)], can be obtained by weighted summation of the linear (first term), squared (second term), and pairwise-sum-squared (third term) AMA filter responses (Equation 5). The implication is that a large collection of LL neurons, each with a different preferred disparity, can be constructed from a small fixed set of linear filters, simply by changing the weights on the linear filter responses, squared filter responses, and pairwise-sum-squared filter responses.
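The equivalence of Equations 4 and 5 follows from expanding the quadratic form. The sketch below uses our own algebraic expansion and naming (it is not the supplement's Equations S1a–d); the weighted-sum form it produces matches Equation 4 up to the additive constant:

```python
import numpy as np

def ll_weights(u_k, C_k):
    """Weights for one LL neuron (Equation 5) from the class mean and covariance."""
    A = np.linalg.inv(C_k)                        # inverse covariance matrix
    w_lin = A @ u_k                               # weights on linear responses R_i
    w_sq = -0.5 * np.diag(A) + 0.5 * (A.sum(axis=1) - np.diag(A))  # weights on R_i^2
    w_pair = -0.5 * A                             # weights on (R_i + R_j)^2, used for i < j
    return w_lin, w_sq, w_pair

def ll_response(r, w_lin, w_sq, w_pair):
    """Weighted sum of linear, squared, and pairwise-sum-squared filter responses."""
    resp = w_lin @ r + w_sq @ (r ** 2)
    n = len(r)
    for i in range(n):
        for j in range(i + 1, n):
            resp += w_pair[i, j] * (r[i] + r[j]) ** 2
    return resp  # equals -0.5 (r - u_k)^T C_k^{-1} (r - u_k) up to an additive constant
```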

A potential concern is that these computations could not be implemented in cortex. AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are "on" and "off" versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our "complex cell" differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.
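These operations are easy to sketch; the functions below are our own minimal illustration, where `f` is one binocular AMA filter and `c` a contrast-normalized binocular signal (both 1-D arrays):

```python
import numpy as np

def simple_cell_pair(f, c):
    """'On' and 'off' half-wave rectified simple cell responses."""
    drive = float(f @ c)                   # linear binocular filter response
    return max(drive, 0.0), max(-drive, 0.0)

def ama_filter_response(f, c):
    """Signed AMA filter response recovered by subtracting off from on responses."""
    on, off = simple_cell_pair(f, c)
    return on - off                        # equals f @ c

def model_complex_cell_response(f, c):
    """Model complex cell: sum the rectified simple cell responses, then square."""
    on, off = simple_cell_pair(f, c)
    return (on + off) ** 2                 # equals (f @ c) ** 2
```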

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, and a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled "complex" because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity

Figure 4. Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions, averaged across all stimuli at each disparity level, if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


(Figure 6a). The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c); that is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their responses should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of the posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that the data are for surfaces with a cosine distribution of slants.


classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is,

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity δk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–4). In disparity estimation, the filter response distributions specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random-dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector R_LL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
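In code form, this readout amounts to adding the disparity-dependent constant (the log prior) to each LL neuron response and picking the peak; the snippet is our own illustration:

```python
import numpy as np

def read_out_population(R_LL, disparity_levels, prior=None):
    """MAP readout of an LL neuron population response (ordered by preferred disparity)."""
    if prior is not None:
        R_LL = R_LL + np.log(prior)        # disparity-dependent additive constant
    return disparity_levels[np.argmax(R_LL)]
```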

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on a tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement; Equation 5; Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red); negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7); thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of the disparity tuning curve) as a function of preferred disparity. Bandwidth increases approximately linearly with preferred disparity.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image-content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 and 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content (that is, variation in image features irrelevant to the task) constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion, widespread at the time, that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.
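For reference, a generic windowed normalized cross-correlation estimator (a standard textbook formulation, not the modified algorithm described in the Supplement) can be sketched as follows:

```python
import numpy as np

def cross_correlation_disparity(left, right, window, candidate_shifts):
    """Estimate disparity as the shift maximizing local normalized cross-correlation.

    left, right      : 1-D luminance signals for the two eyes (same length)
    window           : 1-D weighting window centered on the point of interest
    candidate_shifts : iterable of integer pixel shifts to test
    Assumes the signals are long enough that every tested shift stays in bounds.
    """
    center, half = len(left) // 2, len(window) // 2
    l_patch = left[center - half:center + half + 1] * window
    best_shift, best_rho = 0, -np.inf
    for s in candidate_shifts:
        r_patch = right[center - half + s:center + half + 1 + s] * window
        rho = np.dot(l_patch, r_patch) / (np.linalg.norm(l_patch) * np.linalg.norm(r_patch))
        if rho > best_rho:
            best_shift, best_rho = s, rho
    return best_shift
```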

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/°) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/°, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/°) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/°) have widths of 8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/° is nicely predicted by the 8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e, and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a); these differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c_50.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c_50; the conditional response distributions are not. For large values of c_50, the response distributions have tails much heavier than Gaussians. When c_50 = 0.0 (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c_50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.
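A direct transcription of this normalization (our own code; `c` is a mean-subtracted contrast signal vector):

```python
import numpy as np

def contrast_normalize(c, c50=0.0):
    """Standard contrast normalization: c / sqrt(||c||^2 + n * c50^2).

    c   : 1-D contrast signal vector
    c50 : half-saturation constant; c50 = 0.0 reproduces the normalization used here,
          giving a signal with unit vector magnitude.
    """
    n = c.size                                    # dimensionality of the vector
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)
```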

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined the optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) the computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$\delta = 2\left[\tan^{-1}\!\left(\dfrac{\mathrm{IPD}/2}{d_{\mathrm{fixation}} + \Delta}\right) - \tan^{-1}\!\left(\dfrac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}}\right)\right]$   (6)

where $d_{\mathrm{fixation}}$ is the fixation distance to the surface center, IPD is the interpupillary distance, and $\Delta$ is the depth. Depth is given by $\Delta = d_{\mathrm{surface}} - d_{\mathrm{fixation}}$, where $d_{\mathrm{surface}}$ is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
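A minimal numerical sketch of Equation 6 (Python/NumPy); the IPD and distance values below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def vergence_angle(ipd, distance):
    """Full binocular subtense (radians) of a point at the given distance (meters)."""
    return 2.0 * np.arctan((ipd / 2.0) / distance)

def disparity_arcmin(d_fixation, depth, ipd=0.065):
    """Absolute disparity (Eq. 6) of a point at distance d_fixation + depth,
    relative to a fixation point at d_fixation. Distances in meters."""
    d_surface = d_fixation + depth
    delta_rad = vergence_angle(ipd, d_surface) - vergence_angle(ipd, d_fixation)
    return np.degrees(delta_rad) * 60.0          # radians -> arcmin

# Example: surface 1 cm beyond a 40 cm fixation distance (illustrative values)
print(disparity_arcmin(d_fixation=0.40, depth=0.01))
```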

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * \mathrm{psf}_e(\mathbf{x})\right]\cdot \mathrm{samp}(\mathbf{x}) + \gamma$   (7)

for each eye e. $I_e(\mathbf{x})$ represents an eye-specific luminance image of the light striking the sensor array at each location $\mathbf{x} = (x, y)$ in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function $\mathrm{psf}_e(\mathbf{x})$ that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function $\mathrm{samp}(\mathbf{x})$. Finally, the sensor responses are corrupted by noise $\gamma$; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
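A minimal sketch of the image-formation model in Equation 7 (Python/NumPy/SciPy). The Gaussian point-spread function, sampling step, and noise level here are placeholders for illustration, not the polychromatic PSF or noise level used in the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sensor_responses(I, psf_sigma=1.0, sample_step=2, noise_sd=0.01, seed=0):
    """Noisy sensor responses: blur the idealized image, sample it, add noise.

    I           : idealized (zero-degradation) luminance image for one eye
    psf_sigma   : width of a stand-in Gaussian PSF (the paper uses a
                  polychromatic wave-optics PSF)
    sample_step : spatial sampling of the sensor array (every Nth pixel)
    noise_sd    : standard deviation of additive sensor noise
    """
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(I, sigma=psf_sigma)       # optical blur (convolution)
    sampled = blurred[::sample_step, ::sample_step]     # sensor-array sampling
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)   # additive noise

# Example with a synthetic luminance image
I = np.random.default_rng(1).random((256, 256))
print(sensor_responses(I).shape)
```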

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful to the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of each filter with the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
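The response model just described can be sketched in a few lines of Python/NumPy. This is our illustrative rendering of the description above, not the reference AMA implementation; the filter learning itself (the search over filter weights) is omitted:

```python
import numpy as np

def filter_responses(filters, stimuli, r_max=1.0, noise_sd=0.05, seed=0):
    """Noisy responses of a filter population: R = r_max * (stimuli @ filters.T) + noise.

    filters : (n_filters, n_pixels) unit-norm linear filters
    stimuli : (n_stimuli, n_pixels) contrast-normalized stimuli
    """
    rng = np.random.default_rng(seed)
    mean = r_max * stimuli @ filters.T                  # (n_stimuli, n_filters)
    return mean + rng.normal(0.0, noise_sd, mean.shape)

def class_conditional_stats(responses, labels):
    """Mean vector and covariance matrix of the responses for each disparity level."""
    stats = {}
    for k in np.unique(labels):
        Rk = responses[labels == k]
        stats[k] = (Rk.mean(axis=0), np.cov(Rk, rowvar=False))
    return stats
```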

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
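A sketch of the interpolation-and-readout step (Python/NumPy/SciPy), under the simplifying assumption of Gaussian class-conditional response distributions and a flat prior; the grids and variable names are ours:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_stats(train_disparities, means, covs, n_between=31):
    """Cubic-spline interpolation of response means (K, n) and covariances (K, n, n)
    between the trained disparity levels."""
    fine = np.linspace(train_disparities[0], train_disparities[-1],
                       (len(train_disparities) - 1) * (n_between + 1) + 1)
    mean_spline = CubicSpline(train_disparities, means, axis=0)
    cov_spline = CubicSpline(train_disparities, covs, axis=0)
    return fine, mean_spline(fine), cov_spline(fine)

def map_estimate(response, disparities, means, covs):
    """MAP readout with a flat prior: pick the disparity whose (interpolated)
    Gaussian response distribution assigns the highest likelihood."""
    log_like = [multivariate_normal.logpdf(response, m, c, allow_singular=True)
                for m, c in zip(means, covs)]
    return disparities[int(np.argmax(log_like))]
```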

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information. I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Psychological investigations on seeing with two eyes]. Kiel: Schwerssche Buchandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k$   (S1a)

$\mathbf{w}_{ii,k} = -\mathrm{diag}\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1}$   (S1b)

$\mathbf{w}_{ij,k} = -0.5\,\mathbf{C}_{ij,k}^{-1}, \;\; \forall\, i,j,\; j > i$   (S1c)

$\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{\mathsf{T}}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k$   (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag() sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

$\ln p\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k} R_i + \sum_{i=1}^{n} w_{ii,k} R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + \mathrm{const}'$   (S2)

Second, we rewrite Eq. 4 in the main text:

$\ln p\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R} - \mathbf{u}_k\right)^{\mathsf{T}}\mathbf{A}_k\left(\mathbf{R} - \mathbf{u}_k\right) + \mathrm{const}$   (S3)

where $\mathbf{A}_k$ equals the inverse covariance matrix $\mathbf{C}_k^{-1}$. (Recall that the mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_k$ onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right)$   (S4a)

$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right)$   (S4b)

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_i$ are the mean filter responses.

The aim is now to express $w_i$, $w_{ii}$, and $w_{ij}$ in terms of $u_i$, $a_{ii}$, and $a_{ij}$. First, consider $w_i$. In this case, we want to collect terms containing $R_i$: $w_iR_i$ from Eq. S2, and $a_{ii}R_iu_i$ and $a_{ij}R_iu_j$ from Eqs. S4a and S4b. Setting the $R_i$ terms equal and canceling gives

$w_i = \sum_{j=1}^{n} a_{ij}u_j$   (S5a)

Note that the weights on the linear filter responses $w_i$ are zero if the mean linear filter responses $u_i$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \ne j$. In this case, we want to collect terms containing $R_iR_j$: $2w_{ij}R_iR_j$ appears in Eq. S2, and $-a_{ij}R_iR_j$ appears in Eq. S4b. Setting the two $R_iR_j$ terms equal to each other and canceling gives

$w_{ij} = -0.5\,a_{ij}$   (S5b)


Next, consider $w_{ii}$. In this case, the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight from the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4 gives $w_{ii}$:

$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij}$   (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find the constant in Eq. S2 is given by

$\mathrm{const}' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\,\forall j \ne i} a_{ij}u_iu_j\right) + \mathrm{const}$   (S5d)

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates, on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{\max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij,\max}^2 = R_{\max}^2\left(2 + 2\,\mathbf{f}_i \cdot \mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{\max}$, rather than $R_{\max}^2$ and $R_{ij,\max}^2$), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by $w'_i = w_i$, $w'_{ii} = R_{\max}w_{ii}$, and $w'_{ij} = \left(R_{ij,\max}^2 / R_{\max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w'_{ii}$ and $w'_{ij}$ are the normalized weights plotted in Fig. 7b.
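A compact sketch (Python/NumPy) of the weight computation in Eqs. S1a–d, taking as given the class-conditional mean vector and covariance matrix of the filter responses for one disparity; the function and variable names are ours:

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights of the log-likelihood (LL) neuron for one disparity (Eqs. S1a-d).

    u_k : (n,) mean filter-response vector for disparity k
    C_k : (n, n) filter-response covariance matrix for disparity k
    Returns linear weights, squared-response weights, sum-squared-response
    weights (upper triangle), and the constant (up to an additive term).
    """
    A = np.linalg.inv(C_k)                              # A_k = C_k^{-1}
    w_lin = A @ u_k                                     # S1a
    w_sq = -np.diag(A) + 0.5 * A @ np.ones(len(u_k))    # S1b
    w_pair = np.triu(-0.5 * A, k=1)                     # S1c (j > i entries)
    const = -0.5 * u_k @ A @ u_k                        # S1d
    return w_lin, w_sq, w_pair, const
```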

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{\mathrm{non\text{-}planar}}$, and ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{\mathrm{planar}}$. The ratio of these standard deviations, $\sigma_{\mathrm{planar}}/\sigma_{\mathrm{non\text{-}planar}}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio $\sigma_{\mathrm{planar}}/\sigma_{\mathrm{non\text{-}planar}}$ will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch, 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly-viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of


phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$A_B\left(f, \delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f \delta_k\right)}$   (S6)

where f is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B\left(f, \delta_k\right)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from the fixation point is defocused by

$\Delta D \cong \delta\,/\,\mathrm{IPD}$   (S7)

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$\mathrm{psf}_{\mathrm{photopic}}\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\left(\mathbf{x};\lambda,\Delta D\right)\, sc\left(\lambda\right)\, D65\left(\lambda\right)$   (S8)

where $sc\left(\lambda\right)$ is the human photopic sensitivity function and K is a normalizing constant that sets the volume of $\mathrm{psf}_{\mathrm{photopic}}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$c\left(\mathbf{x}\right) = \left(r\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$, where $r\left(\mathbf{x}\right)$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast:

$c_{\mathrm{norm}}\left(\mathbf{x}\right) = c\left(\mathbf{x}\right)\,/\,\sqrt{\left\|c\left(\mathbf{x}\right)\right\|^{2} + n\,c_{50}^{2}}$, where n is the dimensionality of the vector, $\left\|c\left(\mathbf{x}\right)\right\|^{2}$ is the contrast energy (or, equivalently, n times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$R_i = R_{\max}\left\lfloor \mathbf{c}_{\mathrm{norm}}\left(\mathbf{x}\right)\cdot \mathbf{f}_i\left(\mathbf{x}\right)\right\rfloor$, where $\left\lfloor\cdot\right\rfloor$ denotes half-wave rectification.
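Putting the pieces of the Supplement together, here is a compact sketch (Python/NumPy) of how half-wave-rectified simple cells, squaring complex cells, and a weighted sum could implement one LL neuron, as described above. The specific filters and weights are placeholders passed in by the caller, not the learned AMA filters:

```python
import numpy as np

def simple_cells(filters, c_norm, r_max=1.0):
    """'On' and 'off' simple cells: linear filtering then half-wave rectification."""
    lin = r_max * filters @ c_norm                    # signed linear filter responses
    return np.maximum(lin, 0.0), np.maximum(-lin, 0.0)

def complex_cells(filters, c_norm, r_max=1.0):
    """Squared responses of each filter and of each pairwise-summed filter."""
    lin = r_max * filters @ c_norm
    sq = lin ** 2                                     # second term of Eq. 5
    pair_sq = (lin[:, None] + lin[None, :]) ** 2      # third term (use j > i entries)
    return sq, pair_sq

def ll_neuron_response(filters, c_norm, w_lin, w_sq, w_pair, const):
    """Response proportional to the log likelihood of one disparity (Eq. 5)."""
    on, off = simple_cells(filters, c_norm)
    lin = on - off                                    # recovers signed filter responses
    sq, pair_sq = complex_cells(filters, c_norm)
    quad = np.sum(np.triu(w_pair, k=1) * pair_sq)
    return w_lin @ lin + w_sq @ sq + quad + const
```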


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan\left(\theta\right)/d$, where $\theta$ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and then squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement, Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring: $\left(R_{\max}\,\mathbf{f}_i^{\mathsf{T}}\mathbf{c}_{\mathrm{norm}}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant natural image variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals, like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L\left(f\right) = A_R\left(f\right) \propto \mathrm{MTF}_{\delta}\left(f\right)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: the inter-ocular contrast signal is drastically reduced. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{\mathrm{norm}}(\mathbf{x}) = \mathbf{c}(\mathbf{x}) / \sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


The relevant information in natural images yields excellent disparity estimation performance.

This pattern of performance is consistent with human performance. Human disparity detection thresholds are exquisite, a few arcsec on average (Blakemore, 1970; Cormack et al., 1991); discrimination thresholds rise exponentially with disparity (Badcock & Schor, 1985; Blakemore, 1970; McKee, Levi, & Bowne, 1990; Stevenson, Cormack, Schor, & Tyler, 1992); and sign confusions occur with a similar pattern and a similar proportion of the time (Landers & Cormack, 1997).

Most psychophysical and neurophysiological data have been collected with artificial stimuli, usually random-dot stereograms (RDSs). We asked how well our optimal estimator performs on RDS stimuli. Given that our estimator was trained on natural stimuli, it is interesting to note that performance is very similar (but slightly poorer) with RDSs (Figure S2a–e). Fortunately, this finding suggests that under many circumstances RDS stimuli are reasonable surrogates for natural stimuli when measuring disparity processing in biological vision systems.

Our optimal estimator also performs poorly on anticorrelated stimuli (e.g., stimuli in which one eye's image is contrast reversed), just like humans (Cumming, Shapiro, & Parker, 1998). Relevant information exists in the optimal filter responses to anticorrelated stimuli, although the conditional response distributions are more poorly segregated and the responses are generally slightly weaker (Figure S2f, g). The primary reason for poor performance is that an optimal estimator trained on natural images cannot accurately decode the filter responses to anticorrelated stimuli (Figure S2h, i). This fact may help explain why binocular neurons often respond strongly to anticorrelated stereograms, and why anticorrelated stereograms appear like ill-specified volumes of points in depth.

Optimal disparity encoding and decoding with neural populations

Although the AMA filters appear similar to binocular simple cells (Figure 3, Figure S1), it may not be obvious how the optimal Bayesian rule for combining their responses is related to processing in visual cortex. Here we show that the optimal computations can be implemented with neurally plausible operations: linear excitation, linear inhibition, and simple static nonlinearities (thresholding and squaring). Appropriate weighted summation of binocular simple and complex cell population responses can result in a new population of neurons having tightly tuned, unimodal disparity tuning curves that are largely invariant (see Figure 2c).

The key step in implementing a Bayes-optimal estimation rule is to compute the likelihood (or, equivalently, the log likelihood) of the optimal filter responses conditioned on each disparity level. For the present case of a uniform prior, the optimal MAP estimate is simply the disparity with the greatest log likelihood. Given that the likelihoods are Gaussian, the log likelihoods are quadratic:

$$\ln\!\left[\mathrm{gauss}\!\left(\mathbf{R}\,|\,\delta_k\right)\right] = -0.5\,\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf T}\mathbf{C}_k^{-1}\left(\mathbf{R}-\mathbf{u}_k\right) + \mathrm{const}\qquad(4)$$

where $\mathbf{u}_k$ and $\mathbf{C}_k$ are the mean vector and covariance matrix of the AMA filter responses to a large collection of natural stereo-image patches having disparity $\delta_k$ (see Figure 4a). By multiplying through and collecting terms, Equation 4 can be expressed as

Figure 3. Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left-eye filter components. Dashed lines with open symbols indicate the right-eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.


ln gaussethRjdkTHORNfrac12 frac14Xnifrac141

wikRi thornXnifrac141

wiikR2i

thornXn1

ifrac141

Xnjfrac14ithorn1

wijkethRi thorn RjTHORN2

thorn const0 eth5THORNwhere Ri is the response of the ith AMA filter and wikwiik and wijk are the weights for disparity dk Theweights are simple functions of the mean and covari-ance matrices from Equation 4 (see Supplement) Thusa neuron which responds according to the loglikelihood of a given disparity (an LL neuron)mdashthat isRLL

k frac14 ln[gauss(Rjdk)]mdashcan be obtained by weightedsummation of the linear (first term) squared (secondterm) and pairwise-sum-squared (third term) AMAfilter responses (Equation 5) The implication is that alarge collection of LL neurons each with a differentpreferred disparity can be constructed from a smallfixed set of linear filters simply by changing the weightson the linear filter responses squared-filter responsesand pairwise-sum-squared filter responses
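To make the construction concrete, the following minimal sketch verifies numerically that the Gaussian quadratic form (Equation 4) equals the weighted sum of linear, squared, and pairwise-sum-squared responses (Equation 5). It is not the authors' code: the filter count, mean vector, covariance matrix, and response vector are random placeholders, and the weights are derived here directly from the precision matrix.

```python
import numpy as np

# Hypothetical class-conditional response statistics for one disparity level.
rng = np.random.default_rng(0)
n = 8                                         # number of AMA filters
u_k = rng.normal(size=n)                      # placeholder mean responses
A = rng.normal(size=(n, n))
C_k = A @ A.T + n * np.eye(n)                 # placeholder covariance (positive definite)
P = np.linalg.inv(C_k)                        # precision matrix

R = rng.normal(size=n)                        # one vector of filter responses

# Equation 4 (up to its additive constant): Gaussian quadratic form.
ll_quadratic = -0.5 * (R - u_k) @ P @ (R - u_k)

# Equation 5: weighted sum of linear, squared, and pairwise-sum-squared terms.
w_lin = P @ u_k                               # weights on R_i
w_sq = -0.5 * np.diag(P) + 0.5 * (P.sum(axis=1) - np.diag(P))   # weights on R_i^2
i, j = np.triu_indices(n, k=1)
w_pair = -0.5 * P[i, j]                       # weights on (R_i + R_j)^2
const = -0.5 * u_k @ P @ u_k
ll_weighted = w_lin @ R + w_sq @ R**2 + w_pair @ (R[i] + R[j])**2 + const

assert np.isclose(ll_quadratic, ll_weighted)  # the two forms agree
```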

A potential concern is that these computations could not be implemented in cortex: AMA filters are strictly linear and produce positive and negative responses (Figure 6a), whereas real cortical neurons produce only positive responses. However, the response of each AMA filter could be obtained by subtracting the outputs of two half-wave rectified simple cells that are "on" and "off" versions of each AMA filter (see Supplement, Figure S3a). The squared and pairwise-sum-squared responses are modeled as resulting from a linear filter followed by a static squaring nonlinearity (Figure 6b); these responses could be obtained from cortical binocular complex cells (see Supplement). (Note that our "complex cell" differs from the definition given by some prominent expositors of the disparity energy model [Figure 6d]; Cumming & DeAngelis, 2001; Ohzawa, 1998; Qian, 1997.) The squared responses could be obtained by summing and squaring the responses of on and off simple cells (see Supplement, Figure S3a, b). Finally, the LL neuron response could be obtained via a weighted sum of the AMA filter and model complex cell responses (Figure 6c). Thus, all the optimal computations are biologically plausible.
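The rectification argument can be checked with a toy calculation (hypothetical filter and stimulus vectors, not the paper's stimuli): the signed linear response is the difference of half-wave rectified "on" and "off" responses, and the model complex cell response is the square of their sum.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=32)          # hypothetical binocular filter (1-D)
c = rng.normal(size=32)          # hypothetical contrast-normalized stimulus

linear = f @ c                   # signed AMA filter response
r_on = max(linear, 0.0)          # half-wave rectified "on" simple cell
r_off = max(-linear, 0.0)        # half-wave rectified "off" simple cell

assert np.isclose(linear, r_on - r_off)          # signed response recovered
assert np.isclose(linear**2, (r_on + r_off)**2)  # squared (complex cell) response
```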

Figure 6 shows processing schematics, disparity tuning curves, and response variability for the three filter types implied by our analysis (an AMA filter, a model complex cell, a model LL neuron) and, for comparison, a standard disparity energy neuron (Cumming & DeAngelis, 2001). (The model complex cells are labeled "complex" because, in general, they exhibit temporal frequency doubling and a fundamental-to-mean response ratio that is less than 1.0; Skottun et al., 1991.) Response variability due to retinal image features that are irrelevant for estimating disparity is indicated by the gray area; the smaller the gray area, the more invariant the neural response to irrelevant image features. The AMA filter is poorly tuned to disparity and gives highly variable responses to different stereo-image patches with the same disparity

Figure 4. Joint filter response distributions conditioned on disparity, for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions p(R1) and p(R2). (b) Posterior probability distributions, averaged across all stimuli at each disparity level, if only filters F1 and F2 are used (dotted curves) and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.


(Figure 6a). The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of binocular subunits driving their response should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d, e) Same as in (b, c), except that the data are for surfaces with a cosine distribution of slants.


classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is,

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity δk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–S4). In disparity estimation, the filter response distributions are such that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.
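As an illustration of the sixth property above, the sketch below evaluates a log-Gaussian tuning curve with a fixed standard deviation on the log-disparity axis and confirms that its full-width at half-height grows linearly with the preferred disparity (cf. Figure 7c). The sigma value and preferred disparities are illustrative assumptions, not fits to the paper's curves, and the expression applies to one sign of disparity.

```python
import numpy as np

sigma_log = 0.35                                  # SD on the log-disparity axis (assumed)
k = sigma_log * np.sqrt(2.0 * np.log(2.0))        # half-height offset in log units

def tuning(d, d_pref):
    """Mean response to disparity d (arcmin), peaking at d_pref (arcmin)."""
    return np.exp(-0.5 * ((np.log(d) - np.log(d_pref)) / sigma_log) ** 2)

for d_pref in (2.0, 4.0, 8.0):                    # illustrative preferred disparities
    assert np.isclose(tuning(d_pref * np.exp(k), d_pref), 0.5)   # half-height point
    fwhh = d_pref * (np.exp(k) - np.exp(-k))      # full-width at half-height
    print(f"preferred {d_pref:4.1f} arcmin -> bandwidth {fwhh:5.2f} arcmin")  # linear growth
```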

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3A exhibit sharp tuning to the disparity of random-dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector R_LL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
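A minimal sketch of this read-out rule, assuming a hypothetical LL-neuron population response sampled on the 577-point disparity grid described in the Methods (the smooth response profile below is a toy placeholder, not model output):

```python
import numpy as np

def map_readout(ll_responses, disparities, log_prior=None):
    """Return the disparity whose LL neuron (plus optional log prior) peaks."""
    scores = np.asarray(ll_responses, dtype=float)
    if log_prior is not None:                 # nonuniform prior: add a constant
        scores = scores + np.asarray(log_prior, dtype=float)
    return disparities[np.argmax(scores)]

disparities = np.linspace(-16.875, 16.875, 577)        # candidate disparities (arcmin)
ll_responses = -0.5 * ((disparities - 3.2) / 2.0) ** 2  # toy population response
print(map_readout(ll_responses, disparities))           # -> grid point nearest 3.2
```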

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with preferred disparity.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability, and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions, but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall, performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation


between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.
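For orientation, the sketch below shows the core of a local cross-correlation estimator in its simplest form: pick the shift of the right-eye signal that maximizes its correlation with the left-eye signal. It is a deliberately stripped-down illustration (single window, circular shifts, synthetic signals), not the modified algorithm used for the comparison in the Supplement.

```python
import numpy as np

def cross_corr_disparity(left, right, max_shift):
    """Return the integer sample shift maximizing normalized correlation."""
    best_shift, best_rho = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        rho = np.corrcoef(left, np.roll(right, shift))[0, 1]
        if rho > best_rho:
            best_shift, best_rho = shift, rho
    return best_shift

rng = np.random.default_rng(2)
left = rng.normal(size=128)               # 1 deg at 128 samples/deg
right = np.roll(left, -5)                 # right-eye signal displaced by 5 samples
print(cross_corr_disparity(left, right, max_shift=20))   # -> 5
```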

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/deg) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/deg) have widths of 8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than 4 c/deg is nicely predicted by the 8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e, and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with


a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (~9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$\mathbf{c}_{\mathrm{norm}}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where $n$ is the dimensionality of the vector and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of $c_{50}$.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of $c_{50}$; the conditional response distributions are not. For large values of $c_{50}$, the response distributions have tails much heavier than Gaussians. When $c_{50} = 0.0$ (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian on average when $c_{50} = 0.1$ (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.
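A minimal sketch of this normalization, assuming a Weber-contrast conversion (subtract and divide by the mean intensity) before the normalization step, which is not spelled out in this section; the stimulus vector is a random placeholder. With c50 = 0, the output has zero mean contrast offset and unit vector magnitude, as assumed throughout the paper.

```python
import numpy as np

def contrast_normalize(intensity, c50=0.0):
    """Contrast-normalize an intensity vector: c(x) / sqrt(||c(x)||^2 + n*c50^2)."""
    c = intensity / intensity.mean() - 1.0          # Weber contrast signal (assumed)
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

stim = np.random.default_rng(3).uniform(0.1, 1.0, size=128)   # placeholder stimulus
print(np.linalg.norm(contrast_normalize(stim, c50=0.0)))       # -> 1.0 when c50 = 0
```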

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes.


However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4 diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all

imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}+\Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}}\right)\right]\qquad(6)$$

where $d_{\mathrm{fixation}}$ is the fixation distance to the surface center, IPD is the interpupillary distance, and $\Delta$ is the depth. Depth is given by $\Delta = d_{\mathrm{surface}} - d_{\mathrm{fixation}}$, where $d_{\mathrm{surface}}$ is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
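A small numerical sketch of Equation 6. The 40 cm fixation distance is the value given below in the Optics section; the 6.5 cm interpupillary distance is an assumed typical value, not one stated here, and the sign convention follows the equation as reconstructed above (surfaces behind fixation yield negative, uncrossed disparities).

```python
import numpy as np

def disparity_arcmin(d_surface, d_fixation=0.40, ipd=0.065):
    """Disparity (arcmin) of a point at distance d_surface (meters), Equation 6."""
    delta = d_surface - d_fixation                       # depth relative to fixation
    disparity_rad = 2.0 * (np.arctan((ipd / 2.0) / (d_fixation + delta))
                           - np.arctan((ipd / 2.0) / d_fixation))
    return np.degrees(disparity_rad) * 60.0

print(disparity_arcmin(0.41))   # surface 1 cm behind fixation -> about -13.5 arcmin
```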

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting


surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * \mathrm{psf}_e(\mathbf{x})\right]\cdot \mathrm{samp}(\mathbf{x}) + g\qquad(7)$$

for each eye $e$. $I_e(\mathbf{x})$ represents an eye-specific luminance image of the light striking the sensor array at each location $\mathbf{x} = (x, y)$ in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function $\mathrm{psf}_e(\mathbf{x})$ that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function $\mathrm{samp}(\mathbf{x})$. Finally, the sensor responses are corrupted by noise $g$; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
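A schematic rendering of Equation 7 for one eye, with placeholder values throughout (uniform-random "ideal" image, box-filter psf, 2x subsampling, small Gaussian noise); the paper's calibrated point-spread functions, cone sampling rate, and noise level are not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(4)
ideal_image = rng.uniform(0.1, 1.0, size=(256, 256))   # I_e(x), hypothetical
psf = np.ones((5, 5)) / 25.0                            # placeholder psf_e(x)
noise_sd = 0.01                                         # placeholder noise level

blurred = fftconvolve(ideal_image, psf, mode="same")    # I_e(x) * psf_e(x)
sampled = blurred[::2, ::2]                             # samp(x): coarser sensor lattice
responses = sampled + rng.normal(0.0, noise_sd, size=sampled.shape)  # + noise g
```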

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account

for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA, and a short discussion of how to apply it, are available at http://jburge.cps.utexas.edu/research/Code.html.
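A minimal sketch of the response model just described (one hypothetical filter; the maximum response rate and internal noise level are assumed placeholders, not the paper's values). Because the deterministic part of the response is a dot product, the mean response to a given training stimulus is known exactly and its variance is simply the internal noise variance.

```python
import numpy as np

rng = np.random.default_rng(5)
r_max = 5.0                                    # maximum response rate (assumed)
noise_sd = 0.1                                 # internal Gaussian response noise (assumed)
f = rng.normal(size=128)
f /= np.linalg.norm(f)                         # one candidate linear filter

def mean_response(stimulus_cnorm):
    """Mean filter response to one contrast-normalized training stimulus."""
    return r_max * (f @ stimulus_cnorm)

def noisy_response(stimulus_cnorm):
    """A single noisy response sample; variance equals noise_sd**2."""
    return mean_response(stimulus_cnorm) + rng.normal(0.0, noise_sd)
```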

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response


distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which in turn produced posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
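A sketch of the interpolation step, with toy training-level means standing in for the real response statistics; in the actual procedure, each covariance matrix entry is interpolated in the same way as the mean vectors.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Trained disparity levels in 1.875 arcmin steps over Panum's range (19 levels).
trained_disparities = np.arange(-16.875, 16.876, 1.875)
# Toy mean responses for two filters at each trained level (placeholders).
trained_means = np.column_stack([np.sin(trained_disparities / 5.0),
                                 np.cos(trained_disparities / 7.0)])

# Finer grid: 577 disparity levels, one per LL neuron (about 3.5 arcsec apart).
fine_disparities = np.linspace(-16.875, 16.875, 577)
spline = CubicSpline(trained_disparities, trained_means, axis=0)
interp_means = spline(fine_disparities)        # 577 x n_filters interpolated means
```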

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.
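To make this mapping concrete, the sketch below (Python) shows how an 'on'/'off' pair of half-wave rectified simple cells recovers the linear filter response, and how summing then squaring the pair yields the squared (complex-cell) response. The function names and the assumption of a concatenated left/right filter vector and contrast-normalized patch are ours, not from the paper.

```python
import numpy as np

def simple_cell_pair(f, c):
    """'On' and 'off' half-wave rectified responses of a binocular filter f
    (concatenated left/right weights) to a contrast-normalized stereo patch c."""
    r = np.dot(f, c)                      # linear (AMA-style) filter response
    return max(r, 0.0), max(-r, 0.0)      # on = [r]+ , off = [-r]+

def linear_filter_response(f, c):
    """Filter response recovered by subtracting the 'off' from the 'on' cell."""
    on, off = simple_cell_pair(f, c)
    return on - off

def complex_cell_response(f, c):
    """Model complex cell: sum the 'on'/'off' pair, then square (equals r**2)."""
    on, off = simple_cell_pair(f, c)
    return (on + off) ** 2

def pair_sum_complex_cell(f_i, f_j, c):
    """Complex cell built on a pairwise-summed filter: gives (r_i + r_j)**2."""
    return complex_cell_response(f_i + f_j, c)
```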

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k$   (S1a)

$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1}$   (S1b)

$w_{ij,k} = -0.5\left[\mathbf{C}_k^{-1}\right]_{ij}, \quad \forall\, i,j : j > i$   (S1c)

$\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{\mathsf{T}}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k$   (S1d)

where $\mathbf{I}$ is the identity matrix, $\mathbf{1}$ is the 'ones' vector, and $\mathrm{diag}(\cdot)$ maps the diagonal of a matrix to a vector. Here we derive these weight equations from the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions; see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\!\left(R_i^2 + 2R_iR_j + R_j^2\right) + \mathrm{const}'$   (S2)

Second, we rewrite Eq. 4 in the main text:

$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf{T}}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + \mathrm{const}$   (S3)

where $\mathbf{A}_k$ equals the inverse covariance matrix $\mathbf{C}_k^{-1}$. (Recall that the mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_k$ onto the filters.) From here forward, the disparity index $k$ is dropped for notational simplicity.

Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$-0.5\,a_{ii}\left(R_i-u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right)$   (S4a)

$-a_{ij}\left(R_i-u_i\right)\left(R_j-u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right)$   (S4b)

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_i$ are the mean filter responses.

The aim is now to express $w_i$, $w_{ii}$, and $w_{ij}$ in terms of $u_i$, $a_{ii}$, and $a_{ij}$. First, consider $w_i$. In this case, we want to collect terms containing $R_i$: $w_iR_i$ from Eq. S2, and $a_{ii}R_iu_i$ and $a_{ij}R_iu_j$ from Eqs. S4a and S4b. Setting the $R_i$ terms equal and canceling gives

$w_i = \sum_{j=1}^{n} a_{ij}u_j$   (S5a)

Note that the weights on the linear filter responses, $w_i$, are zero if the mean linear filter responses $u_i$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \ne j$. In this case, we want to collect terms containing $R_iR_j$: $2w_{ij}R_iR_j$ appears in Eq. S2 and $-a_{ij}R_iR_j$ appears in Eq. S4b. Setting the two $R_iR_j$ terms equal to each other and canceling gives

$w_{ij} = -0.5\,a_{ij}$   (S5b)


Next, consider $w_{ii}$. In this case, the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight contributed by the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4 gives $w_{ii}$:

$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij}$   (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

$\mathrm{const}' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\;\forall j\ne i} a_{ij}u_iu_j\right) + \mathrm{const}$   (S5d)

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{\max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij\,\max}^2 = R_{\max}^2\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters $i$ and $j$ (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{\max}$, rather than $R_{\max}^2$ and $R_{ij\,\max}^2$), the weights on the complex cell responses must be scaled. The optimal weights on complex cells are then given by $w'_i = w_i$, $w'_{ii} = R_{\max}w_{ii}$, and $w'_{ij} = \left(R_{ij\,\max}^2 / R_{\max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w'_{ii}$ and $w'_{ij}$ are the normalized weights plotted in Fig. 7b.
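The algebra above can be checked numerically. The sketch below (Python) draws a hypothetical mean vector and covariance matrix standing in for the class-conditional filter-response statistics, builds the weights of Eqs. S1a–d / S5a–d, and confirms that the weighted sum of linear, squared, and pairwise-sum-squared responses (Eq. S2) reproduces the Gaussian log likelihood (Eq. S3) up to the additive constant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8                                       # number of filters (as in the paper)
u = rng.normal(0.0, 0.05, n)                # stand-in mean filter responses
B = rng.normal(size=(n, n))
C = B @ B.T / n + 0.1 * np.eye(n)           # stand-in response covariance
A = np.linalg.inv(C)                        # A = C^{-1}

w_lin  = A @ u                                   # Eq. S1a / S5a
w_sq   = -np.diag(A) + 0.5 * (A @ np.ones(n))    # Eq. S1b / S5c
w_pair = -0.5 * A                                # Eq. S1c / S5b (use j > i)
const  = -0.5 * (u @ A @ u)                      # Eq. S1d (dropping const_k)

def loglike_from_cells(R):
    """Eq. S2: weighted sum of linear, squared, and pairwise-sum-squared responses."""
    L = w_lin @ R + w_sq @ (R ** 2) + const
    for i in range(n - 1):
        for j in range(i + 1, n):
            L += w_pair[i, j] * (R[i] + R[j]) ** 2
    return L

def loglike_gaussian(R):
    """Eq. S3: Gaussian quadratic form (dropping const_k)."""
    d = R - u
    return -0.5 * (d @ A @ d)

R = rng.multivariate_normal(u, C)            # one filter-response vector
assert np.isclose(loglike_from_cells(R), loglike_gaussian(R))
```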

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c) and the LL neurons that are constructed from them (Fig. S1d, e) are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio $\sigma_{planar}/\sigma_{non\text{-}planar}$ will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/sec and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
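The bookkeeping behind this comparison (rate scaling, integration time, Fano-factor noise) can be sketched as follows (Python). The LL-neuron responses here are synthetic stand-ins; the ~3x and ~9x figures quoted above come from the actual model responses, not from this snippet.

```python
import numpy as np

rng = np.random.default_rng(0)
ll_responses = rng.gamma(shape=4.0, scale=1.0, size=5000)  # stand-in responses
                                                           # at preferred disparity
r_max = 30.0    # assumed peak mean response (spikes/s)
fano  = 1.5     # assumed Fano factor (noise variance / mean spike count)
T     = 0.3     # integration time (s); roughly one fixation

# Scale so the mean response at the preferred disparity equals r_max,
# then convert rates to spike counts over the integration window.
rates  = ll_responses * (r_max / ll_responses.mean())
counts = rates * T

external_var = counts.var()            # stimulus-induced (external) variance
internal_var = fano * counts.mean()    # intrinsic noise variance (Fano model)
print(f"external / internal variance: {external_var / internal_var:.1f}")
```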


Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)}$   (S6)

where $f$ is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B(f,\delta_k)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.
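A minimal sketch of Eq. S6 (Python), assuming identical optics in the two eyes and a simple stand-in MTF, reproduces the qualitative point that the binocular difference signal carries little disparity-specific structure at high spatial frequencies.

```python
import numpy as np

f = np.linspace(0.25, 16.0, 128)                 # spatial frequency (cpd)

def difference_amplitude(f_cpd, disparity_arcmin,
                         mtf=lambda f: np.exp(-f / 10.0)):   # stand-in MTF
    A = mtf(f_cpd) / f_cpd                       # A_L(f) = A_R(f) ∝ MTF(f)/f
    delta_deg = disparity_arcmin / 60.0          # disparity in deg (f is in cpd)
    return np.sqrt(2.0 * A**2 * (1.0 - np.cos(2.0 * np.pi * f_cpd * delta_deg)))

for d in (1.25, 2.5, 5.0, 10.0):
    A_B = difference_amplitude(f, d)
    print(f"{d:5.2f} arcmin: difference signal peaks near "
          f"{f[np.argmax(A_B)]:.2f} cpd")
```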

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$\Delta D \cong \delta\,/\,\mathrm{IPD}$   (S7)

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$\mathrm{psf}_{photopic}\!\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda}\mathrm{psf}\!\left(\mathbf{x};\lambda,\Delta D\right)\,s_c\!\left(\lambda\right)\,D65\!\left(\lambda\right)$   (S8)

where $s_c(\lambda)$ is the human photopic sensitivity function and $K$ is a normalizing constant that sets the volume of $\mathrm{psf}_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}\!\left(\mathbf{x}\right) = \left(\mathbf{r}\!\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast:

$\mathbf{c}_{norm}\!\left(\mathbf{x}\right) = \mathbf{c}\!\left(\mathbf{x}\right)\Big/\sqrt{\left\|\mathbf{c}\!\left(\mathbf{x}\right)\right\|^2 + n\,c_{50}^2}$

where $n$ is the dimensionality of the vector, $\left\|\mathbf{c}(\mathbf{x})\right\|^2$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$R_i = R_{\max}\left\lfloor \mathbf{c}_{norm}\!\left(\mathbf{x}\right)\cdot\mathbf{f}_i\!\left(\mathbf{x}\right)\right\rfloor$
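A minimal sketch of this front end (Python), assuming a concatenated left/right luminance vector r and a binocular receptive field f of the same length; c50 = 0 matches the value assumed in the main analysis, and r_max is a stand-in response scale.

```python
import numpy as np

def contrast_signal(r):
    """c(x) = (r(x) - rbar) / rbar, Weber contrast about the local mean."""
    return (r - r.mean()) / r.mean()

def contrast_normalize(c, c50=0.0):
    """c_norm(x) = c(x) / sqrt(||c||^2 + n * c50^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def binocular_simple_cell(f, r, r_max=30.0, c50=0.0):
    """Half-wave rectified dot product of the receptive field with the
    contrast-normalized signal, scaled to a maximum response r_max."""
    c_norm = contrast_normalize(contrast_signal(r), c50)
    return r_max * max(np.dot(f, c_norm), 0.0)
```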



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan\!\left(\theta\right)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement, Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $\left(R_{\max}\,\mathbf{f}_i^{\mathsf{T}}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_\delta(f)/f$, where $\mathrm{MTF}_\delta$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes; such a difference drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.



Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n\,c_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper, we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P\!\left(\mathbf{R}\,|\,\delta_k\right)$ via maximum likelihood estimation. (a) When $c_{50}$ = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When $c_{50}$ = 0.1, the conditional response distributions are most Gaussian on average. For $c_{50}$ > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
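For a single filter, the shape of a response distribution can be checked with an off-the-shelf generalized Gaussian fit; the sketch below (Python, SciPy) uses synthetic stand-in responses. The paper fits multi-dimensional generalized Gaussians, so this one-dimensional version only illustrates how the 'power' in Figure S9 is read off (a power of 2 is Gaussian).

```python
import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(2)
responses = rng.normal(0.0, 0.1, 10_000)     # stand-in filter responses

power, loc, scale = gennorm.fit(responses)   # shape parameter 'power' ~ 2 here
print(f"best-fitting power: {power:.2f}")    # <2 heavier-, >2 lighter-tailed
```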

Page 7: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

ln gaussethRjdkTHORNfrac12 frac14Xnifrac141

wikRi thornXnifrac141

wiikR2i

thornXn1

ifrac141

Xnjfrac14ithorn1

wijkethRi thorn RjTHORN2

thorn const0 eth5THORNwhere Ri is the response of the ith AMA filter and wikwiik and wijk are the weights for disparity dk Theweights are simple functions of the mean and covari-ance matrices from Equation 4 (see Supplement) Thusa neuron which responds according to the loglikelihood of a given disparity (an LL neuron)mdashthat isRLL

k frac14 ln[gauss(Rjdk)]mdashcan be obtained by weightedsummation of the linear (first term) squared (secondterm) and pairwise-sum-squared (third term) AMAfilter responses (Equation 5) The implication is that alarge collection of LL neurons each with a differentpreferred disparity can be constructed from a smallfixed set of linear filters simply by changing the weightson the linear filter responses squared-filter responsesand pairwise-sum-squared filter responses

A potential concern is that these computations couldnot be implemented in cortex AMA filters are strictlylinear and produce positive and negative responses(Figure 6a) whereas real cortical neurons produce onlypositive responses However the response of eachAMA filter could be obtained by subtracting theoutputs of two half-wave rectified simple cells that arelsquolsquoonrsquorsquo and lsquolsquooffrsquorsquo versions of each AMA filter (seeSupplement Figure S3a) The squared and pairwise-

sum-squared responses are modeled as resulting from alinear filter followed by a static squaring nonlinearity(Figure 6b) these responses could be obtained fromcortical binocular complex cells (see Supplement)(Note that our lsquolsquocomplex cellrsquorsquo differs from thedefinition given by some prominent expositors of thedisparity energy model [Figure 6d] Cumming ampDeAngelis 2001 Ohzawa 1998 Qian 1997) Thesquared responses could be obtained by summing andsquaring the responses of on and off simple cells (seeSupplement Figure S3a b) Finally the LL neuronresponse could be obtained via a weighted sum of theAMA filter and model complex cell responses (Figure6c) Thus all the optimal computations are biologicallyplausible

Figure 6 shows processing schematics disparitytuning curves and response variability for the threefilter types implied by our analysis an AMA filter amodel complex cell a model LL neuron and forcomparison a standard disparity energy neuron(Cumming amp DeAngelis 2001) (The model complexcells are labeled lsquolsquocomplexrsquorsquo because in general theyexhibit temporal frequency doubling and a fundamen-tal to mean response ratio that is less than 10 Skottunet al 1991) Response variability due to retinal imagefeatures that are irrelevant for estimating disparity isindicated by the gray area the smaller the gray areathe more invariant the neural response to irrelevantimage features The AMA filter is poorly tuned todisparity and gives highly variable responses todifferent stereo-image patches with the same disparity

Figure 4 Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a) (a) Joint filter responses to

each of the 7600 image patches in the training set Different colors and symbols denote different disparity levels Contours show

Gaussian fits to the conditional filter response distributions The black curves on the x and y axes represent the marginal response

distributions p(R1) and p(R2) (b) Posterior probability distributions averaged across all stimuli at each disparity level if only filters F1

and F2 are used (dotted curves) and if all eight filter responses are used (solid curves) Using eight AMA filters instead of two

increases disparity selectivity Shaded areas represent 68 confidence intervals on the posterior probabilities this variation is due to

natural stimulus variation that is irrelevant for estimating disparity Natural stimulus variation thus creates response variability even in

hypothetical populations of noiseless neurons

Journal of Vision (2014) 14(2)1 1ndash18 Burge amp Geisler 7

(Figure 6a) The model complex cell is better tuned butit is not unimodal and its responses also vary severelywith irrelevant image information (Figure 6b) (Thedisparity tuning curves for all model complex cells areshown in Figure S5) In contrast the LL neuron issharply tuned is effectively unimodal and is stronglyresponse invariant (Figure 6c) That is it respondssimilarly to all natural image patches of a givendisparity The disparity tuning curves for a range of LLneurons are shown in Figure 7a When slant variessimilar but broader LL neuron tuning curves result(Figure S1d e)

These results show that it is potentially misleading torefer to canonical V1 binocular simple and complexcells as disparity tuned because their responses aretypically as strongly modulated by variations incontrast pattern as they are by variations in disparity(gray area Figure 6a b) The LL neurons on the otherhand are tuned to a narrow range of disparities andrespond largely independent of the spatial frequencycontent and contrast

The LL neurons have several interesting propertiesFirst their responses are determined almost exclusivelyby the model complex cell inputs because the weightson the linear responses (Equation 5 Figure 6a c) aregenerally near zero (see Supplement) In this regard theLL neurons are consistent with the predictions of thestandard disparity energy model (Cumming amp DeAn-gelis 2001 Ohzawa 1998) However standard dis-

parity energy neurons are not as narrowly tuned or asinvariant (Figure 6d)

Second each LL neuron receives strong inputs frommultiple complex cells (Figure 7b) In this regard theLL neurons are inconsistent with the disparity energymodel which proposes that disparity-tuned cells areconstructed from two binocular subunits The potentialvalue of more than two subunits has been previouslydemonstrated (Qian amp Zhu 1997) Recently it hasbeen shown that some disparity-tuned V1 cells areoften modulated by a greater number of binocularsubunits than two Indeed as many as 14 subunits candrive the activity of a single disparity selective cell(Tanabe Haefner amp Cumming 2011)

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of binocular subunits driving their response should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that data are for surfaces with a cosine distribution of slants.



Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted by the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is, they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields.
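To make the bandwidth relationship explicit, a log-Gaussian tuning curve can be written as below; the parameterization (the response scale and the log-domain standard deviation σ) is our illustration rather than notation taken from the paper.

$$r_k(\delta) \propto \exp\!\left[-\frac{\left(\ln\delta - \ln\delta_k\right)^{2}}{2\sigma^{2}}\right],\qquad \mathrm{FWHH}(\delta_k) = \delta_k\!\left(e^{\sigma\sqrt{2\ln 2}} - e^{-\sigma\sqrt{2\ln 2}}\right)$$

If σ is the same for all preferred disparities, the full-width at half-height grows in proportion to the preferred disparity δ_k, which is the linear bandwidth scaling plotted in Figure 7c.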

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity δk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–S4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


Understanding the basis of this center-surround organization is an important direction for future work.

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector RLL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
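For concreteness, this read-out rule can be written in a few lines. The sketch below is ours (Python, with assumed variable names), not the authors' implementation: it returns the preferred disparity of the maximally responding LL neuron, after optionally adding a disparity-dependent constant derived from a nonuniform prior.

```python
import numpy as np

def map_readout(ll_responses, preferred_disparities, log_prior=None):
    """MAP read-out of an LL neuron population.

    ll_responses          : response of each LL neuron (log likelihood of its preferred disparity)
    preferred_disparities : preferred disparity of each LL neuron (arcmin)
    log_prior             : optional disparity-dependent constant added before peak-finding
    """
    scores = np.asarray(ll_responses, dtype=float)
    if log_prior is not None:
        scores = scores + np.asarray(log_prior, dtype=float)   # nonuniform prior
    return preferred_disparities[np.argmax(scores)]            # peak of the population response

# Toy example: 577 LL neurons tiling Panum's fusional range
preferred = np.linspace(-16.875, 16.875, 577)
responses = -0.5 * ((preferred - 3.2) / 2.0) ** 2              # fake population response
print(map_readout(responses, preferred))                        # prints the grid value nearest 3.2
```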

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s).

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red); negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with preferred disparity.


Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall, performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major or the dominant source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes.


Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
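For intuition, a minimal sketch of the basic local cross-correlation rule is given below (ours, in Python); it is not the modified algorithm used for the comparison in the Supplement, and it assumes 1-D signals and integer-pixel candidate shifts that stay within the image bounds.

```python
import numpy as np

def local_xcorr_disparity(left, right, window, candidate_shifts):
    """Estimate disparity by maximizing normalized cross-correlation between a
    left-eye window and equally sized, shifted right-eye windows."""
    l = left[window] - left[window].mean()
    l = l / (np.linalg.norm(l) + 1e-12)
    best_shift, best_rho = None, -np.inf
    for d in candidate_shifts:
        r = right[window.start + d : window.stop + d]
        r = r - r.mean()
        r = r / (np.linalg.norm(r) + 1e-12)
        rho = float(np.dot(l, r))                 # normalized cross-correlation
        if rho > best_rho:
            best_shift, best_rho = d, rho
    return best_shift

# Usage: shift a random texture by 3 samples and recover the shift
rng = np.random.default_rng(0)
left = rng.standard_normal(256)
right = np.roll(left, 3)
print(local_xcorr_disparity(left, right, slice(100, 140), range(-8, 9)))  # -> 3
```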

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/°) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/°, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (4 c/°) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/°) have widths of 8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than 4 c/° is nicely predicted by the 8 arcmin width suggested by the present analysis.

This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here, we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients that is comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the nonplanar depth structure present in natural scenes, namely within-patch depth variation and depth discontinuities (occlusions) (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement).


We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$c_{\mathrm{norm}}(\mathbf{x}) = \frac{c(\mathbf{x})}{\sqrt{\|c(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50; the conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it did throughout the paper), the distributions are well approximated by Gaussians, but are somewhat lighter-tailed. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.
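A minimal sketch of this normalization (ours; the function name and the Weber-contrast step are assumptions, not the authors' code) shows how c50 enters:

```python
import numpy as np

def contrast_normalize(intensity, c50=0.0):
    """Mean-subtract, convert to a contrast signal, and normalize as
    c_norm(x) = c(x) / sqrt(||c(x)||^2 + n * c50^2)."""
    I = np.asarray(intensity, dtype=float).ravel()
    c = (I - I.mean()) / I.mean()                    # contrast signal (zero mean)
    n = c.size                                       # dimensionality of the vector
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)
```

With c50 = 0.0 this reduces to the simplified normalization used in the paper (zero mean, vector magnitude 1.0); larger values of c50 damp the responses to low-contrast patches.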

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes.


However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity, and all imaged objects were at least 16 m from the camera.

RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}} + \Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{\mathrm{fixation}}}\right)\right] \qquad (6)$$

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
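As a worked example of Equation 6 (our sketch; the interpupillary distance of 6.5 cm is an assumed value, not one quoted from the paper):

```python
import numpy as np

def vergence_angle(ipd, distance):
    """Binocular subtense (radians) of a point straight ahead at the given distance."""
    return 2.0 * np.arctan((ipd / 2.0) / distance)

def disparity_arcmin(d_fixation, delta_depth, ipd=0.065):
    """Equation 6: disparity of a surface point at distance d_fixation + delta_depth,
    relative to the fixated point at d_fixation (all distances in meters)."""
    d = vergence_angle(ipd, d_fixation + delta_depth) - vergence_angle(ipd, d_fixation)
    return np.degrees(d) * 60.0

# Fixation at 40 cm, surface 1 cm behind fixation: about -13 arcmin under this sign convention
print(disparity_arcmin(0.40, 0.01))
```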

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2-mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that, at or near the focal plane, the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See Supplement for further details of how the optics were simulated.



Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses:

$$r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * \mathrm{psf}_e(\mathbf{x})\right]\cdot \mathrm{samp}(\mathbf{x}) + \gamma \qquad (7)$$

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
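A schematic of Equation 7 (our sketch, with assumed 1-D signals, sample indices, and noise level; the authors' simulation details are in the Supplement):

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(ideal_image, psf, sample_idx, noise_sd, rng=None):
    """Blur the ideal luminance image with the eye's point-spread function,
    sample it at the sensor locations, and add noise (Equation 7 sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    blurred = fftconvolve(ideal_image, psf, mode="same")   # I_e(x) * psf_e(x)
    sampled = blurred[sample_idx]                           # samp(x), e.g. 128 samples/deg
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)
```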

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals.

There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
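A sketch of the filter-response model that AMA assumes (ours; the names and noise level are placeholders, and the authors' actual code is the Matlab implementation linked above): the mean response of each filter to each contrast-normalized training stimulus is a scaled dot product, and Gaussian response noise is added on top.

```python
import numpy as np

def ama_filter_responses(stimuli, filters, r_max=1.0, noise_sd=0.01, rng=None):
    """stimuli: (num_stimuli, dims) contrast-normalized signals; filters: (num_filters, dims).
    Returns noisy responses with shape (num_stimuli, num_filters)."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean_responses = r_max * stimuli @ filters.T            # scaled dot products
    return mean_responses + rng.normal(0.0, noise_sd, mean_responses.shape)
```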

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec).


Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
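A sketch of this interpolation step (ours, with assumed inputs): cubic splines are fit through the per-disparity mean vectors and covariance matrices and evaluated on a finer disparity grid. Interpolated covariance matrices are not guaranteed to remain positive definite, so this is only an illustration of the idea.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_response_distributions(train_disparities, means, covs, n_between=31):
    """means: (num_levels, num_filters); covs: (num_levels, num_filters, num_filters).
    Returns the fine disparity grid and the interpolated means and covariances."""
    fine = np.linspace(train_disparities[0], train_disparities[-1],
                       (len(train_disparities) - 1) * (n_between + 1) + 1)
    mean_spline = CubicSpline(train_disparities, np.asarray(means), axis=0)
    cov_spline = CubicSpline(train_disparities, np.asarray(covs), axis=0)
    return fine, mean_spline(fine), cov_spline(fine)
```

With 1.875 arcmin training steps spanning Panum's fusional range and 31 interpolated distributions per step, the fine grid contains 577 levels, one per LL neuron.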

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring nonlinearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \qquad \text{(S1a)}$$

$$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}^{-1}\right) + 0.5\,\mathbf{C}^{-1}\mathbf{1} \qquad \text{(S1b)}$$

$$w_{ij,k} = -0.5\,C^{-1}_{ij} \quad \forall\, i,j,\ j>i \qquad \text{(S1c)}$$

$$\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{T}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k \qquad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) maps a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\!\left(R_i^{2} + 2R_iR_j + R_j^{2}\right) + \mathrm{const}' \qquad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{T}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + \mathrm{const} \qquad \text{(S3)}$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^{2} = -0.5\left(a_{ii}R_i^{2} - 2a_{ii}R_iu_i + a_{ii}u_i^{2}\right) \qquad \text{(S4a)}$$

$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad \text{(S4b)}$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad \text{(S5a)}$$

Note that the weights on the linear filter responses w_i are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_iR_j: 2w_ij R_iR_j appears in Eq. S2, and −a_ij R_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad \text{(S5b)}$$


Next, consider w_ii. In this case, the total weight on R_i² in Eq. S2 must be −0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad \text{(S5c)}$$

Last, by grouping terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$\mathrm{const}' = -0.5\sum_{i}\left(a_{ii}u_i^{2} + \sum_{j\neq i} a_{ij}u_iu_j\right) + \mathrm{const} \qquad \text{(S5d)}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates, on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R^{2}_{\max}$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R^{2}_{ij\max} = R^{2}_{\max}\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex, that is, if the maximum response of the complex cells equaled the maximum response of the simple cells $R_{\max}$ rather than $R^{2}_{\max}$ and $R^{2}_{ij\max}$, the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by $w'_i = w_i$, $w'_{ii} = R_{\max}\,w_{ii}$, and $w'_{ij} = \left(R^{2}_{ij\max}/R_{\max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w'_{ii}$ and $w'_{ij}$ are the normalized weights plotted in Fig. 7b.
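For concreteness, a small sketch (ours; variable names assumed) of Eqs. S1a–d, computing the unscaled weights for one LL neuron from the class-conditional response mean u_k and covariance C_k:

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Returns the weights on the linear responses, the squared responses,
    the pairwise sum-squared responses, and the additive constant (Eqs. S1a-d)."""
    A = np.linalg.inv(C_k)                                 # A_k = C_k^{-1}
    w_lin = A @ u_k                                        # S1a (near zero when means are near zero)
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u_k)))     # S1b
    w_pair = -0.5 * A                                      # S1c (use the entries with j > i)
    const = -0.5 * u_k @ A @ u_k                           # S1d (up to an additive constant)
    return w_lin, w_sq, w_pair, const
```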

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c), and the LL neurons that are constructed from them (Fig. S1d, e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio $\sigma_{planar}/\sigma_{non\text{-}planar}$ will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0.
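As a concrete illustration of this measure, the following sketch (Python with NumPy; the function name and array conventions are ours, not the paper's) fits a slanted plane to a range patch by least squares and computes the ratio just described:

    import numpy as np

    def slant_ratio(range_patch):
        """Ratio of depth variation captured by the best-fitting slanted plane to
        total depth variation about the best-fitting fronto-parallel plane.

        range_patch : 2-D array of range (distance) values for one patch.
        Values near 1.0 mean slant explains the variation; near 0.0 it does not.
        """
        h, w = range_patch.shape
        y, x = np.mgrid[0:h, 0:w]
        A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])
        z = range_patch.ravel().astype(float)

        # Best-fitting slanted plane (least squares).
        coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
        planar = A @ coeffs

        # Best-fitting fronto-parallel plane is just the mean distance.
        fronto = np.full_like(z, z.mean())

        sigma_planar = np.std(planar - fronto)      # variation explained by slant
        sigma_nonplanar = np.std(z - fronto)        # total variation about fronto-parallel
        return sigma_planar / sigma_nonplanar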

Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
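A minimal sketch of this comparison (Python with NumPy), under stated assumptions: the response array is a placeholder for one LL neuron's responses to many natural stimuli at its preferred disparity, spike rates are obtained by the scaling described above, and intrinsic noise variance is taken to be the Fano factor times the mean spike count.

    import numpy as np

    def variance_ratio(ll_responses, r_max=30.0, fano=1.5, integration_time=0.3):
        """Compare stimulus-induced LL-neuron response variance with
        intrinsic noise variance (variance = Fano factor * mean spike count)."""
        ll_responses = np.asarray(ll_responses, dtype=float)
        # Scale responses so the mean at the preferred disparity equals r_max (spk/s).
        rates = ll_responses * (r_max / ll_responses.mean())
        counts = rates * integration_time              # expected spike counts
        external_var = np.var(counts)                  # stimulus-induced variance
        internal_var = fano * counts.mean()            # intrinsic (neural) noise variance
        return external_var / internal_var

Because the stimulus-induced variance grows with the square of the integration time while the intrinsic variance grows linearly, this ratio increases with integration time, consistent with the ~3x and ~9x values quoted above.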


Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B(f,\delta_k) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2\,A_L(f)\,A_R(f)\cos\!\big(2\pi f \delta_k\big)} \qquad \text{(S6)}$$

where $f$ is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B(f,\delta_k)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity.
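The following sketch evaluates Eq. S6 numerically (Python with NumPy; the function name is ours). It assumes identical optics in the two eyes and a pure 1/f amplitude spectrum, omitting the optical MTF for simplicity.

    import numpy as np

    def binocular_difference_amplitude(f_cpd, disparity_arcmin):
        """Contrast amplitude of the binocular difference signal (Eq. S6),
        assuming identical optics and a 1/f amplitude falloff in both eyes.
        f_cpd : spatial frequency in cycles/deg (scalar or array)
        disparity_arcmin : disparity in arcmin
        """
        delta_deg = disparity_arcmin / 60.0       # disparity in deg
        A_L = 1.0 / np.asarray(f_cpd, float)      # 1/f falloff (same in both eyes)
        A_R = A_L
        return np.sqrt(A_L**2 + A_R**2
                       - 2.0 * A_L * A_R * np.cos(2.0 * np.pi * f_cpd * delta_deg))

    # Example usage with a few frequencies and a 2 arcmin disparity.
    freqs = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    print(binocular_difference_amplitude(freqs, disparity_arcmin=2.0))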

Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities. V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} \qquad \text{(S7)}$$

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$\mathrm{psf}_{photopic}\big(x;\Delta D\big) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\big(x;\lambda,\Delta D\big)\, sc(\lambda)\, \mathrm{D65}(\lambda) \qquad \text{(S8)}$$

where $sc(\lambda)$ is the human photopic sensitivity function and $K$ is a normalizing constant that sets the volume of $\mathrm{psf}_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
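A compact sketch of this weighting (Python with NumPy; psf_single is a placeholder for whatever single-wavelength PSF model is used, and the sensitivity and illuminant arrays are assumed to be sampled at the same wavelengths):

    import numpy as np

    def polychromatic_psf(psf_single, disparity_rad, ipd_m, wavelengths_nm, sc, d65):
        """Weight single-wavelength PSFs by photopic sensitivity and the D65
        spectrum, then sum and normalize to unit volume (Eq. S8).

        psf_single : function (wavelength_nm, defocus_diopters) -> 2-D PSF array
        disparity_rad, ipd_m : set the defocus via Eq. S7 (delta_D ~= disparity / IPD)
        sc, d65 : photopic sensitivity and D65 power at each wavelength
        """
        delta_D = disparity_rad / ipd_m                       # Eq. S7
        psf = sum(w_sc * w_d65 * psf_single(lam, delta_D)
                  for lam, w_sc, w_d65 in zip(wavelengths_nm, sc, d65))
        return psf / psf.sum()                                # K sets the volume to 1.0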

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}(\mathbf{x}) = \big(\mathbf{r}(\mathbf{x}) - \bar{r}\big)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast, $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\,/\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n\,c_{50}^2}$, where $n$ is the dimensionality of the vector, $\|\mathbf{c}(\mathbf{x})\|^2$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{\max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor$$
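A minimal sketch of this front end (Python with NumPy; the function names are ours, the left- and right-eye signals are normalized jointly as a single binocular vector, and c50 = 0 reproduces the normalization used in the main analysis):

    import numpy as np

    def contrast_normalize(r, c50=0.0):
        """Convert sensor responses to a contrast-normalized signal."""
        r = np.asarray(r, dtype=float)
        c = (r - r.mean()) / r.mean()                   # Weber contrast signal
        n = c.size
        return c / np.sqrt(np.sum(c**2) + n * c50**2)   # normalize by contrast energy

    def binocular_simple_cell(r_left, r_right, f, r_max=1.0, c50=0.0):
        """Half-wave rectified response of a model binocular simple cell.
        f : binocular receptive field (left- and right-eye weights concatenated)."""
        c_norm = contrast_normalize(np.concatenate([r_left, r_right]), c50)
        return r_max * max(0.0, float(np.dot(c_norm, f)))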


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531-546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217-237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077-2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379-2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488-508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897-919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673-1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181-197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427-1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711-1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594-3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775-785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814-9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as the one-dimensional filters do to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan(\theta)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities, so the disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement, Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R_{\max}\big(\mathbf{f}_i^{T}\mathbf{c}_{norm}\big)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray areas show response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for fronto-parallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with fronto-parallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: the inter-ocular contrast signal is drastically reduced. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\,/\sqrt{\|\mathbf{c}(\mathbf{x})\|^2 + n\,c_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then, we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50}$ = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50}$ = 0.1, the conditional response distributions are most Gaussian on average. For $c_{50}$ > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
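A sketch of this fitting procedure for a single filter's responses (Python with SciPy; scipy.stats.gennorm parameterizes the generalized Gaussian by a shape parameter beta, with beta = 2 corresponding to a Gaussian; the response array below is synthetic, a placeholder for real filter responses):

    import numpy as np
    from scipy import stats

    def fitted_power(filter_responses):
        """Maximum-likelihood fit of a generalized Gaussian to one filter's
        responses at one disparity level; returns the power (beta).
        beta == 2 -> Gaussian; beta < 2 -> heavier tails; beta > 2 -> lighter tails."""
        beta, loc, scale = stats.gennorm.fit(filter_responses)
        return beta

    rng = np.random.default_rng(0)
    print(fitted_power(rng.standard_normal(5000)))   # should be near 2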


(Figure 6a). The model complex cell is better tuned, but it is not unimodal, and its responses also vary severely with irrelevant image information (Figure 6b). (The disparity tuning curves for all model complex cells are shown in Figure S5.) In contrast, the LL neuron is sharply tuned, is effectively unimodal, and is strongly response invariant (Figure 6c). That is, it responds similarly to all natural image patches of a given disparity. The disparity tuning curves for a range of LL neurons are shown in Figure 7a. When slant varies, similar but broader LL neuron tuning curves result (Figure S1d, e).

These results show that it is potentially misleading to refer to canonical V1 binocular simple and complex cells as disparity tuned, because their responses are typically as strongly modulated by variations in contrast pattern as they are by variations in disparity (gray area, Figure 6a, b). The LL neurons, on the other hand, are tuned to a narrow range of disparities and respond largely independent of the spatial frequency content and contrast.

The LL neurons have several interesting properties. First, their responses are determined almost exclusively by the model complex cell inputs, because the weights on the linear responses (Equation 5, Figure 6a, c) are generally near zero (see Supplement). In this regard, the LL neurons are consistent with the predictions of the standard disparity energy model (Cumming & DeAngelis, 2001; Ohzawa, 1998). However, standard disparity energy neurons are not as narrowly tuned or as invariant (Figure 6d).

Second, each LL neuron receives strong inputs from multiple complex cells (Figure 7b). In this regard, the LL neurons are inconsistent with the disparity energy model, which proposes that disparity-tuned cells are constructed from two binocular subunits. The potential value of more than two subunits has been previously demonstrated (Qian & Zhu, 1997). Recently, it has been shown that some disparity-tuned V1 cells are often modulated by more than two binocular subunits; indeed, as many as 14 subunits can drive the activity of a single disparity-selective cell (Tanabe, Haefner, & Cumming, 2011).

Third, the weights on the model complex cells (Figure 7b), which are determined by the conditional response distributions (Figure 4a), specify how information at different spatial frequencies should be combined.

Fourth, as preferred disparity increases, the number of strong weights on the complex cell inputs decreases (Figure 7b). This occurs because high spatial frequencies are less useful for encoding large disparities (see below). Thus, if cells exist in cortex that behave similarly to LL neurons, the number and spatial frequency tuning of the binocular subunits driving their response should decrease as a function of their preferred disparity.

Fifth, for all preferred disparities, the excitatory (positive) and inhibitory (negative) weights are in a

Figure 5. Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation, using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that the data are for surfaces with a cosine distribution of slants.


classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b), consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect, many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is,

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity dk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1-4). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90 deg out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90 deg out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).


they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.

V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log-likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random-dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector RLL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
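A minimal sketch of this read-out rule (Python with NumPy; the response and log-prior arrays are placeholders for an actual LL neuron population):

    import numpy as np

    def map_disparity(ll_responses, preferred_disparities, log_prior=None):
        """MAP read-out of an LL neuron population.
        ll_responses : population response, one value per LL neuron
        preferred_disparities : preferred disparity of each LL neuron (arcmin)
        log_prior : optional disparity-dependent constant added to each response
                    before finding the peak (uniform prior if None)."""
        ll_responses = np.asarray(ll_responses, dtype=float)
        if log_prior is not None:
            ll_responses = ll_responses + np.asarray(log_prior, dtype=float)
        return preferred_disparities[np.argmax(ll_responses)]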

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement; Equation 5; Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red); negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3-S5). High spatial frequencies are not useful for estimating large disparities (Figure S7); thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3; Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major or the dominant source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between


the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10-12 gives essentially equivalent performance to cross-correlation for smaller disparities.
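For reference, a bare-bones sketch of cross-correlation disparity estimation for one-dimensional signals (Python with NumPy). This is a generic illustration of the idea, not the modified algorithm used in the Supplement; the signal arrays, parameter values, and circular-shift simplification are our own.

    import numpy as np

    def cross_correlation_disparity(left, right, max_shift_px):
        """Estimate disparity (in pixels) as the shift of the right-eye signal
        that maximizes the normalized correlation with the left-eye signal."""
        left = np.asarray(left, dtype=float)
        left = (left - left.mean()) / np.linalg.norm(left - left.mean())
        best_shift, best_corr = 0, -np.inf
        for shift in range(-max_shift_px, max_shift_px + 1):
            shifted = np.roll(np.asarray(right, dtype=float), shift)  # circular shift, for simplicity
            shifted = (shifted - shifted.mean()) / np.linalg.norm(shifted - shifted.mean())
            corr = float(np.dot(left, shifted))
            if corr > best_corr:
                best_shift, best_corr = shift, corr
        return best_shift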

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2-4.7 c/deg) and not for higher frequencies (see Figure 3; Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (~6 c/deg) have widths of ~8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/deg is nicely predicted by the ~8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could 'smear out' the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the Supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here, we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e, and Supplement). This distribution of slants produced a distribution of disparity gradients that is comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a-c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with


a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that the locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes tothe Gaussian form of the filter response distributionsOne potential advantage of Gaussian response distri-butions is that pair-wise (and lower order) statisticsfully characterize the joint responses from an arbitrarynumber of filters making possible the decoding of largefilter population responses In the analysis presentedhere all binocular signals were normalized to a mean ofzero and a vector magnitude of 10 before beingprojected onto the binocular filters This is a simplifiedform of the contrast normalization that is a ubiquitousfeature of retinal and early cortical processing (Car-andini amp Heeger 2012) The standard model ofcontrast normalization in cortical neurons is given by

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)
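A minimal numerical sketch of this normalization, assuming a generic NumPy vector as the contrast signal (the c50 value below is illustrative):

```python
import numpy as np

def contrast_normalize(c, c50=0.1):
    """Normalize a contrast signal by its contrast energy plus the
    half-saturation constant (sketch of the standard model above)."""
    n = c.size                                    # dimensionality of the vector
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

# With c50 = 0 (the value used throughout the paper), the normalized signal
# has a vector magnitude of 1.0, matching the simplified normalization above.
rng = np.random.default_rng(0)
c = rng.standard_normal(512)
c -= c.mean()                                     # zero-mean, as in the text
print(np.linalg.norm(contrast_normalize(c, c50=0.0)))   # ~1.0
```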

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50; the conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it was throughout the paper), the distributions are well approximated by Gaussians, but are somewhat lighter-tailed. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.

The effect of different optics in the two eyes

We modeled the optics of the two eyes as identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4 diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors of any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011).

To ensure sharp photographs, the camera lens was focused at optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point,

$$\delta = 2\left[\tan^{-1}\!\left(\frac{IPD/2}{d_{fixation} + \Delta}\right) - \tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}}\right)\right] \qquad (6)$$

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
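A short sketch of Equation 6 (the 65 mm IPD is an assumed value for illustration; the paper specifies only the 40 cm fixation distance):

```python
import numpy as np

def disparity_arcmin(depth_m, d_fixation_m=0.40, ipd_m=0.065):
    """Disparity (Eq. 6) of a surface point at distance d_fixation + depth,
    relative to fixation at d_fixation. Returns arcmin."""
    half_ipd = ipd_m / 2.0
    delta_rad = 2.0 * (np.arctan(half_ipd / (d_fixation_m + depth_m))
                       - np.arctan(half_ipd / d_fixation_m))
    return np.degrees(delta_rad) * 60.0

print(disparity_arcmin(-0.01))   # surface 1 cm nearer than fixation
print(disparity_arcmin(+0.01))   # surface 1 cm beyond fixation
```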

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths, corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, the left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).
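For illustration, the sketch below generates the 37 disparity levels and the corresponding surface distances by inverting Equation 6 (again assuming a 65 mm IPD, which is not specified above):

```python
import numpy as np

def surface_distance_m(disparity_arcmin, d_fixation_m=0.40, ipd_m=0.065):
    """Distance of a surface producing the given disparity (inverse of Eq. 6)."""
    delta_rad = np.radians(disparity_arcmin / 60.0)
    half_ipd = ipd_m / 2.0
    return half_ipd / np.tan(np.arctan(half_ipd / d_fixation_m) + delta_rad / 2.0)

disparities = np.linspace(-16.875, 16.875, 37)   # 37 levels, equally spaced
distances = surface_distance_m(disparities)      # surface distances in meters
```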

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$r_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * psf_e(\mathbf{x})\right]\, samp(\mathbf{x}) + \gamma \qquad (7)$$

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
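A minimal sketch of Equation 7 for one eye (the decimation factor and noise level are illustrative assumptions, not the values used in the paper):

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(I, psf, decimation=4, noise_sd=0.01, rng=None):
    """Noisy sensor-array responses to one eye's idealized luminance image:
    blur by the polychromatic PSF, sample onto a coarser sensor lattice,
    and add noise (sketch of Eq. 7)."""
    rng = rng or np.random.default_rng()
    blurred = fftconvolve(I, psf, mode='same')        # optical degradation
    sampled = blurred[::decimation, ::decimation]     # sensor sampling
    return sampled + noise_sd * rng.standard_normal(sampled.shape)
```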

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images, yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, whereas AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with the filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
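A sketch of the encoding model described above, computing the class-conditional response statistics that a decoder would use (array shapes, the maximum response, and the noise variance are assumptions):

```python
import numpy as np

def filter_response_stats(filters, stimuli, labels, r_max=1.0, noise_var=0.0025):
    """Per-disparity-level means and covariances of noisy filter responses.

    filters : (n_filters, n_pix) linear filter weights
    stimuli : (n_stim, n_pix) contrast-normalized stimuli
    labels  : (n_stim,) disparity-level index of each stimulus"""
    responses = r_max * stimuli @ filters.T               # noiseless responses
    levels = np.unique(labels)
    means = np.stack([responses[labels == k].mean(axis=0) for k in levels])
    covs = np.stack([np.cov(responses[labels == k], rowvar=False)
                     + noise_var * np.eye(filters.shape[0]) for k in levels])
    return means, covs
```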

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, it also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained by training with 1.875 arcmin steps and then interpolating 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every ~3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which in turn yielded posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
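A sketch of the interpolation and MAP read-out just described (cubic splines through the response means and covariances, then a Gaussian log likelihood evaluated at each interpolated disparity):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_stats(train_disps, means, covs, n_between=31):
    """Spline-interpolate response means and covariances between training steps."""
    fine = np.linspace(train_disps[0], train_disps[-1],
                       (len(train_disps) - 1) * (n_between + 1) + 1)
    mean_fine = CubicSpline(train_disps, means, axis=0)(fine)
    cov_fine = CubicSpline(train_disps, covs, axis=0)(fine)
    return fine, mean_fine, cov_fine

def map_estimate(r, disps, means, covs):
    """MAP disparity estimate under a flat prior (maximum Gaussian log likelihood)."""
    ll = np.array([-0.5 * ((r - means[k]) @ np.linalg.solve(covs[k], r - means[k])
                           + np.linalg.slogdet(covs[k])[1])
                   for k in range(len(disps))])
    return disps[np.argmax(ll)]
```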

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none
Corresponding author: Johannes Burge
Email: jburge@mail.cps.utexas.edu
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17.

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \qquad \text{(S1a)}$$

$$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \qquad \text{(S1b)}$$

$$w_{ij,k} = -0.5\,C_{ij,k}^{-1} \quad \forall\, i,j,\; j > i \qquad \text{(S1c)}$$

$$const'_k = -0.5\,\mathbf{u}_k^{T}\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \qquad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).
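A minimal sketch of Eqs. S1a–d for a single disparity level (the mean vector and covariance matrix are assumed to have been measured from the filter responses to natural stimuli):

```python
import numpy as np

def ll_neuron_weights(u, C):
    """Weights for one LL neuron given the conditional response mean u and
    covariance C (Eqs. S1a-d; the stimulus-independent const_k is omitted)."""
    A = np.linalg.inv(C)                                   # inverse covariance
    w_lin = A @ u                                          # S1a: linear weights
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u)))       # S1b: squared-response weights
    w_pair = np.triu(-0.5 * A, k=1)                        # S1c: sum-squared weights, j > i
    const = -0.5 * u @ A @ u                               # S1d (up to const_k)
    return w_lin, w_sq, w_pair, const
```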


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i \;+\; \sum_{i=1}^{n} w_{ii,k}R_i^{2} \;+\; \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\!\left(R_i^{2} + 2R_iR_j + R_j^{2}\right) \;+\; const' \qquad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\mid\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{T}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + const \qquad \text{(S3)}$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^{2} = -0.5\left(a_{ii}R_i^{2} - 2a_{ii}R_iu_i + a_{ii}u_i^{2}\right) \qquad \text{(S4a)}$$

$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad \text{(S4b)}$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect the terms containing R_i: w_iR_i from Eq. S2, and a_iiR_iu_i and a_ijR_iu_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad \text{(S5a)}$$

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect the terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2 and −a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad \text{(S5b)}$$


Next, consider w_ii. In this case, the total weight on R_i² in Eq. S2 must be −0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad \text{(S5c)}$$

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^{2} + \sum_{j \ne i} a_{ij}u_iu_j\right) + const \qquad \text{(S5d)}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{max}^{2}$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij,max}^{2} = R_{max}^{2}\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$, where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{max}$, rather than $R_{max}^{2}$ and $R_{ij,max}^{2}$), the weights on the complex cell responses must be scaled. The optimal weights on the complex cells are then given by $w'_i = w_i$, $w'_{ii} = R_{max}\,w_{ii}$, and $w'_{ij} = \left(R_{ij,max}^{2}/R_{max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.
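As a check on the derivation, the short sketch below verifies numerically that the weighted sum of linear, squared, and sum-squared responses (Eq. S2, with the weights of Eqs. S5a–d) reproduces the Gaussian quadratic form of Eq. S3 (synthetic mean and covariance; unscaled weights):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
u = 0.1 * rng.standard_normal(n)             # conditional response mean
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)                  # conditional response covariance
A = np.linalg.inv(C)
R = rng.standard_normal(n)                   # an observed response vector

quadratic = -0.5 * (R - u) @ A @ (R - u)     # Eq. S3 (additive constant dropped)

w_lin = A @ u                                # Eq. S5a
w_sq = -np.diag(A) + 0.5 * (A @ np.ones(n))  # Eq. S5c (in vector form)
weighted = w_lin @ R + w_sq @ R**2 - 0.5 * u @ A @ u
for i in range(n - 1):                       # Eq. S5b terms
    for j in range(i + 1, n):
        weighted += -0.5 * A[i, j] * (R[i]**2 + 2 * R[i] * R[j] + R[j]**2)

print(np.isclose(quadratic, weighted))       # True
```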

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. We then determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces.


The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, although precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation, and they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all of the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), the stimulus-induced response variance was ~3× greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), the stimulus-induced response variance was ~9× greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters


To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^{2} + A_R(f)^{2} - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)} \qquad \text{(S6)}$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye-movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.
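A sketch of Eq. S6, assuming identical eyes and a bare 1/f amplitude fall-off (no optics), to show how the binocular difference signal depends on frequency and disparity:

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, amp_l, amp_r):
    """Amplitude of the left-minus-right contrast signal at frequency f (Eq. S6)."""
    phase = 2.0 * np.pi * f_cpd * (disparity_arcmin / 60.0)   # disparity in deg
    return np.sqrt(amp_l**2 + amp_r**2 - 2.0 * amp_l * amp_r * np.cos(phase))

f = np.linspace(0.5, 16.0, 200)        # spatial frequency (cpd)
amp = 1.0 / f                          # assumed 1/f retinal amplitude spectrum
a_small = binocular_difference_amplitude(f, 3.75, amp, amp)
a_large = binocular_difference_amplitude(f, 15.0, amp, amp)
# The difference between the two curves shrinks with frequency as the
# 1/f amplitude falls off (and shrinks further once the eye's MTF is included).
```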

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from the fixation point is defocused by

$$\Delta D \cong \delta / IPD \qquad \text{(S7)}$$

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters).
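A one-line sketch of Eq. S7 (the 65 mm IPD is an assumed value):

```python
import numpy as np

def defocus_diopters(disparity_arcmin, ipd_m=0.065):
    """Defocus associated with a given disparity (Eq. S7)."""
    return np.radians(disparity_arcmin / 60.0) / ipd_m

print(defocus_diopters(16.875))   # largest disparity used in the paper, ~0.08 D
```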


To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

$$psf_{photopic}\!\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} psf\!\left(\mathbf{x};\lambda,\Delta D\right) s_c\!\left(\lambda\right) D65\!\left(\lambda\right) \qquad \text{(S8)}$$

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$$\mathbf{c}(\mathbf{x}) = \left(\mathbf{r}(\mathbf{x}) - \bar{r}\right)/\bar{r}$$

where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector, ||c(x)||² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor$$

where ⌊·⌋ denotes half-wave rectification.
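A minimal sketch of these operations for one filter and one contrast-normalized stimulus, including the 'on'/'off' simple cell and squared (complex-cell-like) responses of Fig. S3 (R_max is a placeholder value):

```python
import numpy as np

def simple_and_complex(f, c_norm, r_max=1.0):
    """'On'/'off' simple-cell responses, the recovered (signed) AMA filter
    response, and the squared (complex-cell-like) response for one filter."""
    lin = r_max * float(np.dot(f, c_norm))   # weighted, contrast-normalized input
    r_on = max(lin, 0.0)                     # 'on' simple cell (half-wave rectified)
    r_off = max(-lin, 0.0)                   # 'off' simple cell
    ama = r_on - r_off                       # signed AMA filter response
    complex_resp = (r_on + r_off) ** 2       # sum then square (Fig. S3b)
    return r_on, r_off, ama, complex_resp
```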



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx (IPD \tan\theta)/d$, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring, $\left(R_{max}\,\mathbf{f}_i^{T}\mathbf{c}_{norm}\right)^{2}$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response-invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_{L}(f) = A_{R}(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e) but with a 4-mm pupil. (h) Same as (f) but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see Supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\lVert\mathbf{c}(\mathbf{x})\rVert^{2} + n c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then, we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $p(\mathbf{R}\,|\,\delta_{k})$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


The classic push-pull relationship (Ferster & Miller, 2000) (Figure 7b) is consistent with the fact that disparity-selective neurons in visual cortex contain both excitatory and suppressive subunits (Tanabe et al., 2011).

Sixth, the LL neuron disparity tuning curves are approximately log-Gaussian in shape (i.e., Gaussian on a log-disparity axis). Because their standard deviations are approximately constant on a log-disparity axis, their disparity bandwidths increase linearly with the preferred disparity (Figure 7c). In this respect many cortical neurons behave like LL neurons: the disparity tuning functions of cortical neurons typically have more low-frequency power than is predicted from the standard energy model (Ohzawa, DeAngelis, & Freeman, 1997; Read & Cumming, 2003). Additionally, psychophysically estimated disparity channels in humans exhibit a similar relationship between bandwidth and preferred disparity (Stevenson et al., 1992). Human disparity channels, however, differ in that they have inhibitory side-lobes (Stevenson et al., 1992); that is, they have the center-surround organization that is a hallmark of retinal ganglion cell, LGN, and V1 receptive fields. Understanding the basis of this center-surround organization is an important direction for future work.
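The bandwidth scaling follows directly from the log-Gaussian shape. A minimal Python sketch of such a tuning curve (the function name and the log-domain standard deviation are illustrative assumptions, not fitted values from the paper):

```python
import numpy as np

def log_gaussian_tuning(disparity, preferred, sigma_log=0.3):
    """Tuning curve that is Gaussian on a log-disparity axis.

    Because sigma_log is constant in the log domain, the full-width at
    half-height in linear disparity units grows in proportion to the
    preferred disparity, i.e., bandwidth increases linearly with tuning.
    """
    d = np.maximum(np.abs(disparity), 1e-6)   # avoid log(0)
    return np.exp(-0.5 * ((np.log(d) - np.log(preferred)) / sigma_log) ** 2)
```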

Figure 6. Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases the inputs are contrast-normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast-normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron with preferred disparity δk is obtained by a weighted sum of linear and squared filter responses. The weights can be positive (excitatory) or negative (inhibitory) (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a; Equations 4, 5, S1–S4). In disparity estimation, the filter response distributions specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).



V1 binocular neurons are unlikely to have receptive fields exactly matching those of the simple and complex cells implied by the optimal AMA filters, but V1 neurons are likely to span the subspace spanned by the optimal filters. It is thus plausible that excitatory and inhibitory synaptic weights could develop so that a subset of the neurons in V1, or in other cortical areas, signal the log likelihood of different specific disparities (see Figure 7b). Indeed, some cells in cortical areas V1, V2, and V3/V3a exhibit sharp tuning to the disparity of random dot stimuli (Cumming, 2002; Ohzawa et al., 1997; G. F. Poggio et al., 1988; Read & Cumming, 2003).

Computational models of estimation from neural populations often rely on the assumption that each neuron is invariant and unimodally tuned to the stimulus property of interest (Girshick et al., 2011; Jazayeri & Movshon, 2006; Lehky & Sejnowski, 1990; Ma et al., 2006). However, it is often not discussed how invariant, unimodal tuning arises. For example, the binocular neurons (i.e., complex cells) predicted by the standard disparity energy model do not generally exhibit invariant, unimodal tuning (Figure 6d). Our analysis shows that neurons with unimodal tuning to stimulus properties not trivially available in the retinal images (e.g., disparity) can result from appropriate linear combination of nonlinear filter responses.

To obtain optimal disparity estimates, the LL neuron population response (represented by the vector RLL in Figure 2c) must be read out (see Figure 2d). The optimal read-out rule depends on the observer's goal (the cost function). A common goal is to pick the disparity having the maximum a posteriori probability. If the prior probability of the different possible disparities is uniform, then the optimal MAP decoding rule reduces to finding the LL neuron with the maximum response. Nonuniform prior probabilities can be taken into account by adding a disparity-dependent constant to each LL neuron response before finding the peak response. There are elegant proposals for how the peak of a population response can be computed in noisy neural systems (Jazayeri & Movshon, 2006; Ma et al., 2006). Other commonly assumed cost functions (e.g., MMSE) yield similar performance.
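A minimal sketch of this read-out rule in Python (variable and function names are illustrative assumptions; ll_responses holds one log-likelihood value per preferred disparity, and log_prior holds the corresponding log prior probabilities):

```python
import numpy as np

def map_disparity(ll_responses, preferred_disparities, log_prior=None):
    """MAP read-out of a population of log-likelihood (LL) neurons.

    ll_responses          : LL neuron responses, one per preferred disparity
    preferred_disparities : the neurons' preferred disparities (arcmin)
    log_prior             : optional disparity-dependent constant added to each
                            response (uniform prior if None)
    """
    ll = np.asarray(ll_responses, dtype=float)
    if log_prior is not None:
        ll = ll + np.asarray(log_prior, dtype=float)   # log posterior up to a constant
    return preferred_disparities[int(np.argmax(ll))]   # peak of the population response

# Usage: with a uniform prior, the estimate is simply the preferred disparity
# of the maximally responding LL neuron, e.g.
# est = map_disparity(r_ll, np.linspace(-16.875, 16.875, 577))
```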

In sum, our analysis has several implications. First, it suggests that optimal disparity estimation is best understood in the context of a population code. Second, it shows how to linearly sum nonlinear neural responses to construct cells with invariant, unimodal tuning curves. Third, it suggests that the eclectic mixture of binocular receptive field properties in cortex may play a functional role in disparity estimation. Fourth, it provides a principled hypothesis for how neurons may compute the posterior probabilities of stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Figure 7. Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement; Equation 5; Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with preferred disparity.



Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo-images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions, but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. These operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of the neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
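For reference, a minimal sketch of the local cross-correlation read-out in Python; integer pixel shifts and the circular shift are simplifications, and this is not the paper's exact windowed implementation (see Supplement for that modification):

```python
import numpy as np

def cross_correlation_estimate(left, right, candidate_shifts_px):
    """Estimate disparity as the shift that maximizes the normalized
    cross-correlation between 1-D left- and right-eye signals."""
    l = (left - left.mean()) / (left.std() + 1e-12)
    best_shift, best_corr = None, -np.inf
    for d in candidate_shifts_px:
        r = np.roll(right, d)                      # circular shift for simplicity
        r = (r - r.mean()) / (r.std() + 1e-12)
        corr = np.mean(l * r)                      # normalized cross-correlation
        if corr > best_corr:
            best_shift, best_corr = d, corr
    return best_shift
```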

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/°) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/°, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (~4 c/°) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/°) have widths of ~8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than ~4 c/° is nicely predicted by the 8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could "smear out" the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the Supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here, we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients that is comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (~9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\lVert \mathbf{c}(\mathbf{x}) \rVert^{2} + n c_{50}^{2}}}$$

where $n$ is the dimensionality of the vector and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of $c_{50}$.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of $c_{50}$. The conditional response distributions are not. For large values of $c_{50}$, the response distributions have tails much heavier than Gaussians. When $c_{50} = 0.0$ (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian on average when $c_{50} = 0.1$ (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.
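A minimal sketch of this normalization step in Python; the function name is illustrative, treating c(x) as Weber contrast is an assumption, and the paper's analyses set c50 = 0:

```python
import numpy as np

def contrast_normalize(intensity_patch, c50=0.0):
    """Contrast-normalize a (vectorized) image patch.

    With c50 = 0, the output has zero mean and vector magnitude 1.0,
    matching the simplified normalization used for the binocular signals.
    """
    x = np.asarray(intensity_patch, dtype=float)
    n = x.size
    c = x / x.mean() - 1.0                        # Weber contrast (zero mean)
    return c / np.sqrt(np.sum(c**2) + n * c50**2) # divisive normalization
```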

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50-mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 Patches × 80 Images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}+\Delta}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}}\right)\right] \qquad (6)$$

where $d_{fixation}$ is the fixation distance to the surface center, IPD is the interpupillary distance, and $\Delta$ is the depth. Depth is given by $\Delta = d_{surface} - d_{fixation}$, where $d_{surface}$ is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
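For concreteness, a minimal Python sketch of Equation 6; the function name and the 0.065-m IPD default are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def surface_disparity(d_fixation, d_surface, ipd=0.065):
    """Binocular disparity (arcmin) of a surface point straight ahead (Eq. 6).

    d_fixation : fixation distance (m)
    d_surface  : distance to the surface point (m)
    ipd        : interpupillary distance (m); 0.065 m is an assumed typical value
    """
    delta = d_surface - d_fixation                        # depth relative to fixation
    disparity_rad = 2.0 * (np.arctan((ipd / 2.0) / (d_fixation + delta))
                           - np.arctan((ipd / 2.0) / d_fixation))
    return np.degrees(disparity_rad) * 60.0               # convert to arcmin

# e.g., with fixation at 40 cm (as in the Methods), surfaces nearer and farther
# than fixation yield disparities of opposite sign.
```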

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths, corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2-mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled. In other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$r_{e}(\mathbf{x}) = \left[I_{e}(\mathbf{x}) * psf_{e}(\mathbf{x})\right]\,samp(\mathbf{x}) + \eta \qquad (7)$$

for each eye $e$. $I_{e}(\mathbf{x})$ represents an eye-specific luminance image of the light striking the sensor array at each location $\mathbf{x} = (x, y)$ in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function $psf_{e}(\mathbf{x})$ that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function $samp(\mathbf{x})$. Finally, the sensor responses are corrupted by noise $\eta$; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
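A minimal sketch of Equation 7 in Python, using NumPy/SciPy; the argument names and the Gaussian-noise choice and level are illustrative assumptions rather than the paper's calibrated noise model:

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(luminance_img, psf, sample_mask, noise_sd=0.01, rng=None):
    """Noisy sensor responses for one eye: blur the ideal luminance image with
    the eye's point-spread function, sample at the sensor locations, add noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = fftconvolve(luminance_img, psf, mode="same")     # I_e(x) * psf_e(x)
    sampled = blurred * sample_mask                            # spatial sampling samp(x)
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)  # additive sensor noise
```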

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
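A minimal sketch of this encoding model in Python (the function name, max_rate, and noise_sd are illustrative assumptions; the paper links a Matlab implementation of the full AMA learning procedure):

```python
import numpy as np

def ama_filter_responses(filters, stimuli, max_rate=1.0, noise_sd=0.01, rng=None):
    """Noisy AMA filter responses to a set of contrast-normalized stimuli.

    filters : (n_filters, n_pixels) array, each row a linear binocular filter
    stimuli : (n_stimuli, n_pixels) array of contrast-normalized stimuli
    """
    rng = np.random.default_rng() if rng is None else rng
    mean_resp = max_rate * stimuli @ filters.T                  # scaled dot products
    return mean_resp + rng.normal(0.0, noise_sd, mean_resp.shape)  # response noise
```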

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained by using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
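A minimal sketch of this interpolation and MAP read-out in Python, using NumPy/SciPy; names are illustrative, and a uniform prior is assumed:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_distributions(train_disps, means, covs, n_between=31):
    """Cubic-spline interpolation of the conditional response distributions
    (mean vectors and covariance matrices) between trained disparity levels.

    means : (n_levels, n_filters) response means, one row per trained disparity
    covs  : (n_levels, n_filters, n_filters) response covariance matrices
    """
    fine_disps = np.linspace(train_disps[0], train_disps[-1],
                             (len(train_disps) - 1) * (n_between + 1) + 1)
    fine_means = CubicSpline(train_disps, means, axis=0)(fine_disps)
    fine_covs = CubicSpline(train_disps, covs, axis=0)(fine_disps)
    return fine_disps, fine_means, fine_covs

def map_estimate(response, fine_disps, fine_means, fine_covs):
    """Evaluate the Gaussian log likelihood of an observed filter response
    vector at each interpolated disparity and return the peak (MAP estimate)."""
    log_like = [multivariate_normal.logpdf(response, m, c)
                for m, c in zip(fine_means, fine_covs)]
    return fine_disps[int(np.argmax(log_like))]
```

With 19 trained levels and 31 interpolated distributions per step, this yields the 577 discrete points mentioned above.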

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5; Fig. 6; Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \quad\quad \text{(S1a)}$$

$$\mathbf{w}_{ii,k} = -\operatorname{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \quad\quad \text{(S1b)}$$

$$w_{ij,k} = -0.5\left[\mathbf{C}_k^{-1}\right]_{ij}, \quad \forall\, i,j:\ j>i \quad\quad \text{(S1c)}$$

$$const'_k = -0.5\,\mathbf{u}_k^{\mathsf{T}}\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \quad\quad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) maps a matrix to the vector of its diagonal elements. Here we derive these weight equations from the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions; see Fig. 4a).
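These matrix-form weights can be checked numerically. A minimal NumPy sketch (not the authors' code; function names are hypothetical) that computes the weights of Eqs. S1a-d from a mean vector and covariance matrix, and verifies that the weighted sum of linear and quadratic responses (main-text Eq. 5) reproduces the Gaussian log likelihood (main-text Eq. 4) up to the shared additive constant:

```python
import numpy as np

def ll_weights(u_k, C_k):
    """Optimal weights (Eqs. S1a-d) for one disparity level, given the mean vector
    u_k and covariance C_k of the filter responses (Gaussian approximation)."""
    A = np.linalg.inv(C_k)                                 # A_k = C_k^{-1}
    w_lin  = A @ u_k                                       # S1a: weights on linear responses
    w_sq   = -np.diag(A) + 0.5 * (A @ np.ones_like(u_k))   # S1b: weights on squared responses
    w_pair = -0.5 * A                                      # S1c: weights on sum-squared responses (j > i)
    const  = -0.5 * u_k @ A @ u_k                          # S1d, omitting the shared const_k
    return w_lin, w_sq, w_pair, const

def log_likelihood_quadratic(R, u_k, C_k):
    """Gaussian quadratic form, omitting the additive constant."""
    A = np.linalg.inv(C_k)
    d = R - u_k
    return -0.5 * d @ A @ d

def log_likelihood_weighted_sum(R, u_k, C_k):
    """Same quantity expressed as weighted sums of linear, squared,
    and pairwise sum-squared responses."""
    w_lin, w_sq, w_pair, const = ll_weights(u_k, C_k)
    n = len(R)
    total = w_lin @ R + w_sq @ (R ** 2) + const
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += w_pair[i, j] * (R[i] ** 2 + 2 * R[i] * R[j] + R[j] ** 2)
    return total

# quick numerical check with random values
rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n))
C = M @ M.T + n * np.eye(n)          # positive-definite covariance
u = rng.normal(size=n)
R = rng.normal(size=n)
assert np.isclose(log_likelihood_quadratic(R, u, C),
                  log_likelihood_weighted_sum(R, u, C))
```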


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\!\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \quad\quad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf{T}}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + const \quad\quad \text{(S3)}$$

where $\mathbf{A}_k$ equals the inverse covariance matrix $\mathbf{C}_k^{-1}$. (Recall that the mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_k$ onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i-u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \quad\quad \text{(S4a)}$$

$$-a_{ij}\left(R_i-u_i\right)\left(R_j-u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \quad\quad \text{(S4b)}$$

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_i$ are the mean filter responses.

The aim is now to express $w_i$, $w_{ii}$, and $w_{ij}$ in terms of $u_i$, $a_{ii}$, and $a_{ij}$. First, consider $w_i$. In this case we want to collect the terms containing $R_i$: $w_iR_i$ from Eq. S2, and $a_{ii}R_iu_i$ and $a_{ij}R_iu_j$ from Eqs. S4a and S4b. Setting the $R_i$ terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \quad\quad \text{(S5a)}$$

Note that the weights on the linear filter responses $w_i$ are zero if the mean linear filter responses $u_i$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \ne j$. In this case we want to collect the terms containing $R_iR_j$: $2w_{ij}R_iR_j$ appears in Eq. S2 and $-a_{ij}R_iR_j$ appears in Eq. S4b. Setting the two $R_iR_j$ terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \quad\quad \text{(S5b)}$$


Next, consider $w_{ii}$. In this case the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight contributed by the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4a gives $w_{ii}$:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \quad\quad \text{(S5c)}$$

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j\ne i} a_{ij}u_iu_j\right) + const \quad\quad \text{(S5d)}$$

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij,max}^2 = R_{max}^2\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{max}$, rather than $R_{max}^2$ and $R_{ij,max}^2$), the weights on the complex cell responses must be scaled. Accordingly, the optimal weights on the complex cells are given by $w_i' = w_i$, $w_{ii}' = R_{max}\,w_{ii}$, and $w_{ij}' = \left(R_{ij,max}^2 / R_{max}\right) w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w_{ii}'$ and $w_{ij}'$ are the normalized weights plotted in Fig. 7b.
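A small sketch of this rescaling (Python; it assumes unit-norm filter weighting functions, so that the reconstructed maximum $R_{ij,max}^2 = R_{max}^2(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j)$ applies, and the 30 spk/s cap is taken from the noise analysis below):

```python
import numpy as np

def rescaled_weights(w_sq, w_pair, F, R_max=30.0):
    """Rescale the quadratic weights for complex cells whose responses are capped at R_max.
    w_sq: weights on squared-filter responses; w_pair: matrix of pairwise weights;
    F: (n_filters x n_samples) array of unit-norm filter weighting functions."""
    w_sq_prime = R_max * w_sq                     # squared-filter cells: cap R_max^2 -> R_max
    G = F @ F.T                                   # f_i . f_j for all filter pairs
    R2_ij_max = R_max ** 2 * (2.0 + 2.0 * G)      # max response of the sum-squared filters
    w_pair_prime = (R2_ij_max / R_max) * w_pair   # scale pairwise weights accordingly
    return w_sq_prime, w_pair_prime
```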

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio $\sigma_{planar}/\sigma_{non\text{-}planar}$ will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0.
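A minimal NumPy sketch (hypothetical helper names; not the authors' analysis code) of how this ratio could be computed for a single range patch, assuming the planes are fit by least squares:

```python
import numpy as np

def slant_ratio(x, y, z):
    """sigma_planar / sigma_non-planar for one range patch.
    x, y: pixel coordinates (1D arrays); z: corresponding range values."""
    # best-fitting fronto-parallel plane: constant depth (the mean range)
    z_fp = np.full_like(z, z.mean())
    # best-fitting slanted plane: z = a*x + b*y + c, via least squares
    G = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(G, z, rcond=None)
    z_slant = G @ coeffs
    sigma_nonplanar = np.std(z - z_fp)        # raw range vs. fronto-parallel plane
    sigma_planar    = np.std(z_slant - z_fp)  # slanted plane vs. fronto-parallel plane
    return sigma_planar / sigma_nonplanar
```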

Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of the natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time equal to a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
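For illustration, a small sketch (Python; the response samples are hypothetical placeholders, with their spread chosen only so the ratios come out in the same ballpark as the values quoted above) of how stimulus-induced variance could be compared against Fano-factor intrinsic noise over a given integration time:

```python
import numpy as np

def external_vs_internal_variance(rates_hz, integration_time_s, fano=1.5):
    """Ratio of stimulus-induced spike-count variance to intrinsic (Fano-factor) noise.
    rates_hz: noise-free LL-neuron firing rates (spk/s) across many natural stimuli,
    scaled so the mean peaks near 30 spk/s."""
    counts = rates_hz * integration_time_s    # expected spike counts per stimulus
    external_var = np.var(counts)             # variance caused by natural image content
    internal_var = fano * np.mean(counts)     # noise variance proportional to mean count
    return external_var / internal_var

# hypothetical responses: mean 30 spk/s with stimulus-driven spread (placeholder values)
rng = np.random.default_rng(1)
rates = np.clip(rng.normal(30.0, 20.0, size=5000), 0.0, None)
print(external_vs_internal_variance(rates, 0.3))   # ~300 ms fixation
print(external_vs_internal_variance(rates, 1.0))   # ~1 s psychophysical presentation
```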


Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)} \quad\quad \text{(S6)}$$

where f is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B\!\left(f,\delta_k\right)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \delta\,/\,\mathrm{IPD} \quad\quad \text{(S7)}$$

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$\mathrm{psf}_{photopic}\!\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\!\left(\mathbf{x};\lambda,\Delta D\right)\,sc\!\left(\lambda\right)\,D65\!\left(\lambda\right) \quad\quad \text{(S8)}$$

where $sc\!\left(\lambda\right)$ is the human photopic sensitivity function and K is a normalizing constant that sets the volume of $\mathrm{psf}_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
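A compact sketch (Python; the single-wavelength PSF model, the sensitivity and illuminant tables, the 65 mm IPD, and all function names are placeholders, not the actual optical model) of the defocus computation in Eq. S7 and the photopic weighting in Eq. S8:

```python
import numpy as np

ARCMIN_TO_RAD = np.pi / (180.0 * 60.0)

def defocus_diopters(disparity_arcmin, ipd_m=0.065):
    """Eq. S7: defocus (diopters) produced by a surface with the given disparity."""
    return disparity_arcmin * ARCMIN_TO_RAD / ipd_m

def polychromatic_psf(psf_mono, sc, d65, wavelengths_nm):
    """Eq. S8: weight single-wavelength PSFs by photopic sensitivity and the D65
    spectrum, sum over wavelength, and normalize to unit volume.
    psf_mono: function (wavelength_nm) -> 2D PSF array (placeholder optical model)."""
    acc = sum(psf_mono(lam) * sc(lam) * d65(lam) for lam in wavelengths_nm)
    return acc / acc.sum()   # K chosen so the PSF volume is 1.0

# example: defocus for a 15 arcmin disparity at a 65 mm inter-pupillary distance
print(defocus_diopters(15.0))   # ~0.067 diopters
```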

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}\!\left(\mathbf{x}\right) = \left(\mathbf{r}\!\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}\!\left(\mathbf{x}\right)$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast, $\mathbf{c}_{norm}\!\left(\mathbf{x}\right) = \mathbf{c}\!\left(\mathbf{x}\right)\big/\sqrt{\left\lVert\mathbf{c}\!\left(\mathbf{x}\right)\right\rVert^2 + nc_{50}^2}$, where n is the dimensionality of the vector, $\left\lVert\mathbf{c}\!\left(\mathbf{x}\right)\right\rVert^2$ is the contrast energy (or, equivalently, n times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static nonlinearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor \mathbf{c}_{norm}\!\left(\mathbf{x}\right)\cdot\mathbf{f}_i\!\left(\mathbf{x}\right)\right\rfloor$$

where $\left\lfloor\cdot\right\rfloor$ indicates half-wave rectification.
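A minimal sketch of this normalization and the binocular simple-cell response (Python; $c_{50} = 0$ as assumed in the main analysis, and the filter and stimulus vectors are placeholders):

```python
import numpy as np

def contrast_normalize(r, c50=0.0):
    """Convert sensor responses to a contrast-normalized vector."""
    c = (r - r.mean()) / r.mean()                        # Weber contrast signal
    return c / np.sqrt(np.sum(c**2) + len(c) * c50**2)   # divide by local contrast energy

def binocular_simple_cell(r_left, r_right, f, R_max=30.0, c50=0.0):
    """Half-wave rectified response of a binocular simple cell with receptive field f."""
    c_norm = contrast_normalize(np.concatenate([r_left, r_right]), c50)
    return R_max * max(np.dot(c_norm, f), 0.0)
```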



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan\!\left(\theta\right)/d$, where $\theta$ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f,g,h,i) Same as (a,b,c,e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially for F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R_{max}\!\left(\mathbf{f}_i^{\mathsf{T}}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. Gray areas show response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for random-dot stereograms (RDSs). In most cases the RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d,e) Same as (b,c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L\!\left(f\right) = A_R\!\left(f\right) \propto MTF_{\delta}\!\left(f\right)/f$, where $MTF_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes. Such a difference drastically reduces the inter-ocular contrast signal; a four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.
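A brief sketch (Python; the disparity grid, the single example frequency, and the flat amplitudes are assumptions) of the vergence-jitter simulation described in panel (c): the inter-ocular amplitude from Eq. S6 is smoothed over disparity with a Gaussian of 2 arcmin standard deviation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# evaluate Eq. S6 on a grid of disparities, then blur along the disparity axis
step = 0.25                                       # arcmin per sample
disparities = np.arange(-60.0, 60.0, step)        # disparities in arcmin
f = 8.0                                           # example spatial frequency (cpd)
A = 1.0 / f                                       # assumed equal 1/f amplitudes in both eyes
signal = np.sqrt(2 * A**2 - 2 * A**2 * np.cos(2 * np.pi * f * disparities / 60.0))

jitter_sd_arcmin = 2.0
blurred = gaussian_filter1d(signal, sigma=jitter_sd_arcmin / step)  # sigma in samples
print(signal.max(), blurred.max())  # jitter attenuates the high-frequency disparity signal
```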



Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch, marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}\!\left(\mathbf{x}\right) = \mathbf{c}\!\left(\mathbf{x}\right)\big/\sqrt{\left\lVert\mathbf{c}\!\left(\mathbf{x}\right)\right\rVert^2 + nc_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P\!\left(\mathbf{R}\,|\,\delta_k\right)$ via maximum likelihood estimation. (a) When $c_{50}$ = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50}$ = 0.1, the conditional response distributions are most Gaussian on average. For $c_{50}$ > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
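A small illustrative sketch (Python with SciPy; one-dimensional rather than the multi-dimensional fit used for the figure, and synthetic data in place of actual filter responses) of estimating the best-fitting generalized-Gaussian power by maximum likelihood:

```python
import numpy as np
from scipy.stats import gennorm

# synthetic stand-in for one filter's conditional responses at a single disparity
rng = np.random.default_rng(0)
responses = rng.normal(0.0, 0.1, size=20000)

# gennorm's shape parameter 'beta' is the power of the generalized Gaussian:
# beta = 2 is Gaussian, beta < 2 heavier-tailed, beta > 2 lighter-tailed
beta, loc, scale = gennorm.fit(responses)
print(beta)   # should be close to 2 for Gaussian-distributed responses
```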


they have the center-surround organization that is ahallmark of retinal ganglion cell LGN and V1receptive fields Understanding the basis of this center-surround organization is an important direction forfuture work

V1 binocular neurons are unlikely to have receptivefields exactly matching those of the simple and complexcells implied by the optimal AMA filters but V1neurons are likely to span the subspace spanned by theoptimal filters It is thus plausible that excitatory andinhibitory synaptic weights could develop so that asubset of the neurons in V1 or in other cortical areassignal the log likelihood of different specific disparities(see Figure 7b) Indeed some cells in cortical areas V1V2 and V3V3a exhibit sharp tuning to the disparity ofrandom dot stimuli (Cumming 2002 Ohzawa et al1997 G F Poggio et al 1988 Read amp Cumming2003)

Computational models of estimation from neuralpopulations often rely on the assumption that eachneuron is invariant and unimodally tuned to thestimulus property of interest (Girshick et al 2011Jazayeri amp Movshon 2006 Lehky amp Sejnowski 1990Ma et al 2006) However it is often not discussed howinvariant unimodal tuning arises For example thebinocular neurons (ie complex cells) predicted by thestandard disparity energy model do not generallyexhibit invariant unimodal tuning (Figure 6d) Ouranalysis shows that neurons with unimodal tuning tostimulus properties not trivially available in the retinal

images (eg disparity) can result from appropriatelinear combination of nonlinear filter responses

To obtain optimal disparity estimates the LLneuron population response (represented by the vectorRLL in Figure 2c) must be read out (see Figure 2d)The optimal read-out rule depends on the observerrsquosgoal (the cost function) A common goal is to pick thedisparity having the maximum a posteriori probabil-ity If the prior probability of the different possibledisparities is uniform then the optimal MAP decodingrule reduces to finding the LL neuron with themaximum response Nonuniform prior probabilitiescan be taken into account by adding a disparity-dependent constant to each LL neuron responsebefore finding the peak response There are elegantproposals for how the peak of a population responsecan be computed in noisy neural systems (Jazayeri ampMovshon 2006 Ma et al 2006) Other commonlyassumed cost functions (eg MMSE) yield similarperformance

In sum our analysis has several implications First itsuggests that optimal disparity estimation is bestunderstood in the context of a population codeSecond it shows how to linearly sum nonlinear neuralresponses to construct cells with invariant unimodaltuning curves Third it suggests that the eclecticmixture of binocular receptive field properties in cortexmay play a functional role in disparity estimationFourth it provides a principled hypothesis for howneurons may compute the posterior probabilities of

Figure 7 Constructing selective invariant disparity-tuned units (LL neurons) (a) Tuning curves for several LL neurons each with a

different preferred disparity Each point on the tuning curve represents the average response across a collection of natural stereo-

images having the same disparity Gray areas indicate 61 SD of response due to stimulus-induced response variability (b)

Normalized weights on model complex cell responses (see Supplement Equation 5 Figure 6c) for constructing the five LL neurons

marked with arrows in (a) Positive weights are excitatory (red) Negative weights are inhibitory (blue) On-diagonal weights

correspond to model complex cells having linear receptive fields like the filters in Figure 3a Off-diagonal weights correspond to model

complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3ndashS5) High spatial

frequencies are not useful for estimating large disparities (Figure S7) Thus the number of strongly weighted complex cells decreases

as the magnitude of the preferred disparity increases from zero (c) LL neuron bandwidth (ie full-width at half-height of disparity

tuning curves) as a function of preferred disparity Bandwidth increases approximately linearly with tuning

Journal of Vision (2014) 14(2)1 1ndash18 Burge amp Geisler 10

stimulus properties (eg disparity defocus motion)not trivially available in the retinal image(s) Thus ouranalysis provides a recipe for how to increase selectivityand invariance in the visual processing stream invari-ance to image content variability and selectivity for thestimulus property of interest

Discussion

Using principles of Bayesian statistical decisiontheory we showed how to optimally estimate binoc-ular disparity in natural stereo images given theoptical sensor and noise properties of the humanvisual system First we determined the linear binoc-ular filters that encode the retinal image informationmost useful for estimating disparity in natural stereo-images The optimal linear filters share many proper-ties with binocular simple cells in the primary visualcortex of monkeys and cats (Figure 3 Figure S1)Then we determined how the optimal linear filterresponses should be nonlinearly decoded to maximizethe accuracy of disparity estimation The optimaldecoder was not based on prior assumptions butrather was dictated by the statistical pattern of thejoint filter responses to natural stereo images Overallperformance was excellent and matched importantaspects of human performance (Figure 5) Finally weshowed that the optimal encoder and decoder can beimplemented with well-established neural operationsThe operations show how selectivity and invariancecan emerge along the visual processing stream(Figures 6 amp 7)

External variability and neural noise

A visual systemrsquos performance is limited by bothexternal (stimulus-induced) variability and intrinsicneural response variability (ie neural noise) Manytheoretical studies have focused on the impact ofneural noise on neural computation (Berens EckerGerwinn Tolias amp Bethge 2011 Cumming ampDeAngelis 2001 DeAngelis et al 1991 Jazayeri ampMovshon 2006 Ma et al 2006 Nienborg et al 2004Ohzawa DeAngelis amp Freeman 1990) However inmany real-world situations stimulus-induced responsevariability (ie variability due to nuisance stimulusproperties) is a major or the dominant source ofvariability Our analysis demonstrates that the exter-nal variation in natural image contentmdashthat isvariation in image features irrelevant to the taskmdashconstitutes an important source of response variabilityin disparity estimation (see Figures 4 5 7) Thus evenin a hypothetical visual system composed of noiseless

neurons significant response variability will occurwhen estimating disparity under natural conditions(see Figures 4 and 5)

Determining the best performance possible withoutneural noise is essential for understanding the effect ofneural noise (Haefner amp Bethge 2010 Lewicki 2002Li amp Atick 1994 Olshausen amp Field 1996 Simoncelliamp Olshausen 2001) because it sets bounds on theamount of neural noise that a system can toleratebefore performance significantly deteriorates Indeedthe impact of neural noise on encoding and decoding innatural viewing cannot be evaluated without consider-ing the impact of stimulus-induced response variabilityon the population response (see Supplement)

Seventy years ago Hecht Shlaer and Pirenne (1942)showed in a landmark study that dark-adapted humanscan detect light in the periphery from the absorption ofonly five to eight quanta and that external variability(in the form of photon noise) was the primary source ofvariability in human performance This result contra-dicted the notion (widespread at the time) that responsevariability is due primarily to variability within theorganism (eg intrinsic noise) Similarly our resultsunderscore the importance of analyzing the variabilityof natural signals when analyzing the responsevariability of neural circuits that underlie performancein critical behavioral tasks

Comparison with the disparity energy model

The most common neurophysiological model ofdisparity processing is the disparity energy modelwhich aims to account for the response properties ofbinocular complex cells (Cumming amp DeAngelis 2001DeAngelis et al 1991 Ohzawa 1998 Ohzawa et al1990) The model had remarkable initial successHowever an increasing number of discrepancies haveemerged between the neurophysiology and the modelrsquospredictions The standard disparity energy model forexample does not predict the reduced response toanticorrelated stereograms observed in cortex It alsodoes not predict the dependence of disparity selectivecells on more than two binocular subunits Fixes to thestandard model have been proposed to account for theobserved binocular response properties (Haefner ampCumming 2008 Qian amp Zhu 1997 Read Parker ampCumming 2002 Tanabe amp Cumming 2008) We haveshown however that a number of these properties area natural consequence of an optimal algorithm fordisparity estimation

The most common computational model of disparityprocessing is the local cross-correlation model (Bankset al 2004 Cormack et al 1991 Tyler amp Julesz 1978)In this model the estimated disparity is the one thatproduces the maximum local cross-correlation between

Journal of Vision (2014) 14(2)1 1ndash18 Burge amp Geisler 11

the images in the two eyes Local cross-correlation isthe computational equivalent of appropriately pro-cessing large populations of disparity-energy complexcells having a wide range of tuning characteristics(Anzai Ohzawa amp Freeman 1999 Fleet Wagner ampHeeger 1996) Cross-correlation is optimal when thedisparities across a patch are small and constantbecause it is then akin to the simple template matchingof noise-limited ideal observers Cross-correlation isnot optimal for larger or varying disparities becausethen the left- and right-eye signals falling within abinocular receptive field often differ substantially

How does our ideal observer for disparity estimationcompare with the standard disparity-energy and cross-correlation models To enable a meaningful compari-son both methods were given access to the same imageinformation Doing so requires a slight modification tothe standard cross correlation algorithm (seeSupplement) The eight AMA filters in Figure 3outperform cross-correlation in both accuracy andprecision when the disparities exceed approximately75 arcmin (Figure S6) This performance increase isimpressive given that local cross-correlation has (ineffect) access to a large bank of filters whereas ourmethod uses only eight Furthermore increasing thenumber of AMA filters to 10ndash12 gives essentiallyequivalent performance to cross-correlation for smallerdisparities

Spatial frequency tuning of optimal linearbinocular filters

The optimal linear binocular filters (the AMA filters)are selective for low to mid spatial frequencies (12ndash47c8) and not for higher frequencies (see Figure 3 FigureS1) To develop an intuition for why we examined thebinocular contrast signals as a function of frequencythat result from a binocularly viewed high-contrastedge (Figure S7) The binocular contrast signals barelydiffer above 6 c8 indicating that higher spatialfrequencies carry little information for disparity esti-mation

This analysis of binocular contrast signals providesinsight into another puzzling aspect of disparityprocessing Human sensitivity to disparity modulation(ie sinusoidal modulations in disparity-defined depth)cuts off at very low frequencies (4 c8) (Banks et al2004 Tyler 1975) V1 binocular receptive fields tunedto the highest useful (luminance) spatial frequency (6c8) have widths of 8 arcmin assuming the typicaloctave bandwidth for cortical neurons Given thatreceptive fields cannot signal variation finer than theirown size the fact that humans cannot see disparitymodulations higher than 4 c8 is nicely predicted bythe 8 arcmin width suggested by the present analysis

This size matches previous psychophysically basedestimates of the smallest useful mechanism in disparityestimation (Banks et al 2004 Harris McKee ampSmallman 1997)

Eye movement jitter has previously been proposed asan explanation for why useful binocular information isrestricted to low spatial frequencies (Vlaskamp Yoonamp Banks 2011) The argument is that jitter and therelatively long integration time of the stereo systemcould lsquolsquosmear outrsquorsquo the disparity signals Eye movementjitter certainly can degrade stereo information enoughto render high frequency signals unmeasurable (FigureS7c d) However the analysis in the supplement(Figure S7) suggests that the low frequency at whichhumans lose the ability to detect disparity modulationmay also be explained by the statistics of naturalimages

Generality of findings

Although our analysis is based on calibrated naturalstereo-image patches and realistic characterizations ofhuman optics sensors and neural noise there are somesimplifying assumptions that could potentially limit thegenerality of our conclusions Here we examine theeffect of these assumptions

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections offronto-parallel surfaces In natural scenes surfaces areoften slanted which causes the disparity to changeacross a patch We evaluated the effect of surface slantby repeating our analysis with a training set having adistribution of slants (see Figure 5d e andSupplement) This distribution of slants produced adistribution of disparity gradients that are comparableto those that occur when viewing natural scenes(Hibbard 2008) (Figure S1g) The optimal binocularfilters are only very slightly affected by surface slant(cf Figure 3 and Figure S1a) These differenceswould be difficult to detect in neurophysiologicalexperiments This may help explain why there hasbeen little success in finding neurons in early visualcortex that are tuned to nonzero slants (Nienborg etal 2004) Disparity estimation performance is alsoquite similar although estimate precision is somewhatreduced (Figure 5d e)

Even after including the effect of surface slant ourtraining set lacks the non-planar depth structuremdashwithin-patch depth variation and depth discontinuities(occlusions)mdashthat is present in natural scenes (FigureS8andashc) To determine how the magnitude of theseother sources depth variation compare to that due toslant we analyzed a set of range images obtained with

Journal of Vision (2014) 14(2)1 1ndash18 Burge amp Geisler 12

a high-precision Riegl VZ400 range scanner (seeSupplement) We find that most of the depth variationin the range images is captured by variation in slant(Figure S8d) Given that variation in slant has littleeffect on the optimal receptive fields and given thatlocations of fine depth structure and monocularocclusion zones (9 of points in our dataset) aremost likely random with respect to the receptive fieldsthen it is likely that the other sources of variation willhave little effect

Nevertheless occlusions create monocular zones thatcarry information supporting da Vinci stereopsis inhumans (Kaye 1978 Nakayama amp Shimojo 1990)Neurophysiological evidence suggests that binocularmechanisms exist for locating occlusion boundaries(Tsao Conway amp Livingstone 2003) In the futurewhen databases become available of high-qualitynatural stereo images with coregistered RGB anddistance information the general approach describedhere may prove useful for determining the filters thatare optimal for detecting occlusion boundaries

The effect of a realistic disparity prior

Our analysis assumed a flat prior probabilitydistribution over disparity Two groups have recentlypublished estimates of the prior distribution ofdisparities based on range measurements in naturalscenes and human eye movement statistics (Hibbard2008 Liu et al 2008) Does the prior distribution ofdisparity signals have a significant effect on the optimalfilters and estimation performance To check wemodified the frequency of different disparities in ourtraining set to match the prior distribution ofdisparities encountered in natural viewing (Liu et al2008) Linear filter shapes LL neuron tuning curveshapes and performance levels were robust to differ-ences between a flat and realistic disparity prior

It has been hypothesized that the prior over thestimulus dimension of interest (eg disparity) maydetermine the optimal tuning curve shapes and theoptimal distribution of peak tunings (ie how thetuning curves should tile the stimulus dimension)(Ganguli amp Simoncelli 2010) In evaluating thishypothesis one must consider the results presented inthe present paper Our results show that natural signalsand the linear receptive fields that filter those signalsplace strong constraints on the shapes that tuningcurves can have Specifically the shapes of the LLneuron disparity tuning curve (approximately log-Gaussian) are robust to changes in the prior Thusalthough the prior may influence the optimal distribu-tion of peak tuning (our analysis does not address thisissue) it is unlikely to be the sole (or even the primary)determinant of tuning curve shapes

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses of an arbitrary number of filters, making it possible to decode large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$\mathbf{c}_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\left\|\mathbf{c}(\mathbf{x})\right\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50; the conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it was throughout the paper), the distributions are well approximated by Gaussians, but are somewhat lighter-tailed. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): Contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.
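As a concrete illustration of the normalization model above, here is a minimal Python/NumPy sketch (the function name and example patch are hypothetical, not from the paper). Setting c50 = 0 reproduces the simplified normalization used for the main results (zero mean, unit vector magnitude), while a nonzero c50, such as the 0.1 that produced the most Gaussian response distributions, softens the normalization for low-contrast patches.

```python
import numpy as np

def contrast_normalize(r, c50=0.0):
    """Contrast-normalize a vectorized stereo-image patch r (sketch of the
    standard normalization model; c50 = 0 gives the simplified form used
    for the main results)."""
    r = np.asarray(r, dtype=float)
    c = (r - r.mean()) / r.mean()                   # Weber contrast signal
    n = c.size                                      # dimensionality of the vector
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

# With c50 = 0 the normalized signal has zero mean and unit vector magnitude.
patch = np.random.default_rng(0).uniform(0.5, 1.5, size=64)   # hypothetical patch
print(np.linalg.norm(contrast_normalize(patch, c50=0.0)))     # ~1.0
```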

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined the optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes.


However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4 diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e-h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity, and all imaged objects were at least 16 m from the camera.

RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}+\Delta}\right) - \tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}}\right)\right] \quad (6)$$

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
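To make Equation 6 concrete, the sketch below (Python/NumPy; the 0.065 m interpupillary distance is an assumed typical value, not taken from the paper) computes the disparity of a point displaced in depth from fixation as the difference of two vergence angles.

```python
import numpy as np

def vergence_rad(distance_m, ipd_m=0.065):
    """Vergence angle (radians) of a point at the given distance (assumed IPD)."""
    return 2.0 * np.arctan((ipd_m / 2.0) / distance_m)

def disparity_arcmin(d_fixation_m, depth_m, ipd_m=0.065):
    """Absolute disparity (Eq. 6) of a surface point depth_m beyond fixation."""
    delta_rad = vergence_rad(d_fixation_m + depth_m, ipd_m) - vergence_rad(d_fixation_m, ipd_m)
    return np.degrees(delta_rad) * 60.0

# Example: at the 40 cm fixation distance used in the paper, a surface 5 mm
# beyond fixation produces an uncrossed disparity of a few arcmin.
print(disparity_arcmin(0.40, 0.005))
```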

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, the left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting


surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses:

$$r_e\left(\mathbf{x}\right) = \left[I_e\left(\mathbf{x}\right) * psf_e\left(\mathbf{x}\right)\right]samp\left(\mathbf{x}\right) + \gamma \quad (7)$$

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
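The sketch below illustrates Equation 7 for one eye (Python/NumPy + SciPy; the PSF, the noise standard deviation, and the function names are illustrative assumptions, since the paper specifies the noise level only qualitatively): the idealized luminance image is blurred by the eye's point-spread function, sampled at roughly the foveal cone rate, and corrupted by additive noise.

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(luminance, psf, samples_per_deg=128, patch_deg=1.0,
                     noise_sd=0.01, seed=0):
    """Noisy sensor responses of one eye (Eq. 7): blur, sample, add noise.

    luminance : 2-D idealized luminance image of a 1-deg patch
    psf       : 2-D polychromatic point-spread function for this disparity
    noise_sd  : illustrative noise level (the paper sets it just high enough
                to remove detail undetectable by the human visual system)
    """
    rng = np.random.default_rng(seed)
    blurred = fftconvolve(luminance, psf, mode="same")          # I_e(x) * psf_e(x)
    n = int(samples_per_deg * patch_deg)                        # cone-like sampling
    rows = np.linspace(0, blurred.shape[0] - 1, n).astype(int)
    cols = np.linspace(0, blurred.shape[1] - 1, n).astype(int)
    sampled = blurred[np.ix_(rows, cols)]                       # samp(x)
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)   # + noise
```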

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals.

There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
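The encoding step of this logic can be sketched as follows (Python/NumPy; array shapes, the response-noise level, and function names are assumptions for illustration; the authors' reference implementation is the Matlab code linked above): noisy filter responses to a labeled training set determine the class-conditional response means and covariances on which the approximately optimal decoder relies.

```python
import numpy as np

def filter_responses(stimuli, filters, r_max=1.0, noise_sd=0.05, seed=0):
    """Noisy linear filter responses: scaled dot products plus Gaussian noise.

    stimuli : (n_stim, n_pix) contrast-normalized binocular signals
    filters : (n_filt, n_pix) linear binocular filters
    """
    rng = np.random.default_rng(seed)
    R = r_max * stimuli @ filters.T
    return R + rng.normal(0.0, noise_sd, R.shape)

def class_conditional_stats(R, disparity_labels):
    """Mean vector and covariance matrix of the filter responses for each
    disparity level in the training set."""
    stats = {}
    for k in np.unique(disparity_labels):
        Rk = R[disparity_labels == k]
        stats[k] = (Rk.mean(axis=0), np.cov(Rk, rowvar=False))
    return stats
```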

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec).


Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which in turn yielded posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
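A minimal sketch of this readout (Python/NumPy + SciPy; variable names are hypothetical, and the spline-interpolated covariances are used as-is, whereas a careful implementation would enforce positive definiteness): the training-step means and covariances are spline-interpolated to a fine grid of disparities, and the MAP estimate is the disparity whose interpolated response distribution assigns the highest (flat-prior) posterior probability to the observed filter responses.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_stats(train_disp, means, covs, n_between=31):
    """Spline-interpolate class-conditional means/covariances between training steps."""
    n_fine = (len(train_disp) - 1) * (n_between + 1) + 1        # e.g., 577 levels
    fine = np.linspace(train_disp[0], train_disp[-1], n_fine)
    mean_fine = CubicSpline(train_disp, means, axis=0)(fine)
    cov_fine = CubicSpline(train_disp, covs, axis=0)(fine)
    return fine, mean_fine, cov_fine

def map_disparity(response, disparities, means, covs):
    """Flat-prior MAP readout: pick the disparity with the largest log likelihood."""
    log_like = [multivariate_normal.logpdf(response, m, c, allow_singular=True)
                for m, c in zip(means, covs)]
    return disparities[int(np.argmax(log_like))]
```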

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.
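A minimal sketch of these operations (Python/NumPy; the function names are ours, not the authors'): 'on' and 'off' half-wave rectified responses recover the signed linear filter responses, and squaring the filter responses and the pairwise-summed responses yields the model complex cell responses that enter the quadratic terms of Eq. 5.

```python
import numpy as np

def on_off_simple_cells(R):
    """Half-wave rectified 'on' and 'off' responses; their difference is R."""
    return np.maximum(R, 0.0), np.maximum(-R, 0.0)

def complex_cell_responses(R):
    """Squared filter responses and squared pairwise-summed filter responses.

    R : (n_filt,) linear AMA-filter responses to one stimulus
    """
    squared = R**2                                  # R_i^2 terms of Eq. 5
    i, j = np.triu_indices(len(R), k=1)
    pair_squared = (R[i] + R[j])**2                 # (R_i + R_j)^2 terms of Eq. 5
    return squared, pair_squared
```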

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \quad (S1a)$$

$$\mathbf{w}_{ii,k} = -\mathrm{diag}\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \quad (S1b)$$

$$w_{ij,k} = -0.5\left[\mathbf{C}_k^{-1}\right]_{ij} \quad \forall ij,\ j > i \quad (S1c)$$

$$const'_k = -0.5\,\mathbf{u}_k^T\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \quad (S1d)$$

where I is the identity matrix, 1 is the 'ones' vector, and diag() sets a matrix diagonal to a vector. Here, we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).
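The weight formulas in Eqs. S1a-d can be computed directly from the class-conditional response mean and covariance, as in the sketch below (Python/NumPy; the random test values only verify the algebra and are not data from the paper). The final lines check numerically that the resulting weighted sum of linear, squared, and pairwise-sum-squared responses reproduces the Gaussian log likelihood of Eq. S3 up to an additive constant.

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights (Eqs. S1a-d) on simple- and complex-cell responses for the LL
    neuron with preferred disparity delta_k (up to an additive constant)."""
    A = np.linalg.inv(C_k)                               # inverse covariance
    w_lin = A @ u_k                                      # S1a
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u_k)))   # S1b
    w_pair = -0.5 * A                                    # S1c (use j > i entries)
    const = -0.5 * u_k @ A @ u_k                         # S1d (without const_k)
    return w_lin, w_sq, w_pair, const

# Numerical check of the derivation that follows, with arbitrary test values.
rng = np.random.default_rng(1)
u = rng.normal(size=4)
B = rng.normal(size=(4, 4)); C = B @ B.T + np.eye(4)     # a valid covariance
R = rng.normal(size=4)                                   # a filter response vector
w_lin, w_sq, w_pair, const = ll_neuron_weights(u, C)
i, j = np.triu_indices(4, k=1)
weighted_sum = (w_lin @ R + w_sq @ R**2
                + np.sum(w_pair[i, j] * (R[i] + R[j])**2) + const)
A = np.linalg.inv(C)
print(np.isclose(weighted_sum, -0.5 * (R - u) @ A @ (R - u)))  # True
```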


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \quad (S2)$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^T \mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + const \quad (S3)$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \quad (S4a)$$

$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \quad (S4b)$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect the terms containing R_i: w_iR_i from Eq. S2, and a_iiR_iu_i and a_ijR_iu_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \quad (S5a)$$

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect the terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2, and −a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \quad (S5b)$$


Next, consider w_ii. In this case, the total weight on R_i² in Eq. S2 must be −0.5a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \quad (S5c)$$

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\ j \neq i} a_{ij}u_iu_j\right) + const \quad (S5d)$$

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R²_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R²_{ij,max} = R²_max(2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R²_max and R²_{ij,max}), the weights on the complex cell responses must be scaled. The optimal weights on complex cells are then given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R²_{ij,max}/R_max) w_ij.

These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c) and the LL neurons that are constructed from them (Fig. S1d,e) are quite similar to those that were trained only on fronto-parallel surfaces.


The main (although slight) differences are that the new filters have spatial extents that are slightly smaller, that their octave bandwidths are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
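A minimal sketch of the per-patch planarity analysis (Python/NumPy; the coordinate grid, units, and function name are assumptions): fit a fronto-parallel plane (the patch mean) and a slanted plane (least squares) to the range values, and form the ratio of the two standard deviations described above.

```python
import numpy as np

def planarity_ratio(range_patch):
    """sigma_planar / sigma_non-planar for one range patch (cf. Fig. S8d).

    range_patch : 2-D array of range (distance) values for a 1-deg patch.
    Returns 1.0 if all depth variation is captured by slant, 0.0 if none is.
    """
    h, w = range_patch.shape
    yy, xx = np.mgrid[0:h, 0:w]                              # pixel coordinates
    X = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    z = range_patch.ravel()
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)             # best-fitting slanted plane
    slanted = X @ beta
    fronto = z.mean()                                        # best-fitting fronto-parallel plane
    sigma_planar = np.std(slanted - fronto)                  # variation captured by slant
    sigma_nonplanar = np.std(z - fronto)                     # total variation about the plane
    return sigma_planar / sigma_nonplanar
```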


External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly-viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f \delta_k\right)} \quad (S6)$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye movements are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{IPD} \quad (S7)$$

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters).
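The two relations just given (Eqs. S6 and S7) are simple to evaluate numerically; the sketch below (Python/NumPy; the square root in Eq. S6 follows the reconstruction above, and the 0.065 m IPD is an assumed value) computes the binocular difference amplitude at a given spatial frequency and disparity, and the defocus produced by a given disparity.

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, A_left, A_right):
    """Amplitude of the left-minus-right contrast signal at frequency f (Eq. S6)."""
    disparity_deg = disparity_arcmin / 60.0
    return np.sqrt(A_left**2 + A_right**2
                   - 2.0 * A_left * A_right * np.cos(2.0 * np.pi * f_cpd * disparity_deg))

def defocus_diopters(disparity_arcmin, ipd_m=0.065):
    """Defocus (diopters) produced by a surface with the given disparity (Eq. S7)."""
    disparity_rad = np.deg2rad(disparity_arcmin / 60.0)
    return disparity_rad / ipd_m
```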


To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

$$psf_{photopic}\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} psf\left(\mathbf{x};\lambda,\Delta D\right)\,sc\left(\lambda\right)\,D65\left(\lambda\right) \quad (S8)$$

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
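A minimal sketch of Eq. S8 (Python/NumPy; array shapes and names are assumptions): the polychromatic PSF is a photopic-sensitivity- and D65-weighted sum of single-wavelength PSFs, normalized to unit volume.

```python
import numpy as np

def polychromatic_psf(mono_psfs, photopic_sensitivity, d65_spectrum):
    """Photopic polychromatic PSF (Eq. S8).

    mono_psfs            : (n_wavelengths, h, w) single-wavelength PSFs
                           (e.g., 400-700 nm in 10 nm steps)
    photopic_sensitivity : (n_wavelengths,) photopic sensitivity sc(lambda)
    d65_spectrum         : (n_wavelengths,) D65 daylight illumination spectrum
    """
    weights = photopic_sensitivity * d65_spectrum
    psf = np.tensordot(weights, mono_psfs, axes=1)    # weighted sum over wavelength
    return psf / psf.sum()                            # K sets the PSF volume to 1.0
```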

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$$\mathbf{c}\left(\mathbf{x}\right) = \left(\mathbf{r}\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$$

where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area, and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

$$\mathbf{c}_{norm}\left(\mathbf{x}\right) = \frac{\mathbf{c}\left(\mathbf{x}\right)}{\sqrt{\left\|\mathbf{c}\left(\mathbf{x}\right)\right\|^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector, ||c(x)||² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor \mathbf{c}_{norm}\left(\mathbf{x}\right)\cdot\mathbf{f}_i\left(\mathbf{x}\right)\right\rfloor$$



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f,g,h,i) Same as a,b,c,e, except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring: (R_max f_i^T c_norm)² (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Interocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Interocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on interocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency interocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the interocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes; in that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant; values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.
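A minimal sketch of the plane-fit statistic summarized in panel (d), with assumed function and variable names: fit a plane to the range values in a patch by least squares and report the fraction of depth variance it captures.

```python
import numpy as np

def slant_variance_explained(x, y, z):
    """x, y: pixel coordinates; z: range values. Returns the R^2 of the
    best-fitting plane z = a*x + b*y + c (ordinary least squares)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)

# Example with synthetic data: a slanted plane plus small depth fluctuations
rng = np.random.default_rng(0)
x, y = np.meshgrid(np.linspace(-0.5, 0.5, 32), np.linspace(-0.5, 0.5, 32))
z = 10.0 + 2.0 * x + 0.05 * rng.standard_normal(x.shape)
print(slant_variance_explained(x.ravel(), y.ravel(), z.ravel()))  # close to 1.0
```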


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $c_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


stimulus properties (e.g., disparity, defocus, motion) not trivially available in the retinal image(s). Thus, our analysis provides a recipe for how to increase selectivity and invariance in the visual processing stream: invariance to image content variability and selectivity for the stimulus property of interest.

Discussion

Using principles of Bayesian statistical decision theory, we showed how to optimally estimate binocular disparity in natural stereo images, given the optical, sensor, and noise properties of the human visual system. First, we determined the linear binocular filters that encode the retinal image information most useful for estimating disparity in natural stereo images. The optimal linear filters share many properties with binocular simple cells in the primary visual cortex of monkeys and cats (Figure 3, Figure S1). Then, we determined how the optimal linear filter responses should be nonlinearly decoded to maximize the accuracy of disparity estimation. The optimal decoder was not based on prior assumptions but rather was dictated by the statistical pattern of the joint filter responses to natural stereo images. Overall performance was excellent and matched important aspects of human performance (Figure 5). Finally, we showed that the optimal encoder and decoder can be implemented with well-established neural operations. The operations show how selectivity and invariance can emerge along the visual processing stream (Figures 6 & 7).

External variability and neural noise

A visual system's performance is limited by both external (stimulus-induced) variability and intrinsic neural response variability (i.e., neural noise). Many theoretical studies have focused on the impact of neural noise on neural computation (Berens, Ecker, Gerwinn, Tolias, & Bethge, 2011; Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Jazayeri & Movshon, 2006; Ma et al., 2006; Nienborg et al., 2004; Ohzawa, DeAngelis, & Freeman, 1990). However, in many real-world situations, stimulus-induced response variability (i.e., variability due to nuisance stimulus properties) is a major, or the dominant, source of variability. Our analysis demonstrates that the external variation in natural image content, that is, variation in image features irrelevant to the task, constitutes an important source of response variability in disparity estimation (see Figures 4, 5, 7). Thus, even in a hypothetical visual system composed of noiseless neurons, significant response variability will occur when estimating disparity under natural conditions (see Figures 4 and 5).

Determining the best performance possible without neural noise is essential for understanding the effect of neural noise (Haefner & Bethge, 2010; Lewicki, 2002; Li & Atick, 1994; Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), because it sets bounds on the amount of neural noise that a system can tolerate before performance significantly deteriorates. Indeed, the impact of neural noise on encoding and decoding in natural viewing cannot be evaluated without considering the impact of stimulus-induced response variability on the population response (see Supplement).

Seventy years ago, Hecht, Shlaer, and Pirenne (1942) showed in a landmark study that dark-adapted humans can detect light in the periphery from the absorption of only five to eight quanta, and that external variability (in the form of photon noise) was the primary source of variability in human performance. This result contradicted the notion (widespread at the time) that response variability is due primarily to variability within the organism (e.g., intrinsic noise). Similarly, our results underscore the importance of analyzing the variability of natural signals when analyzing the response variability of neural circuits that underlie performance in critical behavioral tasks.

Comparison with the disparity energy model

The most common neurophysiological model of disparity processing is the disparity energy model, which aims to account for the response properties of binocular complex cells (Cumming & DeAngelis, 2001; DeAngelis et al., 1991; Ohzawa, 1998; Ohzawa et al., 1990). The model had remarkable initial success. However, an increasing number of discrepancies have emerged between the neurophysiology and the model's predictions. The standard disparity energy model, for example, does not predict the reduced response to anticorrelated stereograms observed in cortex. It also does not predict the dependence of disparity-selective cells on more than two binocular subunits. Fixes to the standard model have been proposed to account for the observed binocular response properties (Haefner & Cumming, 2008; Qian & Zhu, 1997; Read, Parker, & Cumming, 2002; Tanabe & Cumming, 2008). We have shown, however, that a number of these properties are a natural consequence of an optimal algorithm for disparity estimation.

The most common computational model of disparity processing is the local cross-correlation model (Banks et al., 2004; Cormack et al., 1991; Tyler & Julesz, 1978). In this model, the estimated disparity is the one that produces the maximum local cross-correlation between the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.
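A minimal one-dimensional sketch of the cross-correlation idea (our own simplified illustration; the comparison in Figure S6 uses a modified, window-matched version described in the Supplement): the estimated disparity is the shift of the right-eye signal that maximizes the normalized correlation with the left-eye signal.

```python
import numpy as np

def xcorr_disparity(left, right, max_shift):
    """left, right: 1-D contrast signals of equal length; returns the shift
    (in samples) that maximizes the normalized cross-correlation."""
    def unit(v):
        v = v - v.mean()
        return v / (np.linalg.norm(v) + 1e-12)
    best_rho, best_d = -np.inf, 0
    for d in range(-max_shift, max_shift + 1):
        rho = float(np.dot(unit(left), unit(np.roll(right, d))))  # circular shift stands in for windowing
        if rho > best_rho:
            best_rho, best_d = rho, d
    return best_d

# Example: the right-eye signal is the left-eye signal shifted by 3 samples
rng = np.random.default_rng(1)
left = rng.standard_normal(128)
right = np.roll(left, -3)
print(xcorr_disparity(left, right, max_shift=8))   # prints 3
```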

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/deg) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/deg) have widths of 8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than 4 c/deg is nicely predicted by the 8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).
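As a back-of-the-envelope check (our own arithmetic, not a calculation reported in the paper), a mechanism of width $w = 8$ arcmin cannot signal disparity variation with a half-period smaller than its own width, so the highest visible disparity-modulation frequency should be roughly

$$f_{max} \approx \frac{1}{2w} = \frac{1}{2 \times 8\ \mathrm{arcmin}} = \frac{1}{(16/60)\ \mathrm{deg}} \approx 3.75\ \mathrm{c/deg},$$

in line with the 4 c/deg psychophysical cutoff.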

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could 'smear out' the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e and Supplement). This distribution of slants produced a distribution of disparity gradients comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a-c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower order) statistics fully characterize the joint responses of an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

$$c_{norm}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}}$$

where $n$ is the dimensionality of the vector and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so-called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of $c_{50}$.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of $c_{50}$; the conditional response distributions are not. For large values of $c_{50}$, the response distributions have tails much heavier than Gaussians. When $c_{50} = 0.0$ (as it was throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian on average when $c_{50} = 0.1$ (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.
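A minimal sketch of the normalization rule above (our own helper name): with $c_{50} = 0$ it reduces to the simple unit-magnitude normalization used throughout the paper.

```python
import numpy as np

def contrast_normalize(c, c50=0.0):
    """c: mean-zero contrast signal (1-D vector); c50: half-saturation constant."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

rng = np.random.default_rng(0)
c = rng.standard_normal(256) * 0.2
print(np.linalg.norm(contrast_normalize(c)))           # exactly 1.0 when c50 = 0
print(np.linalg.norm(contrast_normalize(c, c50=0.1)))  # less than 1.0 for c50 > 0
```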

The effect of different optics in the two eyes

We modeled the optics of the two eyes as identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined the optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50-mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity, and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}+\Delta}\right) - \tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}}\right)\right] \qquad (6)$$

where $d_{fixation}$ is the fixation distance to the surface center, $IPD$ is the interpupillary distance, and $\Delta$ is the depth. Depth is given by $\Delta = d_{surface} - d_{fixation}$, where $d_{surface}$ is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch; the change in disparity across the surface is greater when the surface patches are slanted.

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2-mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses:

$$\mathbf{r}_e(\mathbf{x}) = \left[I_e(\mathbf{x}) * psf_e(\mathbf{x})\right]\, samp(\mathbf{x}) + \gamma \qquad (7)$$

for each eye $e$. $I_e(\mathbf{x})$ represents an eye-specific luminance image of the light striking the sensor array at each location $\mathbf{x} = (x, y)$ in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function $psf_e(\mathbf{x})$ that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function $samp(\mathbf{x})$. Finally, the sensor responses are corrupted by noise $\gamma$; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
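A one-dimensional sketch of Equation 7 (our own helper names; the point-spread function and noise level below are placeholders, not the paper's wave-optics model): blur the idealized luminance signal, sample it at the cone rate, and add sensor noise.

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_response(I_e, psf_e, samples_per_deg=128, signal_deg=1.0,
                    noise_sd=0.01, rng=None):
    """1-D version of Eq. 7: [I_e * psf_e] sampled at the cone spacing, plus noise."""
    rng = rng or np.random.default_rng(0)
    blurred = fftconvolve(I_e, psf_e, mode='same')             # I_e(x) * psf_e(x)
    n_out = int(samples_per_deg * signal_deg)
    idx = np.linspace(0, blurred.size - 1, n_out).astype(int)  # samp(x)
    return blurred[idx] + noise_sd * rng.standard_normal(n_out)  # + noise

# Example: a blurred luminance edge sampled at 128 samples/deg over 1 deg
I_e = np.repeat([0.2, 0.8], 512)               # idealized luminance edge
psf_e = np.exp(-np.linspace(-3, 3, 31) ** 2)   # placeholder point-spread function
psf_e /= psf_e.sum()
print(sensor_response(I_e, psf_e).shape)       # (128,)
```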

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images; yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
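A minimal sketch of the response model described in this paragraph (our own Python helper, not the released Matlab code; the maximum response rate and noise level are placeholder values):

```python
import numpy as np

def ama_responses(filters, stimulus, r_max=5.5, noise_sd=0.05, rng=None):
    """filters: (n_filters, n_pixels); stimulus: (n_pixels,) contrast signal.
    Mean response = r_max * (filter . contrast-normalized stimulus), plus
    a small amount of Gaussian response noise."""
    rng = rng or np.random.default_rng(0)
    c_norm = stimulus / np.linalg.norm(stimulus)      # contrast normalization
    mean_resp = r_max * filters @ c_norm              # noiseless mean responses
    return mean_resp + noise_sd * rng.standard_normal(mean_resp.shape)
```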

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus; hence, it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution approximately every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which yielded posterior probability distributions defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
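A minimal sketch of this decoding step (assumed variable names, not the authors' code): interpolate the class-conditional Gaussian parameters across disparity with cubic splines, then return the disparity with the highest posterior (here, with a flat prior, the highest likelihood).

```python
import numpy as np
from scipy.interpolate import CubicSpline

def map_decode(R, train_disp, train_means, train_covs, n_interp=577):
    """R: observed filter responses (n,); train_disp: (k,) training disparities;
    train_means: (k, n) mean vectors; train_covs: (k, n, n) covariance matrices."""
    disp_grid = np.linspace(train_disp[0], train_disp[-1], n_interp)
    mean_s = CubicSpline(train_disp, train_means, axis=0)(disp_grid)
    cov_s = CubicSpline(train_disp, train_covs, axis=0)(disp_grid)
    log_post = np.empty(n_interp)            # flat prior: posterior proportional to likelihood
    for k in range(n_interp):
        diff = R - mean_s[k]
        C = cov_s[k]
        log_post[k] = -0.5 * (diff @ np.linalg.solve(C, diff)
                              + np.log(np.linalg.det(C)))
    return disp_grid[np.argmax(log_post)]    # MAP disparity estimate
```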

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none
Corresponding author: Johannes Burge
Email: jburge@mail.cps.utexas.edu
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring nonlinearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \qquad \text{(S1a)}$$
$$\mathbf{w}_{ii,k} = -\,\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \qquad \text{(S1b)}$$
$$w_{ij,k} = -0.5\,C^{-1}_{ij,k} \quad \forall\, i,j:\ j > i \qquad \text{(S1c)}$$
$$const'_k = -0.5\,\mathbf{u}_k^{T}\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \qquad \text{(S1d)}$$

where $\mathbf{1}$ is the 'ones' vector and $\mathrm{diag}(\cdot)$ returns the diagonal of a matrix as a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).
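A minimal sketch of Eqs. S1a-d (assumed function and variable names): given the mean vector and covariance matrix of the filter responses at disparity $\delta_k$, compute the weights that turn simple- and complex-cell responses into an LL neuron for that disparity.

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """u_k: (n,) mean filter responses at disparity k; C_k: (n, n) covariance."""
    n = u_k.size
    A = np.linalg.inv(C_k)                     # A_k = C_k^{-1}
    w_lin = A @ u_k                            # S1a: weights on linear responses
    w_sq = -np.diag(A) + 0.5 * A @ np.ones(n)  # S1b: weights on squared responses
    w_pair = -0.5 * A                          # S1c: weights on pairwise-summed squares (use j > i entries)
    const = -0.5 * u_k @ A @ u_k               # S1d (up to the additive constant)
    return w_lin, w_sq, w_pair, const
```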


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \qquad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R} - \mathbf{u}_k\right)^{T}\mathbf{A}_k\left(\mathbf{R} - \mathbf{u}_k\right) + const \qquad \text{(S3)}$$

where $\mathbf{A}_k$ equals the inverse covariance matrix $\mathbf{C}_k^{-1}$. (Recall that the mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_k$ onto the filters.) From here forward, the disparity index $k$ is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \qquad \text{(S4a)}$$
$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad \text{(S4b)}$$

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_i$ are the mean filter responses.

The aim is now to express $w_i$, $w_{ii}$, and $w_{ij}$ in terms of $u_i$, $a_{ii}$, and $a_{ij}$. First, consider $w_i$. In this case, we want to collect terms containing $R_i$: $w_iR_i$ from Eq. S2, and $a_{ii}R_iu_i$ and $a_{ij}R_iu_j$ from Eqs. S4a and S4b. Setting the $R_i$ terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad \text{(S5a)}$$

Note that the weights on the linear filter responses $w_i$ are zero if the mean linear filter responses $u_i$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \neq j$. In this case, we want to collect terms containing $R_iR_j$: $2w_{ij}R_iR_j$ appears in Eq. S2, and $-a_{ij}R_iR_j$ appears in Eq. S4b. Setting the two $R_iR_j$ terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad \text{(S5b)}$$


Next, consider $w_{ii}$. In this case, the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight from the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4 gives $w_{ii}$:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad \text{(S5c)}$$

Last, by grouping terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j \neq i} a_{ij}u_iu_j\right) + const \qquad \text{(S5d)}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij\,max}^2 = R_{max}^2\left(2 + 2\,\mathbf{f}_i \cdot \mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters $i$ and $j$ (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{max}$, rather than $R_{max}^2$ and $R_{ij\,max}^2$), the weights on the complex cell responses must be scaled. Thus, the optimal weights on the complex cells are given by $w_i' = w_i$, $w_{ii}' = R_{max}w_{ii}$, and $w_{ij}' = \left(R_{ij\,max}^2 / R_{max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w_{ii}'$ and $w_{ij}'$ are the normalized weights plotted in Fig. 7b.
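To make the preceding derivation concrete, the following is a minimal sketch of the computation an LL neuron performs, written directly as the class-conditional Gaussian log likelihood rather than through the expanded weights of Eqs. S5a–d (the two forms are algebraically equivalent). The filter count, disparity levels, and response statistics below are hypothetical stand-ins, not values from the paper.

```python
import numpy as np

def ll_neuron_responses(R, means, covs):
    """Log likelihood of the filter population response R under each
    disparity's class-conditional Gaussian: one LL-neuron response per
    candidate disparity. R: (n,); means: (K, n); covs: (K, n, n)."""
    lls = np.empty(len(means))
    for k, (u, C) in enumerate(zip(means, covs)):
        A = np.linalg.inv(C)                    # A = C^-1, as in Eqs. S4-S5
        d = R - u
        lls[k] = -0.5 * d @ A @ d - 0.5 * np.linalg.slogdet(C)[1]
    return lls

# Hypothetical example: 8 filters, 37 candidate disparities
rng = np.random.default_rng(0)
n_filters, n_disp = 8, 37
means = rng.normal(0.0, 0.01, (n_disp, n_filters))   # near-zero means (cf. Fig. 4a)
covs = np.stack([np.eye(n_filters) * (0.05 + 0.01 * k) for k in range(n_disp)])
R = rng.normal(0.0, 0.2, n_filters)                   # observed filter responses
preferred = int(np.argmax(ll_neuron_responses(R, means, covs)))
```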

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then, for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface-slant variation, we generated a training set of stereo-images from a set of surfaces with slants drawn randomly from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. We then determined the optimal binocular filters for this new training set. The filters trained on surfaces having different slants (Fig. S1a–c), and the LL neurons constructed from them (Fig. S1d,e), are quite similar to those trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, although precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation, and they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, the ratio will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assumed that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We converted the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), the stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), the stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge.


The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast falloff typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left- and right-eye signals) is sinusoidal, with contrast amplitude given by

$$A_B\left(f, \delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f\delta_k\right)} \qquad \text{(S6)}$$

where $f$ is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B\left(f, \delta_k\right)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the $1/f$ falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.
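To make Eq. S6 concrete, here is a minimal sketch that evaluates the binocular difference amplitude for a 1/f amplitude spectrum. Identical optics in the two eyes are assumed and the MTF is omitted for simplicity; the disparity values and frequency grid are illustrative only.

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin):
    """Eq. S6 with A_L(f) = A_R(f) ~ 1/f (identical eyes, MTF omitted)."""
    A = 1.0 / f_cpd                             # 1/f amplitude falloff
    delta_deg = disparity_arcmin / 60.0         # disparity in degrees
    phase = 2.0 * np.pi * f_cpd * delta_deg     # interocular phase difference
    return np.sqrt(A**2 + A**2 - 2.0 * A * A * np.cos(phase))

f = np.linspace(0.5, 16.0, 200)                 # spatial frequency (cpd)
curves = {d: binocular_difference_amplitude(f, d) for d in (2.5, 7.5, 15.0)}
# At high frequencies the 1/f falloff dominates and the curves converge,
# which is why frequencies above ~6 cpd carry little disparity information.
```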

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} \qquad \text{(S7)}$$

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$psf_{photopic}\left(x; \Delta D\right) = \frac{1}{K}\sum_{\lambda} psf\left(x; \lambda, \Delta D\right)\, s_c\left(\lambda\right)\, D65\left(\lambda\right) \qquad \text{(S8)}$$

where $s_c\left(\lambda\right)$ is the human photopic sensitivity function and $K$ is a normalizing constant that sets the volume of $psf_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
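A minimal sketch of the weighting in Eq. S8 is given below. The single-wavelength PSFs, the photopic sensitivity values, and the D65 spectrum are placeholder arrays, not the measured functions used in the paper.

```python
import numpy as np

def polychromatic_psf(psf_mono, s_c, d65):
    """Eq. S8: weight the single-wavelength PSFs by the photopic sensitivity
    s_c(lambda) and the D65 spectrum, sum over wavelength, and normalize so
    the polychromatic PSF sums to 1.0 (the role of K)."""
    weights = s_c * d65                                   # per-wavelength weights
    psf = np.tensordot(weights, psf_mono, axes=(0, 0))    # sum over lambda
    return psf / psf.sum()

# Placeholder inputs: 31 wavelengths (400-700 nm in 10 nm steps), 64x64 PSFs
wavelengths = np.arange(400, 701, 10)
psf_mono = np.random.default_rng(0).random((wavelengths.size, 64, 64))
s_c = np.exp(-0.5 * ((wavelengths - 555.0) / 80.0) ** 2)  # crude photopic stand-in
d65 = np.ones_like(wavelengths, dtype=float)              # flat stand-in spectrum
psf = polychromatic_psf(psf_mono, s_c, d65)
```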

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}\left(\mathbf{x}\right) = \left(\mathbf{r}\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}\left(\mathbf{x}\right)$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast: $\mathbf{c}_{norm}\left(\mathbf{x}\right) = \mathbf{c}\left(\mathbf{x}\right)\big/\sqrt{\lVert\mathbf{c}\left(\mathbf{x}\right)\rVert^2 + n c_{50}^2}$, where $n$ is the dimensionality of the vector, $\lVert\mathbf{c}\left(\mathbf{x}\right)\rVert^2$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{max}\left\lfloor \mathbf{c}_{norm}\left(\mathbf{x}\right) \cdot \mathbf{f}_i\left(\mathbf{x}\right) \right\rfloor$$
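The following is a minimal sketch of the contrast signal, the divisive normalization, and the half-wave rectified binocular simple-cell response described above. The receptive field and stimulus are random placeholders, not the paper's filters or images.

```python
import numpy as np

def contrast_signal(r):
    """c(x) = (r(x) - mean) / mean, computed over a local patch."""
    r_bar = r.mean()
    return (r - r_bar) / r_bar

def normalize(c, c50=0.0):
    """c_norm(x) = c(x) / sqrt(||c||^2 + n * c50^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

def simple_cell_response(c_norm, f, r_max=1.0):
    """R_i: dot product with the receptive field, half-wave rectified
    (negative projections are set to zero), scaled by R_max."""
    return r_max * max(c_norm @ f, 0.0)

# Placeholder binocular stimulus (left and right concatenated) and receptive field
rng = np.random.default_rng(0)
r = 100.0 * (1.0 + 0.2 * rng.standard_normal(120))  # noisy luminance-driven responses
f = rng.standard_normal(120)
f /= np.linalg.norm(f)
R = simple_cell_response(normalize(contrast_signal(r)), f)
```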


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters have a somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \left(\mathrm{IPD}\tan\theta\right)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts of Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields; these filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of the log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R_{max}\left(\mathbf{f}_i^{T}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows the response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, the RDS tuning curves are similar to the tuning curves for natural stimuli, although the response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for fronto-parallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with fronto-parallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical; i.e., $A_L\left(f\right) = A_R\left(f\right) \propto \mathrm{MTF}_\delta\left(f\right)/f$, where $\mathrm{MTF}_\delta$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes; in that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}\left(\mathbf{x}\right) = \mathbf{c}\left(\mathbf{x}\right)\big/\sqrt{\lVert\mathbf{c}\left(\mathbf{x}\right)\rVert^2 + n c_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper, we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P\left(\mathbf{R}\,|\,\delta_k\right)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


the images in the two eyes. Local cross-correlation is the computational equivalent of appropriately processing large populations of disparity-energy complex cells having a wide range of tuning characteristics (Anzai, Ohzawa, & Freeman, 1999; Fleet, Wagner, & Heeger, 1996). Cross-correlation is optimal when the disparities across a patch are small and constant, because it is then akin to the simple template matching of noise-limited ideal observers. Cross-correlation is not optimal for larger or varying disparities, because then the left- and right-eye signals falling within a binocular receptive field often differ substantially.

How does our ideal observer for disparity estimation compare with the standard disparity-energy and cross-correlation models? To enable a meaningful comparison, both methods were given access to the same image information. Doing so requires a slight modification to the standard cross-correlation algorithm (see Supplement). The eight AMA filters in Figure 3 outperform cross-correlation in both accuracy and precision when the disparities exceed approximately 7.5 arcmin (Figure S6). This performance increase is impressive given that local cross-correlation has (in effect) access to a large bank of filters, whereas our method uses only eight. Furthermore, increasing the number of AMA filters to 10–12 gives essentially equivalent performance to cross-correlation for smaller disparities.
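For a concrete reference point, here is a minimal sketch of the standard windowed local cross-correlation approach that the model is compared against. This is the conventional variant rather than the modified version described in the Supplement, and the window size, disparity range, and function name are hypothetical.

```python
import numpy as np

def local_xcorr_disparity(left, right, max_disp=17, window=32):
    """Estimate disparity (in samples) at the center of a 1-D left-eye signal
    by normalized cross-correlation against shifted right-eye windows."""
    c = len(left) // 2
    half = window // 2
    l = left[c - half:c + half]
    l = (l - l.mean()) / (np.linalg.norm(l - l.mean()) + 1e-12)
    best_d, best_r = 0, -np.inf
    for d in range(-max_disp, max_disp + 1):
        r = right[c - half + d:c + half + d]
        r = (r - r.mean()) / (np.linalg.norm(r - r.mean()) + 1e-12)
        corr = float(l @ r)
        if corr > best_r:
            best_r, best_d = corr, d
    return best_d
```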

Spatial frequency tuning of optimal linear binocular filters

The optimal linear binocular filters (the AMA filters) are selective for low to mid spatial frequencies (1.2–4.7 c/deg) and not for higher frequencies (see Figure 3, Figure S1). To develop an intuition for why, we examined the binocular contrast signals, as a function of frequency, that result from a binocularly viewed high-contrast edge (Figure S7). The binocular contrast signals barely differ above 6 c/deg, indicating that higher spatial frequencies carry little information for disparity estimation.

This analysis of binocular contrast signals provides insight into another puzzling aspect of disparity processing. Human sensitivity to disparity modulation (i.e., sinusoidal modulations in disparity-defined depth) cuts off at very low frequencies (4 c/deg) (Banks et al., 2004; Tyler, 1975). V1 binocular receptive fields tuned to the highest useful (luminance) spatial frequency (6 c/deg) have widths of 8 arcmin, assuming the typical octave bandwidth for cortical neurons. Given that receptive fields cannot signal variation finer than their own size, the fact that humans cannot see disparity modulations higher than 4 c/deg is nicely predicted by the 8 arcmin width suggested by the present analysis. This size matches previous psychophysically based estimates of the smallest useful mechanism in disparity estimation (Banks et al., 2004; Harris, McKee, & Smallman, 1997).

Eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011). The argument is that jitter and the relatively long integration time of the stereo system could 'smear out' the disparity signals. Eye movement jitter certainly can degrade stereo information enough to render high-frequency signals unmeasurable (Figure S7c, d). However, the analysis in the supplement (Figure S7) suggests that the low frequency at which humans lose the ability to detect disparity modulation may also be explained by the statistics of natural images.

Generality of findings

Although our analysis is based on calibrated natural stereo-image patches and realistic characterizations of human optics, sensors, and neural noise, there are some simplifying assumptions that could potentially limit the generality of our conclusions. Here we examine the effect of these assumptions.

The effect of surface slant and nonplanar depth structure

The stereo-image patches were projections of fronto-parallel surfaces. In natural scenes, surfaces are often slanted, which causes the disparity to change across a patch. We evaluated the effect of surface slant by repeating our analysis with a training set having a distribution of slants (see Figure 5d, e, and Supplement). This distribution of slants produced a distribution of disparity gradients that is comparable to those that occur when viewing natural scenes (Hibbard, 2008) (Figure S1g). The optimal binocular filters are only very slightly affected by surface slant (cf. Figure 3 and Figure S1a). These differences would be difficult to detect in neurophysiological experiments. This may help explain why there has been little success in finding neurons in early visual cortex that are tuned to nonzero slants (Nienborg et al., 2004). Disparity estimation performance is also quite similar, although estimate precision is somewhat reduced (Figure 5d, e).

Even after including the effect of surface slant, our training set lacks the non-planar depth structure (within-patch depth variation and depth discontinuities, i.e., occlusions) that is present in natural scenes (Figure S8a–c). To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed a set of range images obtained with


a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that the locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that these other sources of variation will have little effect.

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases of high-quality natural stereo images with coregistered RGB and distance information become available, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pair-wise (and lower-order) statistics fully characterize the joint responses from an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by $\mathbf{c}_{norm}\left(\mathbf{x}\right) = \mathbf{c}\left(\mathbf{x}\right)\big/\sqrt{\lVert\mathbf{c}\left(\mathbf{x}\right)\rVert^2 + n c_{50}^2}$, where $n$ is the dimensionality of the vector and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of $c_{50}$.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of $c_{50}$; the conditional response distributions are not. For large values of $c_{50}$, the response distributions have tails much heavier than Gaussians. When $c_{50} = 0.0$ (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian on average when $c_{50} = 0.1$ (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): contrast normalization may help create conditional filter response distributions that are Gaussian, thereby making simpler the encoding and decoding of high-dimensional subspaces of retinal image information.
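The kind of check described above can be sketched as follows: normalize the same stimuli with different half-saturation constants, project them onto a filter, and compare the tail weight of the resulting response distributions. The stimulus matrix, filter, and $c_{50}$ values here are hypothetical stand-ins.

```python
import numpy as np
from scipy.stats import kurtosis

def normalize_contrast(c, c50):
    """Divisive contrast normalization: c / sqrt(||c||^2 + n * c50^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

# Hypothetical test set: 5000 stimuli of 60 samples each, and one linear filter f
rng = np.random.default_rng(1)
stimuli = rng.laplace(0.0, 0.2, size=(5000, 60))   # stand-in contrast signals
f = rng.normal(0.0, 1.0, 60)
f /= np.linalg.norm(f)

for c50 in (0.0, 0.1, 0.25):
    responses = np.array([normalize_contrast(c, c50) @ f for c in stimuli])
    # Excess kurtosis: 0 for a Gaussian, >0 heavier tails, <0 lighter tails
    print(c50, kurtosis(responses))
```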

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined the optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes.


However, if the optics of one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4-diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four-diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) the computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 patches × 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

$$\delta = 2\left[\tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation} + D}\right) - \tan^{-1}\!\left(\frac{\mathrm{IPD}/2}{d_{fixation}}\right)\right] \qquad (6)$$

where $d_{fixation}$ is the fixation distance to the surface center, IPD is the interpupillary distance, and $D$ is the depth. Depth is given by $D = d_{surface} - d_{fixation}$, where $d_{surface}$ is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch; the change in disparity across the surface is greater when the surface patches are slanted.
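A minimal sketch of Equation 6 follows, under an assumed interpupillary distance of 0.065 m; the numeric values are illustrative only.

```python
import numpy as np

def disparity_arcmin(d_fixation, depth, ipd=0.065):
    """Disparity (Eq. 6) of a point at distance d_fixation + depth, in arcmin.
    Distances and the assumed IPD (0.065 m here) are in meters."""
    vergence_surface = 2.0 * np.arctan((ipd / 2.0) / (d_fixation + depth))
    vergence_fixation = 2.0 * np.arctan((ipd / 2.0) / d_fixation)
    return np.degrees(vergence_surface - vergence_fixation) * 60.0

# Example: fixation at 0.40 m; a surface 1 cm behind fixation
print(disparity_arcmin(0.40, 0.01))   # about -13.5 (farther surface -> negative here)
```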

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/deg, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that, at or near the focal plane, the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, the accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$\mathbf{r}_e\left(\mathbf{x}\right) = \left[I_e\left(\mathbf{x}\right) * psf_e\left(\mathbf{x}\right)\right]\, samp\left(\mathbf{x}\right) + \gamma \qquad (7)$$

for each eye $e$. $I_e\left(\mathbf{x}\right)$ represents an eye-specific luminance image of the light striking the sensor array at each location $\mathbf{x} = (x, y)$ in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function $psf_e\left(\mathbf{x}\right)$ that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function $samp\left(\mathbf{x}\right)$. Finally, the sensor responses are corrupted by noise $\gamma$; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
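A minimal sketch of the image-formation pipeline in Equation 7 is given below: blur the idealized luminance image with the eye's polychromatic PSF, sample it onto the sensor lattice, and add noise. The downsampling factor and noise level are illustrative placeholders, not the paper's values.

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(ideal_image, psf, downsample=4, noise_sd=0.01, seed=0):
    """Noisy sensor responses in the spirit of Eq. 7."""
    blurred = fftconvolve(ideal_image, psf, mode="same")    # I_e(x) * psf_e(x)
    sampled = blurred[::downsample, ::downsample]           # samp(x): sensor locations
    noise = np.random.default_rng(seed).normal(0.0, noise_sd, sampled.shape)
    return sampled + noise
```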

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images, yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA; a balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
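A minimal sketch of the interpolation and MAP readout described above is given below. It assumes element-wise cubic-spline interpolation of the mean vectors and covariance matrices and a flat prior; the training disparities and statistics passed in are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_gaussian_stats(train_disparities, means, covs, n_between=31):
    """Interpolate class-conditional response statistics between training
    disparities with cubic splines (element-wise through means and covs).
    With 19 training levels and n_between=31 this yields 577 distributions."""
    fine = np.linspace(train_disparities[0], train_disparities[-1],
                       (len(train_disparities) - 1) * (n_between + 1) + 1)
    mean_spline = CubicSpline(train_disparities, means, axis=0)
    cov_spline = CubicSpline(train_disparities, covs, axis=0)
    return fine, mean_spline(fine), cov_spline(fine)

def map_estimate(R, disparities, means, covs):
    """MAP readout with a flat prior: pick the disparity whose Gaussian
    log likelihood of the observed filter response vector R is largest."""
    lls = [-0.5 * (R - u) @ np.linalg.inv(C) @ (R - u)
           - 0.5 * np.linalg.slogdet(C)[1] for u, C in zip(means, covs)]
    return disparities[int(np.argmax(lls))]
```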

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17.

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7–8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \quad\quad \text{(S1a)}$$

$$\mathbf{w}_{ii,k} = -\operatorname{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \quad\quad \text{(S1b)}$$

$$w_{ij,k} = -0.5\,C^{-1}_{ij,k}, \quad \forall\, i,j,\; j > i \quad\quad \text{(S1c)}$$

$$\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{\mathsf{T}}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k \quad\quad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) returns the diagonal of a matrix as a vector. Here, we derive these weight equations from the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).
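For concreteness, the matrix-form weights of Eqs. S1a–d can be computed directly from a class-conditional mean vector and covariance matrix; a sketch follows (Python/NumPy; variable names are illustrative):

```python
import numpy as np

def ll_neuron_weights(u_k, C_k, const_k=0.0):
    """Weights for one log-likelihood (LL) neuron from the Gaussian
    approximation N(u_k, C_k) to its preferred-disparity response
    distribution (Eqs. S1a-d)."""
    A = np.linalg.inv(C_k)                    # A_k = C_k^{-1}
    ones = np.ones(len(u_k))
    w_lin = A @ u_k                           # S1a: weights on linear responses
    w_sq = -np.diag(A) + 0.5 * (A @ ones)     # S1b: weights on squared responses
    w_pair = -0.5 * A                         # S1c: weights on pairwise-summed-squared
                                              #      responses (use entries with j > i)
    const = -0.5 * u_k @ A @ u_k + const_k    # S1d: baseline
    return w_lin, w_sq, w_pair, const
```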


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k} R_i + \sum_{i=1}^{n} w_{ii,k} R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + \mathrm{const}' \quad\quad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf{T}}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + \mathrm{const} \quad\quad \text{(S3)}$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \quad\quad \text{(S4a)}$$

$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \quad\quad \text{(S4b)}$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect terms containing R_i: w_iR_i from Eq. S2, a_iiR_iu_i from Eq. S4a, and a_ijR_iu_j from Eq. S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \quad\quad \text{(S5a)}$$

Note that the weights on the linear filter responses w_i are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2, and −a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \quad\quad \text{(S5b)}$$


Next, consider w_ii. In this case, the total weight on R_i^2 in Eq. S2 must be −0.5a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5a_ij. Subtracting that weight from the total weight specified by Eq. S4a gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \quad\quad \text{(S5c)}$$

Last, by grouping terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$\mathrm{const}' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\,j\neq i} a_{ij}u_iu_j\right) + \mathrm{const} \quad\quad \text{(S5d)}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have, on average, the same maximum firing rates (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R²_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by

$$R^2_{ij,\max} = R^2_{\max}\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$$

where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R²_max and R²_{ij,max}), the weights on the complex cell responses must be scaled. The optimal weights on the complex cells are then given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R²_{ij,max}/R_max) w_ij. These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.
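One way to sanity-check the derivation is to verify numerically that the weighted sum of linear, squared, and pairwise-sum-squared responses (Eq. S2 with the weights of Eqs. S5a–d) reproduces the Gaussian log likelihood of Eq. S3 up to its stimulus-independent constant. A sketch with arbitrary synthetic values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                           # number of filters
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)                     # a positive-definite covariance
u = rng.standard_normal(n)                      # mean response vector
R = rng.standard_normal(n)                      # an arbitrary response vector
A = np.linalg.inv(C)

# Eq. S3 (dropping the additive constant)
ll_quadratic = -0.5 * (R - u) @ A @ (R - u)

# Eq. S2 with the weights of Eqs. S5a-d
w_lin = A @ u                                   # S5a
w_sq = -np.diag(A) + 0.5 * A.sum(axis=1)        # S5c
ll_sum = w_lin @ R + w_sq @ (R ** 2)
for i in range(n - 1):
    for j in range(i + 1, n):                   # S5b terms, j > i
        ll_sum += -0.5 * A[i, j] * (R[i] ** 2 + 2 * R[i] * R[j] + R[j] ** 2)
ll_sum += -0.5 * u @ A @ u                      # S5d without the additive constant

assert np.isclose(ll_quadratic, ll_sum)
```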

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane,

σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most of the natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of


phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)} \quad\quad \text{(S6)}$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} \quad\quad \text{(S7)}$$

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$\mathrm{psf}_{\mathrm{photopic}}\!\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\!\left(\mathbf{x};\lambda,\Delta D\right)\, sc\!\left(\lambda\right) D65\!\left(\lambda\right) \quad\quad \text{(S8)}$$

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

$$\mathbf{c}\!\left(\mathbf{x}\right) = \left(\mathbf{r}\!\left(\mathbf{x}\right) - \bar{r}\right)/\bar{r}$$

where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

$$\mathbf{c}_{\mathrm{norm}}\!\left(\mathbf{x}\right) = \frac{\mathbf{c}\!\left(\mathbf{x}\right)}{\sqrt{\left\lVert \mathbf{c}\!\left(\mathbf{x}\right)\right\rVert^{2} + n\,c_{50}^{2}}}$$

where n is the dimensionality of the vector, ||c(x)||² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{\max}\left\lfloor \mathbf{c}_{\mathrm{norm}}\!\left(\mathbf{x}\right)\cdot \mathbf{f}_i\!\left(\mathbf{x}\right)\right\rfloor$$
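A compact sketch of this normalization and simple-cell stage (Python/NumPy; setting c50 = 0 reproduces the simplified normalization used in the main analysis, and all names are illustrative):

```python
import numpy as np

def contrast_signal(r):
    """Weber contrast of noisy sensor responses r over a local patch."""
    r_mean = r.mean()
    return (r - r_mean) / r_mean

def contrast_normalize(c, c50=0.0):
    """Divisive normalization by local contrast energy (||c||^2 = n * RMS^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def simple_cell_response(c_norm, f, r_max=1.0):
    """Binocular simple cell: half-wave-rectified dot product of the
    contrast-normalized signal with the receptive field f."""
    return r_max * max(c_norm @ f, 0.0)
```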


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating the disparity of surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as the one-dimensional filters do to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by

$$g \approx \frac{\mathrm{IPD}\tan\theta}{d}$$

where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, $R_{\max}\left(\mathbf{f}_i^{\mathsf{T}}\mathbf{c}_{\mathrm{norm}}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant natural image variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals, like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.
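For reference, the computation behind Eq. S6 and panel (a) of this figure can be sketched as follows (Python/NumPy; the all-pass MTF placeholder and the example disparities are assumptions, and the 1/f term stands in for natural-image amplitude spectra):

```python
import numpy as np

def binocular_difference_amplitude(f, delta_arcmin, mtf=lambda f: np.ones_like(f)):
    """Contrast amplitude of the left-minus-right signal (Eq. S6).

    f            : spatial frequencies in cycles/deg
    delta_arcmin : disparity in arcmin
    mtf          : disparity-dependent optical MTF (placeholder: all-pass)
    """
    delta_deg = delta_arcmin / 60.0                 # arcmin -> deg
    A = mtf(f) / f                                  # A_L(f) = A_R(f) ~ MTF / f
    return np.sqrt(A**2 + A**2 - 2 * A * A * np.cos(2 * np.pi * f * delta_deg))

freqs = np.linspace(0.5, 16, 200)                   # cycles/deg
for d in (1.25, 3.75, 7.5):                         # example disparities (arcmin)
    amp = binocular_difference_amplitude(freqs, d)  # one curve per disparity
```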


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by

$$\mathbf{c}_{\mathrm{norm}}\!\left(\mathbf{x}\right) = \frac{\mathbf{c}\!\left(\mathbf{x}\right)}{\sqrt{\left\lVert \mathbf{c}\!\left(\mathbf{x}\right)\right\rVert^{2} + n\,c_{50}^{2}}}$$

(Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper, we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then, we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


a high-precision Riegl VZ400 range scanner (see Supplement). We find that most of the depth variation in the range images is captured by variation in slant (Figure S8d). Given that variation in slant has little effect on the optimal receptive fields, and given that the locations of fine depth structure and monocular occlusion zones (9% of points in our dataset) are most likely random with respect to the receptive fields, it is likely that the other sources of variation will have little effect.
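The planarity analysis summarized here (and in Figure S8) amounts to comparing the depth variation captured by the best-fitting slanted plane with the total variation about the best-fitting frontoparallel plane. A per-patch sketch (Python/NumPy; patch extraction from the range images is omitted):

```python
import numpy as np

def planarity_ratio(x, y, z):
    """sigma_planar / sigma_nonplanar for one range patch.

    x, y : pixel coordinates within the patch (flattened)
    z    : corresponding range (distance) values
    Returns ~1.0 when depth variation is fully captured by a slanted plane,
    ~0.0 when none of it is.
    """
    # Least-squares slanted plane: z ~ a*x + b*y + c
    G = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(G, z, rcond=None)
    z_plane = G @ coef

    z_fronto = np.full_like(z, z.mean())          # best frontoparallel plane
    sigma_planar = np.std(z_plane - z_fronto)     # variation captured by slant
    sigma_nonplanar = np.std(z - z_fronto)        # total variation about frontoparallel
    return sigma_planar / sigma_nonplanar
```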

Nevertheless, occlusions create monocular zones that carry information supporting da Vinci stereopsis in humans (Kaye, 1978; Nakayama & Shimojo, 1990). Neurophysiological evidence suggests that binocular mechanisms exist for locating occlusion boundaries (Tsao, Conway, & Livingstone, 2003). In the future, when databases become available of high-quality natural stereo images with coregistered RGB and distance information, the general approach described here may prove useful for determining the filters that are optimal for detecting occlusion boundaries.

The effect of a realistic disparity prior

Our analysis assumed a flat prior probability distribution over disparity. Two groups have recently published estimates of the prior distribution of disparities, based on range measurements in natural scenes and human eye movement statistics (Hibbard, 2008; Liu et al., 2008). Does the prior distribution of disparity signals have a significant effect on the optimal filters and estimation performance? To check, we modified the frequency of different disparities in our training set to match the prior distribution of disparities encountered in natural viewing (Liu et al., 2008). Linear filter shapes, LL neuron tuning curve shapes, and performance levels were robust to differences between a flat and a realistic disparity prior.

It has been hypothesized that the prior over the stimulus dimension of interest (e.g., disparity) may determine the optimal tuning curve shapes and the optimal distribution of peak tunings (i.e., how the tuning curves should tile the stimulus dimension) (Ganguli & Simoncelli, 2010). In evaluating this hypothesis, one must consider the results presented in the present paper. Our results show that natural signals, and the linear receptive fields that filter those signals, place strong constraints on the shapes that tuning curves can have. Specifically, the shapes of the LL neuron disparity tuning curves (approximately log-Gaussian) are robust to changes in the prior. Thus, although the prior may influence the optimal distribution of peak tunings (our analysis does not address this issue), it is unlikely to be the sole (or even the primary) determinant of tuning curve shapes.

The effect of contrast normalization

Contrast normalization significantly contributes to the Gaussian form of the filter response distributions. One potential advantage of Gaussian response distributions is that pairwise (and lower order) statistics fully characterize the joint responses of an arbitrary number of filters, making possible the decoding of large filter population responses. In the analysis presented here, all binocular signals were normalized to a mean of zero and a vector magnitude of 1.0 before being projected onto the binocular filters. This is a simplified form of the contrast normalization that is a ubiquitous feature of retinal and early cortical processing (Carandini & Heeger, 2012). The standard model of contrast normalization in cortical neurons is given by

c_{norm}(\mathbf{x}) = \frac{c(\mathbf{x})}{\sqrt{\left\|c(\mathbf{x})\right\|^{2} + n\,c_{50}^{2}}}

where n is the dimensionality of the vector and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (The half-saturation constant is so called because, in a neuron with an output squaring nonlinearity, the response rate will equal half its maximum when the contrast of the stimulus equals the value of c50.)

We examined the effect of different half-saturation constants. The optimal filters are robust to different values of c50; the conditional response distributions are not. For large values of c50, the response distributions have tails much heavier than Gaussians. When c50 = 0.0 (as it did throughout the paper), the distributions are well approximated by, but somewhat lighter-tailed than, Gaussians. The distributions (e.g., see Figure 4a) are most Gaussian, on average, when c50 = 0.1 (Figure S9). On the basis of this finding, we hypothesize a new function for cortical contrast normalization, in addition to the many already proposed (Carandini & Heeger, 2012): Contrast normalization may help create conditional filter response distributions that are Gaussian, thereby simplifying the encoding and decoding of high-dimensional subspaces of retinal image information.
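To make the normalization step concrete, the following illustrative Python/NumPy sketch (ours, not the authors' Matlab code) applies the standard normalization model above to a vectorized luminance patch; the patch values are arbitrary and only the two c50 settings discussed in the text are exercised.

import numpy as np

def contrast_normalize(luminance, c50=0.0):
    """Weber contrast followed by the standard normalization model:
    c_norm = c / sqrt(||c||^2 + n * c50^2)."""
    lum = np.asarray(luminance, dtype=float).ravel()
    c = (lum - lum.mean()) / lum.mean()            # Weber contrast signal (mean zero)
    n = c.size                                     # dimensionality of the vector
    norm = np.sqrt(np.sum(c ** 2) + n * c50 ** 2)  # contrast energy plus half-saturation term
    return c / norm

# With c50 = 0 the normalized signal has unit vector magnitude, as assumed in the
# paper; with c50 > 0 low-contrast patches are normalized less strongly.
rng = np.random.default_rng(0)
patch = 100 + 20 * rng.standard_normal(32 * 32)    # hypothetical luminance patch
print(np.linalg.norm(contrast_normalize(patch, c50=0.0)))   # ~1.0
print(np.linalg.norm(contrast_normalize(patch, c50=0.1)))   # < 1.0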

The effect of different optics in the two eyes

We modeled the optics of the two eyes as being identical, whereas refractive power and monochromatic aberrations often differ somewhat between the eyes (Marcos & Burns, 2000). To evaluate the effect of normal optical variations, we convolved the retinal projections with point-spread functions previously measured in the first author's left and right eyes (Burge & Geisler, 2011). (The first author has normal optical variation.) Then we determined optimal filters and performance. Filters and performance levels are similar to those in Figures 3 and 4. Thus, our results are robust to typical optical differences between the eyes. However, if the optics in one eye are severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4 diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6; Figure S7e–h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 × 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 × 256 pixel patches were randomly selected from 80 photographs (15 Patches × 80 Images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point

\delta = 2\left[\tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}+\Delta}\right) - \tan^{-1}\!\left(\frac{IPD/2}{d_{fixation}}\right)\right]    (6)

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface − d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
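The following Python sketch evaluates Equation 6 for the 40 cm fixation distance used in the paper; the 0.065 m interpupillary distance is an assumed, typical value, not one specified in this passage.

import numpy as np

def disparity_arcmin(d_fixation, depth, ipd=0.065):
    """Eq. 6: vergence-angle difference (in arcmin) of a point at distance
    d_fixation + depth, given fixation at d_fixation (both in meters)."""
    d_surface = d_fixation + depth
    delta_rad = 2.0 * (np.arctan((ipd / 2.0) / d_surface)
                       - np.arctan((ipd / 2.0) / d_fixation))
    return np.degrees(delta_rad) * 60.0

# Depths behind fixation give negative (uncrossed) disparities;
# depths in front of fixation give positive (crossed) disparities.
for depth in (-0.02, 0.0, 0.02):
    print(depth, round(disparity_arcmin(0.40, depth), 2))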

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (−16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. See the Supplement for further details of how the optics were simulated.
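As a rough illustration of the Fraunhofer relationship just described (here the OTF is obtained as the Fourier transform of a unit-volume PSF, which is equivalent to the autocorrelation of the generalized pupil function), the Python sketch below builds a monochromatic defocus PSF and OTF for a 2 mm pupil. The wavelength, grid size, and defocus value are illustrative assumptions; the model used in the paper is polychromatic (see Supplement).

import numpy as np

def defocus_otf(defocus_diopters, pupil_mm=2.0, wavelength_nm=555.0, n=256):
    """Monochromatic PSF/OTF for a circular pupil with pure defocus,
    via the generalized pupil function P = A * exp(i*2*pi*W/lambda)."""
    wl_m = wavelength_nm * 1e-9
    a = pupil_mm * 1e-3 / 2.0                        # pupil radius (m)
    x = np.linspace(-2 * a, 2 * a, n)                # pupil-plane coordinates (m)
    xx, yy = np.meshgrid(x, x)
    r2 = xx ** 2 + yy ** 2
    aperture = (r2 <= a ** 2).astype(float)
    w = 0.5 * defocus_diopters * r2                  # defocus wavefront error (m)
    pupil = aperture * np.exp(1j * 2 * np.pi * w / wl_m)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(pupil))) ** 2
    psf /= psf.sum()                                 # unit-volume PSF
    otf = np.fft.fft2(np.fft.ifftshift(psf))         # OTF = Fourier transform of the PSF
    return psf, otf

psf, otf = defocus_otf(defocus_diopters=0.25)
print(abs(otf[0, 0]))   # 1.0: zero-frequency gain of a unit-volume PSF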

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

r_{e}(\mathbf{x}) = \left[I_{e}(\mathbf{x}) * psf_{e}(\mathbf{x})\right]\cdot samp(\mathbf{x}) + \gamma    (7)

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies, we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity, we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
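The sketch below is a minimal, hypothetical rendering of Equation 7 in Python/NumPy: blur an ideal luminance image with an eye-specific PSF, sample on the sensor lattice, and add noise. The Gaussian kernel, sampling stride, and noise level are placeholders, not the values used in the paper.

import numpy as np
from numpy.fft import fft2, ifft2

def sensor_responses(ideal_image, psf, sample_stride=2, noise_sd=0.01, seed=0):
    """Eq. 7 schematically: r_e(x) = [I_e(x) * psf_e(x)] . samp(x) + gamma."""
    # Optical blur: circular convolution of the ideal image with the PSF.
    blurred = np.real(ifft2(fft2(ideal_image) * fft2(psf, s=ideal_image.shape)))
    # Spatial sampling by the sensor array (here a regular lattice).
    sampled = blurred[::sample_stride, ::sample_stride]
    # Additive sensor noise gamma.
    rng = np.random.default_rng(seed)
    return sampled + noise_sd * rng.standard_normal(sampled.shape)

# Example with a toy image and a small Gaussian blur kernel.
img = np.random.default_rng(1).random((64, 64))
yy, xx = np.mgrid[-4:5, -4:5]
kernel = np.exp(-(xx**2 + yy**2) / 4.0)
kernel /= kernel.sum()
r_left = sensor_responses(img, kernel)
print(r_left.shape)   # (32, 32)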

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful for the task, whereas AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus with each filter, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
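As a rough illustration of the encoding stage described above (and not the released Matlab implementation), the Python sketch below computes the mean and variance of each filter's noisy response to each training stimulus under an additive Gaussian response-noise model; the filters, stimuli, and noise parameters are made up for the example. The closed-form accuracy expression of Geisler et al. (2009), which is evaluated on these per-stimulus statistics during the filter search, is omitted here.

import numpy as np

def ama_response_stats(filters, stimuli, r_max=1.0, noise_var=0.01):
    """Filter responses: dot product of each (contrast-normalized) stimulus with
    each filter, scaled by r_max; additive Gaussian noise of known variance
    gives the response mean and variance for every stimulus."""
    # stimuli: (num_stimuli, dim), assumed already contrast-normalized
    # filters: (num_filters, dim), unit norm
    mean_resp = r_max * stimuli @ filters.T            # (num_stimuli, num_filters)
    var_resp = np.full_like(mean_resp, noise_var)      # constant response-noise variance
    return mean_resp, var_resp

# Toy example: 3 random unit-norm "filters", 100 random unit-norm "stimuli".
rng = np.random.default_rng(2)
f = rng.standard_normal((3, 64));   f /= np.linalg.norm(f, axis=1, keepdims=True)
s = rng.standard_normal((100, 64)); s /= np.linalg.norm(s, axis=1, keepdims=True)
mu, var = ama_response_stats(f, s)
print(mu.shape, var.shape)   # (100, 3) (100, 3)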

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus; hence, it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained by using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
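A minimal sketch of this decoding step, assuming class-conditional Gaussian response distributions and hypothetical trained statistics: cubic splines interpolate the mean vectors and covariance matrices between trained disparity levels, and the MAP estimate (flat prior) is the disparity whose interpolated distribution assigns the highest likelihood. SciPy's CubicSpline and multivariate_normal are used for brevity.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_stats(train_disp, means, covs, n_between=31):
    """Cubic-spline interpolation of response means and covariances, with
    n_between interpolated levels between each pair of trained levels."""
    fine = np.linspace(train_disp[0], train_disp[-1],
                       (len(train_disp) - 1) * (n_between + 1) + 1)
    mean_fn = CubicSpline(train_disp, means, axis=0)
    cov_fn = CubicSpline(train_disp, covs, axis=0)
    return fine, mean_fn(fine), cov_fn(fine)

def map_disparity(response, fine_disp, fine_means, fine_covs):
    """MAP estimate under a flat prior: the disparity whose Gaussian response
    distribution (i.e., LL neuron) assigns the highest likelihood."""
    log_like = np.array([multivariate_normal.logpdf(response, m, c)
                         for m, c in zip(fine_means, fine_covs)])
    return fine_disp[np.argmax(log_like)]

# Toy example with 2 filters and 19 trained disparity levels (1.875 arcmin steps).
rng = np.random.default_rng(3)
train_disp = np.linspace(-16.875, 16.875, 19)             # arcmin
means = np.column_stack([np.sin(train_disp / 8.0), np.cos(train_disp / 8.0)])
covs = np.stack([0.05 * np.eye(2) for _ in train_disp])
fine, fm, fc = interpolate_stats(train_disp, means, covs)
print(len(fine))                                           # 577 interpolated levels
print(map_disparity(means[9] + 0.01 * rng.standard_normal(2), fine, fm, fc))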

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring nonlinearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

\mathbf{w}_{i,k} = \mathbf{C}_{k}^{-1}\mathbf{u}_{k}    (S1a)

\mathbf{w}_{ii,k} = -\operatorname{diag}\left(\mathbf{C}_{k}^{-1}\right) + 0.5\,\mathbf{C}_{k}^{-1}\mathbf{1}    (S1b)

w_{ij,k} = -0.5\left[\mathbf{C}_{k}^{-1}\right]_{ij} \quad \forall\, i,j:\ j>i    (S1c)

const'_{k} = -0.5\,\mathbf{u}_{k}^{T}\mathbf{C}_{k}^{-1}\mathbf{u}_{k} + const_{k}    (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) maps a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

\ln p\left(\mathbf{R}\mid\delta_{k}\right) = \sum_{i=1}^{n} w_{i,k}R_{i} + \sum_{i=1}^{n} w_{ii,k}R_{i}^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_{i}^{2} + 2R_{i}R_{j} + R_{j}^{2}\right) + const'    (S2)

Second, we rewrite Eq. 4 in the main text:

\ln p\left(\mathbf{R}\mid\delta_{k}\right) = -0.5\left(\mathbf{R}-\mathbf{u}_{k}\right)^{T}\mathbf{A}_{k}\left(\mathbf{R}-\mathbf{u}_{k}\right) + const    (S3)

where A_k equals the inverse covariance matrix C_k^-1. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

-0.5\,a_{ii}\left(R_{i}-u_{i}\right)^{2} = -0.5\left(a_{ii}R_{i}^{2} - 2a_{ii}R_{i}u_{i} + a_{ii}u_{i}^{2}\right)    (S4a)

-a_{ij}\left(R_{i}-u_{i}\right)\left(R_{j}-u_{j}\right) = -\left(a_{ij}R_{i}R_{j} - a_{ij}R_{i}u_{j} - a_{ij}R_{j}u_{i} + a_{ij}u_{j}u_{i}\right)    (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case, we want to collect terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eq. S4. Setting the R_i terms equal and canceling gives

w_{i} = \sum_{j=1}^{n} a_{ij}u_{j}    (S5a)

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2, and −a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_{ij} = -0.5\,a_{ij}    (S5b)


Next, consider w_ii. In this case, the total weight on R_i^2 in Eq. S2 must be −0.5 a_ii. The weight from the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij}    (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

const' = -0.5\sum_{i}\left(a_{ii}u_{i}^{2} + \sum_{j\neq i} a_{ij}u_{i}u_{j}\right) + const    (S5d)

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates, on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R^2_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R^2_{ij max} = R^2_{max}\left(2 + 2\,\mathbf{f}_{i}\cdot\mathbf{f}_{j}\right), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex, that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R^2_max and R^2_{ij max}, then the weights on the complex cell responses must be scaled. The optimal weights on complex cells are therefore given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R^2_{ij max}/R_max) w_ij. These scale factors are important because, in a neurophysiological experiment, one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.
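A small numerical sketch of the weight equations (Eqs. S1a–d / S5a–d), assuming hypothetical class-conditional response statistics u_k and C_k for a single preferred disparity; the cortical scaling to w' described above is omitted. The final line checks that the weighted sum of Eq. S2 reproduces the Gaussian log likelihood of Eq. S3 (up to the constant).

import numpy as np

def ll_neuron_weights(u, C, const=0.0):
    """Weights of one log-likelihood (LL) neuron from the class-conditional
    mean vector u and covariance matrix C (Eqs. S1a-d / S5a-d)."""
    A = np.linalg.inv(C)                        # A = C^{-1}
    w_lin = A @ u                               # S5a: w_i  = sum_j a_ij u_j
    w_pair = -0.5 * A                           # S5b: w_ij = -0.5 a_ij (pairs j > i used)
    w_sq = -np.diag(A) + 0.5 * A.sum(axis=1)    # S5c: w_ii = -a_ii + 0.5 sum_j a_ij
    c_prime = -0.5 * u @ A @ u + const          # S1d / S5d constant
    return w_lin, w_sq, w_pair, c_prime

def ll_response(R, w_lin, w_sq, w_pair, c_prime):
    """Eq. S2: linear, squared, and pairwise sum-squared terms plus constant."""
    n = len(R)
    out = w_lin @ R + w_sq @ (R ** 2) + c_prime
    for i in range(n - 1):
        for j in range(i + 1, n):
            out += w_pair[i, j] * (R[i] ** 2 + 2 * R[i] * R[j] + R[j] ** 2)
    return out

# Sanity check against the Gaussian log likelihood (Eq. S3), up to 'const'.
rng = np.random.default_rng(4)
u = 0.1 * rng.standard_normal(3)
B = rng.standard_normal((3, 3)); C = B @ B.T + np.eye(3)
R = rng.standard_normal(3)
w = ll_neuron_weights(u, C)
A = np.linalg.inv(C)
print(np.isclose(ll_response(R, *w), -0.5 * (R - u) @ A @ (R - u)))   # True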

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
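A sketch of the planarity measure just described, assuming a range patch supplied as a depth map on a regular pixel grid: the fronto-parallel plane is the mean depth, the slanted plane is a least-squares fit, and the ratio is formed from the two standard deviations. The toy patch at the end is hypothetical.

import numpy as np

def planarity_ratio(depth_patch):
    """sigma_planar / sigma_non-planar for one range patch (cf. Fig. S8d):
    1.0 means all depth variation is captured by a slanted plane; 0.0 means none."""
    z = np.asarray(depth_patch, dtype=float)
    yy, xx = np.mgrid[0:z.shape[0], 0:z.shape[1]]
    # Best-fitting fronto-parallel plane: the mean depth.
    fp_plane = np.full_like(z, z.mean())
    # Best-fitting slanted plane via least squares: z ~ a*x + b*y + c.
    G = np.column_stack([xx.ravel(), yy.ravel(), np.ones(z.size)])
    coef, *_ = np.linalg.lstsq(G, z.ravel(), rcond=None)
    slanted = (G @ coef).reshape(z.shape)
    sigma_planar = np.std(slanted - fp_plane)        # variation captured by slant
    sigma_nonplanar = np.std(z - fp_plane)           # total variation about the fronto-parallel plane
    return sigma_planar / sigma_nonplanar

# A purely slanted toy patch gives a ratio of ~1.0.
yy, xx = np.mgrid[0:32, 0:32]
print(round(planarity_ratio(400.0 + 0.5 * xx), 3))   # ~1.0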

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/s (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 s), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge.

The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast falloff typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

A_{B}\left(f,\delta_{k}\right) = \sqrt{A_{L}(f)^{2} + A_{R}(f)^{2} - 2A_{L}(f)A_{R}(f)\cos\left(2\pi f\delta_{k}\right)}    (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.
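The following sketch evaluates Eq. S6 for an assumed pair of identical retinal amplitude spectra; the bare 1/f spectrum (no optics) is a simplification of the wave-optics MTFs used in the paper, so it only illustrates how the interocular difference amplitude collapses at high frequencies and small disparities.

import numpy as np

def binocular_difference_amplitude(freq_cpd, disparity_arcmin, amp_left, amp_right):
    """Eq. S6: amplitude of the left-minus-right sinusoidal component at each
    spatial frequency, for an interocular phase shift of 2*pi*f*delta."""
    delta_deg = disparity_arcmin / 60.0                # disparity in degrees
    phase = 2.0 * np.pi * freq_cpd * delta_deg         # f in cycles/deg
    return np.sqrt(amp_left**2 + amp_right**2
                   - 2.0 * amp_left * amp_right * np.cos(phase))

# Identical eyes with a 1/f amplitude spectrum, for illustration only.
f = np.linspace(0.5, 16.0, 32)          # cycles/deg
a = 1.0 / f                             # 1/f contrast falloff
for d in (2.5, 7.5, 15.0):              # disparities in arcmin
    ab = binocular_difference_amplitude(f, d, a, a)
    print(d, np.round(ab[[0, 15, -1]], 3))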

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

\Delta D \cong \delta / IPD    (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


psf_{photopic}\left(\mathbf{x};\Delta D\right) = \frac{1}{K}\sum_{\lambda} psf\left(\mathbf{x};\lambda,\Delta D\right)\,s_{c}\left(\lambda\right)\,D65\left(\lambda\right)    (S8)

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal c(x) = (r(x) − r̄)/r̄, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast,

c_{norm}\left(\mathbf{x}\right) = c\left(\mathbf{x}\right)\Big/\sqrt{\left\|c\left(\mathbf{x}\right)\right\|^{2} + n\,c_{50}^{2}}

where n is the dimensionality of the vector, ‖c(x)‖² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static nonlinearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

R_{i} = R_{max}\left\lfloor c_{norm}\left(\mathbf{x}\right)\cdot f_{i}\left(\mathbf{x}\right)\right\rfloor
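A minimal sketch of this response model, assuming an already contrast-normalized input vector and an arbitrary binocular filter: the 'on' and 'off' simple cells are half-wave rectified linear responses of opposite sign, and a model complex cell squares their sum (see Fig. S3). The maximum response of 30 spikes/s is an assumption borrowed from the neural-noise discussion above.

import numpy as np

def simple_cell_responses(c_norm, f, r_max=30.0):
    """'On' and 'off' simple cells: half-wave rectified linear responses."""
    lin = r_max * np.dot(c_norm, f)
    return max(lin, 0.0), max(-lin, 0.0)          # R_on, R_off

def complex_cell_response(c_norm, f, r_max=30.0):
    """Model complex cell: sum the on/off simple cell outputs, then square.
    Equivalent to the squared linear filter response (scaled by r_max**2)."""
    r_on, r_off = simple_cell_responses(c_norm, f, r_max)
    return (r_on + r_off) ** 2

rng = np.random.default_rng(5)
f = rng.standard_normal(128); f /= np.linalg.norm(f)     # toy binocular filter
c = rng.standard_normal(128); c /= np.linalg.norm(c)     # contrast-normalized stimulus
print(simple_cell_responses(c, f))
print(complex_cell_response(c, f))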



References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as a, b, c, e, except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, R_max(f_i^T c_norm)^2 (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant stimulus variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
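For a concrete reference point, a minimal local cross-correlation estimator of the kind compared against here might look like the following sketch. The window choice, candidate-disparity sampling, and wrap-around shift are illustrative assumptions, not the exact comparison algorithm used in the paper.

import numpy as np

def xcorr_disparity(left, right, window, disparities_arcmin, samples_per_deg=128):
    # Estimate disparity by maximizing the windowed, normalized cross-correlation
    # between the left-eye signal and shifted versions of the right-eye signal.
    best_rho, best_d = -np.inf, None
    for d in disparities_arcmin:
        shift = int(round(d / 60.0 * samples_per_deg))   # arcmin -> samples
        shifted = np.roll(right, shift)                   # crude shift (wraps at the edges)
        a, b = window * left, window * shifted
        rho = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if rho > best_rho:
            best_rho, best_d = rho, d
    return best_d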


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) A four diopter difference in refractive power between the two eyes drastically reduces the inter-ocular contrast signal. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.
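Panel (a) can be reproduced directly from Equation S6 once an MTF is specified. The sketch below treats the MTF as a user-supplied callable and is only illustrative of the calculation, not of the wave-optics model used in the paper.

import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, mtf):
    # Eq. S6 with identical left/right optics: A_L(f) = A_R(f), proportional to MTF(f)/f.
    delta_deg = disparity_arcmin / 60.0
    amp = mtf(f_cpd) / np.maximum(f_cpd, 1e-6)
    return np.sqrt(2.0 * amp**2 * (1.0 - np.cos(2.0 * np.pi * f_cpd * delta_deg)))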


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.
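The statistic plotted in panel (d) amounts to a least-squares plane fit per patch. The sketch below (array names and patch format are our assumptions) returns the ratio σ_planar/σ_non-planar described in the supplementary text.

import numpy as np

def slant_variation_ratio(x, y, z):
    # x, y: pixel coordinates within a 1 deg patch; z: raw range values at those coordinates.
    z = np.asarray(z, dtype=float)
    A = np.column_stack([x, y, np.ones_like(z)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)   # best-fitting slanted plane
    z_slanted = A @ coeffs
    z_fronto = np.full_like(z, z.mean())             # best-fitting fronto-parallel plane
    sigma_planar = np.std(z_slanted - z_fronto)      # variation captured by slant
    sigma_nonplanar = np.std(z - z_fronto)           # raw variation about the fronto-parallel plane
    return sigma_planar / sigma_nonplanar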


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / sqrt(||c(x)||^2 + n c_50^2) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussian (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


ever, if the optics in one eye is severely degraded relative to the other, then the optimal filters are quite different and disparity estimation performance is significantly poorer.

Interestingly, a persistent 4 diopter difference in refractive power between the two eyes is the most important risk factor for the development of amblyopia (Levi, McKee, & Movshon, 2011), a condition characterized by extremely poor or absent stereopsis (among other deficits). A four diopter difference between the left- and right-eye optics drastically reduces (i.e., nearly eliminates) computable interocular contrast signals that covary systematically with disparity (Equation S6, Figure S7e-h).

Conclusions

The method presented here provides a prescription for how to optimally estimate local depth cues from images captured by the photosensors in any vision system. Our analysis of the depth cue of binocular disparity provides insight into the computations and neural populations required for accurate estimation of binocular disparity in animals viewing natural scenes. Specifically, by analyzing the information available at the photoreceptors, we improve upon existing computational methods for solving the stereo correspondence problem and provide a normative account of a range of established neurophysiological and psychophysical findings. This study demonstrates the power of characterizing the properties of natural signals that are relevant for performing specific natural tasks. A similar recent analysis provided insight into the neural computations required for accurate estimation of the focus error (defocus) in local regions of retinal images of natural scenes (Burge & Geisler, 2011; Stevenson et al., 1992). The same approach seems poised to provide insight into the neural encoding and decoding that underlie many other fundamental sensory and perceptual tasks.

Methods details

Natural disparity signals

To determine the luminance structure of natural scenes, we photographed natural scenes on and around the University of Texas at Austin campus with a tripod-mounted Nikon D700 14-bit SLR camera (4256 x 2836 pixels) fitted with a Sigma 50 mm prime lens (Burge & Geisler, 2011). To ensure sharp photographs, the camera lens was focused on optical infinity and all imaged objects were at least 16 m from the camera. RAW photographs were converted to luminance values using the measured sensitivities of the camera (Burge & Geisler, 2011, 2012) and the human photopic sensitivity function (Stockman & Sharpe, 2000). Twelve hundred 256 x 256 pixel patches were randomly selected from 80 photographs (15 patches x 80 images; Figure 1b). Four hundred were used for training; the other 800 were used for testing.

To approximate the depth structure of natural scenes, we texture-mapped the luminance patches onto planar fronto-parallel or slanted surface patches that were straight ahead at eye height. After perspective projection, this procedure yields the same pattern of retinal stimulation that would be created by viewing photographs pasted on surfaces (e.g., walls) differently slanted in depth. The disparity of the surface patch was defined as the disparity of the surface's central point:

δ = 2 [ tan^-1( (IPD/2) / (d_fixation + Δ) ) - tan^-1( (IPD/2) / d_fixation ) ]     (6)

where d_fixation is the fixation distance to the surface center, IPD is the interpupillary distance, and Δ is the depth. Depth is given by Δ = d_surface - d_fixation, where d_surface is the distance to the surface. The disparity of other points on the surface patch varies slightly across the surface patch. The change in disparity across the surface is greater when the surface patches are slanted.
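As a worked example of Equation 6, the sketch below computes the disparity produced by a small depth offset; the numbers are illustrative, and the 0.065 m interpupillary distance is our assumption rather than a value stated here.

import numpy as np

def disparity_arcmin(depth_m, d_fixation_m=0.40, ipd_m=0.065):
    # Eq. 6: disparity of a point at distance d_fixation + depth, relative to fixation.
    delta_rad = 2.0 * (np.arctan((ipd_m / 2.0) / (d_fixation_m + depth_m))
                       - np.arctan((ipd_m / 2.0) / d_fixation_m))
    return np.degrees(delta_rad) * 60.0   # radians -> arcmin

# A surface 5 mm beyond a 40 cm fixation distance yields a disparity of about -6.9 arcmin.
print(disparity_arcmin(0.005))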

Each surface patch was either coincident with or displaced from the point of fixation (Figure 1c); that is, the surface patches were positioned at one of 37 depths corresponding to 37 disparity levels within Panum's fusional range (Panum, 1858) (-16.875 to 16.875 arcmin, in equally spaced steps). Each surface patch subtended 2° of visual angle from the cyclopean eye. Surfaces were then slanted (or not), and left- and right-eye projections were determined via perspective projection, ensuring that horizontal and vertical disparities were correct (Figure 1c). Next, left and right projections were resampled at 128 samples/°, approximately the sampling rate of the human foveal cones (Curcio, Sloan, Kalina, & Hendrickson, 1990). Finally, the images were cropped such that they subtended 1° from each eye (±0.5° about the left- and right-eye foveae).

Optics

Patches were defocused with a polychromatic point-spread function based on a wave-optics model of the human visual system. The model assumed a 2 mm pupil (a size typical for a bright sunny day) (Wyszecki & Stiles, 1982), human chromatic aberrations (Thibos, Ye, Zhang, & Bradley, 1992), a single refracting


surface, and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled; in other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. See the Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

r_e(x) = [ I_e(x) ∗ psf_e(x) ] samp(x) + g     (7)

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise g; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
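A minimal sketch of Equation 7 for one eye, with a Gaussian blur standing in for the polychromatic point-spread function and illustrative parameter values (these specific choices are our assumptions):

import numpy as np
from scipy.ndimage import gaussian_filter

def sensor_responses(I, psf_sigma_px, step, noise_sd, rng=np.random.default_rng(0)):
    # Eq. 7 (sketch): blur the ideal image, sample it on the sensor array, add noise.
    blurred = gaussian_filter(I, psf_sigma_px)                  # I_e(x) * psf_e(x)
    sampled = blurred[::step, ::step]                           # samp(x)
    return sampled + rng.normal(0.0, noise_sd, sampled.shape)   # + sensor noise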

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on the task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful to the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.
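Under this noise model, a filter's response statistics follow directly from a scaled projection of the contrast-normalized stimulus. The sketch below uses our own notation and parameter values; it is not the released Matlab implementation.

import numpy as np

def ama_response_stats(f, stimuli, r_max=1.0, noise_var=0.01):
    # Mean and variance of one filter's response to each contrast-normalized training stimulus.
    # With additive Gaussian response noise, the mean is the scaled projection and the
    # variance is constant across stimuli.
    mean = r_max * (stimuli @ f)            # stimuli: (n_stimuli, n_pixels); f: (n_pixels,)
    var = np.full(mean.shape, noise_var)
    return mean, var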

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every ~3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
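A sketch of the interpolation and read-out just described, using cubic splines over the response-distribution means and covariances and a MAP read-out. Names, shapes, and the flat prior are our assumptions, and the interpolated covariance matrices are assumed to remain positive-definite between training levels.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_distributions(train_disp, means, covs, n_out=577):
    # Spline-interpolate the mean vectors (K, n) and covariance matrices (K, n, n)
    # over the K training disparities to obtain n_out finely spaced distributions.
    fine_disp = np.linspace(train_disp[0], train_disp[-1], n_out)
    mean_fn = CubicSpline(train_disp, means, axis=0)
    cov_fn = CubicSpline(train_disp, covs, axis=0)
    return fine_disp, mean_fn(fine_disp), cov_fn(fine_disp)

def map_estimate(R, fine_disp, fine_means, fine_covs):
    # Pick the disparity whose interpolated Gaussian assigns R the highest likelihood
    # (with a flat prior, the MAP estimate equals the maximum-likelihood estimate).
    log_like = np.array([multivariate_normal.logpdf(R, m, c, allow_singular=True)
                         for m, c in zip(fine_means, fine_covs)])
    return fine_disp[np.argmax(log_like)]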

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531-546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217-237.

Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891-908.

Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211-1215.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077-2089.

Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.

Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599-622.

Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849-16854.

Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.

Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51-62.

Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545-1577.

Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195-2207.

Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633-636.

Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203-238.

Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367-1377.

Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497-523.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545-559.

DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156-159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441-471.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488-508.

Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839-1857.

Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658-666.

Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1-16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926-932.

Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1-9.

Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147-158.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673-1683.

Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819-840.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181-197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427-1439.

Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191-210.

Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690-696.

Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013-1022.

Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219-231.

Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281-2299.

Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48-57.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356-363.

Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157-174.

Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1-14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432-1438.

Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437-2447.

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283-287.

McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763-1779.

Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811-1825.

Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065-2076.

Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253-259.

Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509-515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037-1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879-2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607-609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531-4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359-368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811-1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795-2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322-1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735-753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193-1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079-1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685-1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711-1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304-11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295-8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594-3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103-114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583-590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101-105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814-9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087-1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

w_{i,k} = C_k^{-1} u_k     (S1a)

w_{ii,k} = -diag(C_k^{-1}) + 0.5 C_k^{-1} 1     (S1b)

w_{ij,k} = -0.5 [C_k^{-1}]_{ij},  for all i, j with j > i     (S1c)

const'_k = -0.5 u_k^T C_k^{-1} u_k + const_k     (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(.) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).


First, we rewrite Eq. 5 in the main text by expanding its third term:

ln p(R | δ_k) = Σ_{i=1}^{n} w_{i,k} R_i + Σ_{i=1}^{n} w_{ii,k} R_i^2 + Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} w_{ij,k} (R_i^2 + 2 R_i R_j + R_j^2) + const'     (S2)

Second, we rewrite Eq. 4 in the main text:

ln p(R | δ_k) = -0.5 (R - u_k)^T A_k (R - u_k) + const     (S3)

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

-0.5 a_ii (R_i - u_i)^2 = -0.5 (a_ii R_i^2 - 2 a_ii R_i u_i + a_ii u_i^2)     (S4a)

-a_ij (R_i - u_i)(R_j - u_j) = -(a_ij R_i R_j - a_ij R_i u_j - a_ij R_j u_i + a_ij u_j u_i)     (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case we want to collect the terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eqs. S4a-b. Setting the R_i terms equal and canceling gives

w_i = Σ_{j=1}^{n} a_ij u_j     (S5a)

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case we want to collect the terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2 and -a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_ij = -0.5 a_ij     (S5b)


Next, consider w_ii. In this case the total weight on R_i^2 in Eq. S2 must be -0.5 a_ii. The weight from the third term of Eq. S2 can be found by substituting w_ij = -0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

w_ii = -a_ii + 0.5 Σ_j a_ij     (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

const' = -0.5 Σ_i ( a_ii u_i^2 + Σ_{j, j≠i} a_ij u_i u_j ) + const     (S5d)

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R_max^2, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R_{ij,max}^2 = R_max^2 (2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R_max^2 and R_{ij,max}^2), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by w'_i = w_i, w'_ii = R_max w_ii, and w'_ij = (R_{ij,max}^2 / R_max) w_ij. These scale factors are important because in a neurophysiological experiment one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.
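As a numerical illustration of Eqs. S1a-c (and the derivation in Eqs. S5a-d), the sketch below builds the weights from a class-conditional mean vector and covariance matrix and evaluates the resulting log likelihood. The code is ours, not the authors'; comparing its output with -0.5 (R - u)^T C^{-1} (R - u) for random response vectors R shows that the two expressions differ only by a response-independent constant.

import numpy as np

def ll_neuron_weights(u, C):
    # Weights on linear, squared, and sum-squared filter responses (Eqs. S1a-c).
    A = np.linalg.inv(C)                                  # A = C^{-1}
    w_lin = A @ u                                         # S1a
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u)))      # S1b
    w_sumsq = -0.5 * A                                    # S1c (use entries with j > i)
    return w_lin, w_sq, w_sumsq

def log_likelihood(R, u, C):
    # Eq. S2 evaluated with the weights above, up to the response-independent constant.
    w_lin, w_sq, w_sumsq = ll_neuron_weights(u, C)
    n = len(u)
    ll = R @ w_lin + (R**2) @ w_sq
    for i in range(n - 1):
        for j in range(i + 1, n):
            ll += w_sumsq[i, j] * (R[i]**2 + 2 * R[i] * R[j] + R[j]**2)
    return ll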

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c), and the LL neurons that are constructed from them (Fig. S1d, e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/sec and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly-viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of


phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal with contrast amplitude given by

A_B(f, δ_k) = sqrt( A_L(f)^2 + A_R(f)^2 - 2 A_L(f) A_R(f) cos(2π f δ_k) )     (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye-movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c, d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. The image of a surface displaced from the fixation point is defocused by

ΔD ≅ δ / IPD     (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


psf_photopic(x; ΔD) = (1/K) Σ_λ psf(x; λ, ΔD) sc(λ) D65(λ)     (S8)

where sc(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal

c(x) = (r(x) - r̄) / r̄, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

c_norm(x) = c(x) / sqrt( ||c(x)||^2 + n c_50^2 )

where n is the dimensionality of the vector, ||c(x)||^2 is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

R_i = R_max ⌊ c_norm(x) · f_i(x) ⌋

where ⌊·⌋ denotes half-wave rectification.
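A sketch of this front end follows; the joint normalization of the two eyes' signals, the zero c_50, and the maximum response value are illustrative assumptions.

import numpy as np

def contrast_normalize(r, c50=0.0):
    # Weber contrast of a local luminance patch, normalized by its contrast energy (plus n*c50^2).
    r = np.asarray(r, dtype=float)
    c = (r - r.mean()) / r.mean()                 # contrast signal c(x)
    energy = np.sum(c**2) + c.size * c50**2       # ||c(x)||^2 + n*c50^2
    return c / np.sqrt(energy)

def binocular_simple_cell(left, right, rf, r_max=30.0):
    # Dot product of the contrast-normalized binocular signal with the receptive field,
    # half-wave rectified and scaled to the maximum response.
    c_norm = contrast_normalize(np.concatenate([left, right]))
    return r_max * max(np.dot(c_norm, rf), 0.0)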


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531-546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217-237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077-2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379-2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488-508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897-919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673-1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181-197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427-1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711-1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594-3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775-785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814-9818.

13

813 13

Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD·tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
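For a sense of scale (assuming a typical interpupillary distance of 0.065 m, which is not stated in the caption), a 45° slant at the 0.4 m viewing distance corresponds to a disparity gradient of roughly

$$g \approx \frac{\mathrm{IPD}\,\tan\theta}{d} = \frac{0.065\,\text{m}\times\tan 45^{\circ}}{0.4\,\text{m}} \approx 0.16 .$$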

Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.

Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.

Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring, R_max(f_i^T c_norm)² (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant stimulus variation.

Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
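For reference, a minimal sketch of a local cross-correlation baseline is given below (normalized correlation over integer shifts of a single patch pair); the windowing, sub-pixel interpolation, and exact search range behind the black curves are not reproduced here, so this is only an illustration of the general approach.

```python
import numpy as np

def cross_correlation_disparity(left, right, max_shift_px):
    """Return the integer shift of the right-eye signal that maximizes its
    normalized cross-correlation with the left-eye signal."""
    def normalize(s):
        s = s - s.mean()
        return s / (np.linalg.norm(s) + 1e-12)
    left = normalize(np.asarray(left, dtype=float))
    right = np.asarray(right, dtype=float)
    shifts = range(-max_shift_px, max_shift_px + 1)
    corrs = [np.dot(left, normalize(np.roll(right, k))) for k in shifts]
    return list(shifts)[int(np.argmax(corrs))]   # estimated disparity in pixels
```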

Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Interocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Interocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on interocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency interocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the interocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.
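As a quick illustration of Eq. S6 (cited in this caption), take identical left- and right-eye amplitudes A(f) and an assumed disparity of 5 arcmin (1/12 deg). Then

$$A_B(f,\delta)=A(f)\sqrt{2-2\cos(2\pi f\delta)}\,,\qquad A_B(6\ \mathrm{cpd}) = 2A(6)\ \ (2\pi f\delta=\pi),\qquad A_B(12\ \mathrm{cpd}) = 0\ \ (2\pi f\delta=2\pi),$$

so the interocular difference signal oscillates with frequency; combined with the 1/f amplitude falloff and the MTF, little usable signal survives at high frequencies.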


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.
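A minimal sketch of the planarity measure summarized in panel (d) (defined in the supplementary text): fit a slanted plane and a frontoparallel plane to the range values in a patch and take the ratio σ_planar/σ_non-planar. The least-squares plane fit and array layout are illustrative assumptions.

```python
import numpy as np

def slant_capture_ratio(x, y, z):
    """sigma_planar / sigma_nonplanar for one range patch.

    sigma_nonplanar: std of (raw range values - best frontoparallel plane).
    sigma_planar:    std of (best slanted plane - best frontoparallel plane).
    A value near 1.0 means slant captures nearly all of the depth variation."""
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)   # best slanted plane z ~ a*x + b*y + c
    z_slanted = A @ coeffs
    z_fronto = np.full_like(z, np.mean(z))           # best frontoparallel plane
    sigma_planar = np.std(z_slanted - z_fronto)
    sigma_nonplanar = np.std(z - z_fronto)
    return sigma_planar / sigma_nonplanar
```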

Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x)/√(‖c(x)‖² + n·c50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c50 had a value of zero. To examine the effect of different values of c50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R|δk) via maximum likelihood estimation. (a) When c50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When c50 = 0.1, the conditional response distributions are most Gaussian on average. For c50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


surface and the Fraunhofer approximation, which implies that at or near the focal plane the optical transfer function (OTF) is given by the cross-correlation of the generalized pupil function with its complex conjugate. In humans and nonhuman primates, accommodative and vergence eye-movement systems are tightly coupled. In other words, the focus distance usually equals the fixation distance (Fincham & Walton, 1957). We set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. See Supplement for further details of how the optics were simulated.

Sensor array responses

The information for disparity estimation depends on the depth and luminance structure of natural scenes, projective geometry, the optics of the two eyes, the sensor arrays, and the sensor noise. These factors together determine the pattern of noisy sensor responses

$$r_e(x) = \big[\,I_e(x) * \mathrm{psf}_e(x)\,\big]\cdot \mathrm{samp}(x) + \gamma \qquad (7)$$

for each eye e. I_e(x) represents an eye-specific luminance image of the light striking the sensor array at each location x = (x, y) in a hypothetical optical system that causes zero degradation in image quality. Each eye's optics degrade the idealized image; the optics are represented by a polychromatic point-spread function psf_e(x) that contains the effects of defocus, chromatic aberration, and photopic wavelength sensitivity. The sensor arrays are represented by a spatial sampling function samp(x). Finally, the sensor responses are corrupted by noise γ; the noise level was set just high enough to remove retinal image detail that is undetectable by the human visual system (Williams, 1985). (In pilot studies we found that the falloff in contrast sensitivity at low frequencies had a negligible effect on results; for simplicity we did not model its effects.) Note that Equation 7 represents a luminance (photopic) approximation of the images that would be captured by the photoreceptors. This approximation is sufficiently accurate for the present purposes (Burge & Geisler, 2011).
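A minimal sketch of Equation 7 follows, assuming additive Gaussian noise and a multiplicative sampling mask; the paper's specific noise level (set to remove undetectable detail) and sampling lattice are not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

def sensor_responses(ideal_image, psf, sample_mask, noise_sd, rng=None):
    """Eq. 7: blur the idealized luminance image with the eye's polychromatic
    PSF, sample with the sensor array, and add sensor noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = fftconvolve(ideal_image, psf, mode="same")   # I_e(x) * psf_e(x)
    sampled = blurred * sample_mask                        # samp(x)
    return sampled + noise_sd * rng.standard_normal(sampled.shape)
```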

Accuracy maximization analysis (AMA)

AMA is a method for dimensionality reduction that finds the low-dimensional set of features in sensory signals that are most useful for performing a specific task. The dependence of AMA on task renders it distinct from many other popular methods for dimensionality reduction. Principal components analysis (PCA), for example, finds the dimensions that account for the most variation in the sensory signals. There is, of course, no guarantee that factors causing the most variation in the sensory signals will be useful for the task at hand. Color, lighting, and the multitude of textures in natural images, for example, cause significant variation in the retinal images. Yet all of this variation is irrelevant for the task of estimating disparity. Thus, PCA returns features that are not necessarily useful to the task, while AMA is specifically designed to ignore the irrelevant variation.

The logic of AMA is as follows. Consider encoding each training stimulus with a small population of filters with known (internal) noise characteristics. The filter responses are given by the dot product of the contrast-normalized stimulus, scaled by the maximum response rate, plus response noise (here, a small amount of Gaussian noise). With a known noise model, it is straightforward to compute the mean and variance of each filter's response to each training stimulus. Then, a closed-form expression is derived for the approximate accuracy of the Bayesian optimal decoder (Geisler, Najemnik, & Ing, 2009). Finally, this closed-form expression can be used to search the space of linear filters to find those that give the most accurate performance. If the algorithm does not settle in local minima, it finds the Bayes-optimal filters for maximizing performance in a given task (Geisler et al., 2009). Here, different random initializations yielded the same final estimated filters. A Matlab implementation of AMA and a short discussion of how to apply it are available at http://jburge.cps.utexas.edu/research/Code.html.

It is important to note that the Bayesian optimal decoder used in AMA requires knowing the means and variances for each possible stimulus, and hence it can only be used to decode the training stimuli. In other words, AMA can find the optimal linear filters given a large enough training set, but the decoder it uses to learn those filters cannot be applied to arbitrary stimuli. A separate analysis (like the one described in the body of the present paper) is required to determine how to optimally decode the AMA filter responses for arbitrary test stimuli.

Estimating disparity

Increasing the number of disparity levels in the training set increases the accuracy of disparity estimation. However, increasing the number of disparity levels in the training set also increases the training set size and the computational complexity of learning filters via AMA. A balance must be struck. Excellent continuous estimates are obtained using 1.875 arcmin steps for training, followed by interpolation of 577 filter response distributions (i.e., 31 interpolated response distributions between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
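A minimal sketch of this readout is given below, assuming the training-step means and covariances are stored in arrays of shape (K, n) and (K, n, n). Spline-interpolated covariance matrices are not guaranteed to remain positive definite (hence the crude `allow_singular` safeguard), and a flat prior over disparity is assumed, so the MAP estimate reduces to the maximum-likelihood disparity. With 19 training levels and 31 interpolated distributions between each, the grid has 577 entries, as in the text.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

def interpolate_ll_neurons(train_disp, means, covs, n_between=31):
    """Cubic-spline interpolation of the response-distribution means and
    covariances, yielding one 'LL neuron' per interpolated disparity."""
    fine = np.linspace(train_disp[0], train_disp[-1],
                       (len(train_disp) - 1) * (n_between + 1) + 1)
    fine_means = CubicSpline(train_disp, means, axis=0)(fine)
    fine_covs = CubicSpline(train_disp, covs, axis=0)(fine)
    return fine, fine_means, fine_covs

def map_disparity(R, fine_disp, fine_means, fine_covs):
    """Disparity with the highest posterior probability of the observed
    filter-response vector R (flat prior assumed)."""
    log_like = [multivariate_normal.logpdf(R, m, c, allow_singular=True)
                for m, c in zip(fine_means, fine_covs)]
    return fine_disp[int(np.argmax(log_like))]
```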

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.
Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17.
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.

Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells
Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Figs. S3a, S4).
The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).
Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5; Fig. 6; Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells
The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \qquad \text{(S1a)}$$
$$\mathbf{w}_{ii,k} = -\,\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \qquad \text{(S1b)}$$
$$\mathbf{w}_{ij,k} = -0.5\,\mathbf{C}_{ij,k}^{-1} \quad \forall\, i,j:\ j > i \qquad \text{(S1c)}$$
$$const'_k = -0.5\,\mathbf{u}_k^{\mathsf T}\mathbf{C}_k^{-1}\mathbf{u}_k + const_k \qquad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).
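A minimal numerical sketch of Eqs. S1a–d/S5a–d follows, assuming the class-conditional mean u_k and covariance C_k of the filter responses have already been estimated from the training stimuli; the R_max rescaling for cortically plausible response ranges (discussed below) is omitted.

```python
import numpy as np

def ll_neuron_weights(u_k, C_k):
    """Weights that make a weighted sum of simple/complex cell responses
    equal the Gaussian log likelihood of disparity delta_k (Eqs. S1a-d)."""
    A = np.linalg.inv(C_k)                                  # A_k = C_k^{-1}
    w_lin = A @ u_k                                         # Eq. S5a / S1a
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u_k)))      # Eq. S5c / S1b
    w_pair = -0.5 * A                                       # Eq. S5b / S1c (only j > i entries are used)
    const = -0.5 * u_k @ A @ u_k                            # Eq. S1d, up to the additive const_k
    return w_lin, w_sq, w_pair, const

def ll_neuron_response(R, w_lin, w_sq, w_pair, const):
    """Eq. S2: weighted sum of linear, squared, and pairwise sum-squared
    filter responses; equals -0.5 (R - u_k)^T C_k^{-1} (R - u_k) + const."""
    resp = w_lin @ R + w_sq @ R**2 + const
    n = len(R)
    for i in range(n - 1):
        for j in range(i + 1, n):
            resp += w_pair[i, j] * (R[i] + R[j])**2
    return resp
```

Evaluating ll_neuron_response on the same filter-response vector R once per candidate disparity yields the log-likelihood population response from which the MAP estimate can be read out.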

First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = \sum_{i=1}^{n} w_{i,k} R_i + \sum_{i=1}^{n} w_{ii,k} R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + const' \qquad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\,|\,\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf T}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + const \qquad \text{(S3)}$$

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i - u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2\,a_{ii}R_iu_i + a_{ii}u_i^2\right) \qquad \text{(S4a)}$$
$$-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \qquad \text{(S4b)}$$

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.
The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case,

we want to collect terms containing R_i: w_iR_i from Eq. S2, and a_iiR_iu_i and a_ijR_iu_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \qquad \text{(S5a)}$$

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).
Next, consider w_ij, i ≠ j. In this case, we want to collect terms containing R_iR_j: 2w_ijR_iR_j appears in Eq. S2, and -a_ijR_iR_j appears in Eq. S4b. Setting the two R_iR_j terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad \text{(S5b)}$$

Next, consider w_ii. In this case, the total weight on R_i² in Eq. S2 must be -0.5a_ii. The weight from the third term of Eq. S2 can be found by substituting w_ij = -0.5a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad \text{(S5c)}$$

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_i^2 + \sum_{j,\,\forall j\neq i} a_{ij}u_iu_j\right) + const \qquad \text{(S5d)}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.
In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R²_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R²_{ij,max} = R²_max(2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R²_max and R²_{ij,max}), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by w'_i = w_i, w'_ii = R_max·w_ii, and w'_ij = (R²_{ij,max}/R_max)·w_ij. These scale factors are important because in a neurophysiological experiment one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.

Evaluating the effect of surface slant
Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then, we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c) and the LL neurons that are constructed from them (Fig. S1d,e) are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,

and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant
Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ-400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise
We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters
To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of

phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_B\!\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\!\left(2\pi f\delta_k\right)} \qquad \text{(S6)}$$

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.
V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics
The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriate for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} \qquad \text{(S7)}$$

where δ is the disparity of the surface expressed in radians, IPD is the interpupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters).
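As an illustrative magnitude (the 15 arcmin disparity and 0.065 m IPD are assumed values, not taken from the text), Eq. S7 gives

$$\Delta D \cong \frac{\delta}{\mathrm{IPD}} = \frac{15\ \mathrm{arcmin}\times\frac{\pi}{180\times 60}}{0.065\ \mathrm{m}} \approx 0.07\ \mathrm{diopters}$$

of defocus.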

613 13

13 psf photopic xΔD( ) = 1

Kpsf xλΔD( )sc

λsum λ( )D65 λ( ) 13 13 13 13 (S8)13

13 where13 sc λ( ) 13 is13 the13 human13 photopic13 sensitivity13 function13 and13 K13 is13 a13 normalizing13 constant13 that13 sets13 the13 volume13 of13 psf photopic 13 to13 1013 The13 modulation13 transfer13 function13 (MTF)13 is13 the13 amplitude13 of13 the13 Fourier13 transform13 of13 the13 PSF13 13 Contrast13 normalization13 Contrast13 normalization13 occurs13 early13 in13 visual13 processing13 The13 standard13 model13 of13 cortical13 neuron13 response13 assumes13 that13 the13 input13 is13 the13 contrast13 signal13

c x( ) = r x( )minus r( ) r 13 where13 r x( ) 13 are13 the13

noisy13 sensor13 responses13 to13 the13 luminance13 image13 falling13 on13 the13 retina13 over13 a13 local13 area13 and13 r 13 is13 the13 local13 mean13 This13 contrast13 signal13 is13 then13 normalized13 by13 the13 local13 contrast13

cnorm x( ) = c x( ) c x( ) 2

+ nc502 13 where13 n13 is13 the13 dimensionality13 of13 the13 vector13 c x( ) 2 13 is13 the13 contrast13

energy13 (or13 equivalently13 n 13 times13 the13 squared13 RMS13 contrast)13 and13 50c 13 is13 the13 half-shy‐saturation13 constant13 (Albrecht13 amp13 Geisler13 199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 (We13 are13 agnostic13 about13 the13 mechanism13 by13 which13 the13 contrast13 normalization13 is13 achieved)13 Finally13 the13 normalized13 contrast13 signal13 is13 weighted13 by13 the13 cortical13 neuronrsquos13 receptive13 field13 and13 passed13 through13 a13 static13 non-shy‐linearity13 For13 example13 in13 the13 case13 of13 a13 binocular13 simple13 cell13 the13 response13 of13 the13 neuron13 is13 obtained13 by13 taking13 the13 dot13 product13 of13 the13 contrast-shy‐normalized13 signal13 with13 the13 receptive13 field13 and13 half-shy‐wave13 rectifying13

Ri = Rmax cnorm x( ) sdot fi x( )⎢⎣ ⎥⎦ 13 13

13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

713 13

13 References13 Albrecht13 D13 G13 amp13 Geisler13 W13 S13 (1991)13 Motion13 selectivity13 and13 the13 contrast-shy‐response13 function13 of13

simple13 cells13 in13 the13 visual13 cortex13 Visual13 Neuroscience13 7(6)13 531ndash54613 Albrecht13 D13 G13 amp13 Hamilton13 D13 B13 (1982)13 Striate13 cortex13 of13 monkey13 and13 cat13 contrast13 response13

function13 Journal13 of13 Neurophysiology13 48(1)13 217ndash23713 Banks13 M13 S13 Gepshtein13 S13 amp13 Landy13 M13 S13 (2004)13 Why13 is13 spatial13 stereoresolution13 so13 low13 The13

Journal13 of13 Neuroscience13 24(9)13 2077ndash208913 13 Field13 D13 J13 (1987)13 Relations13 between13 the13 statistics13 of13 natural13 images13 and13 the13 response13 properties13 of13

cortical13 cells13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 4(12)13 2379ndash239413 Fincham13 E13 amp13 Walton13 J13 (1957)13 The13 reciprocal13 actions13 of13 accommodation13 and13 convergence13 The13

Journal13 of13 Physiology13 137(3)13 488ndash50813 Geisler13 W13 S13 amp13 Albrecht13 D13 G13 (1997)13 Visual13 cortex13 neurons13 in13 monkeys13 and13 cats13 detection13

discrimination13 and13 identification13 Visual13 Neuroscience13 14(5)13 897ndash91913 Harris13 J13 M13 McKee13 S13 P13 amp13 Smallman13 H13 S13 (1997)13 Fine-shy‐scale13 processing13 in13 human13 binocular13

stereopsis13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 14(8)13 1673ndash168313 Heeger13 D13 J13 (1992)13 Normalization13 of13 cell13 responses13 in13 cat13 striate13 cortex13 Visual13 Neuroscience13 9(2)13

181ndash19713 Hibbard13 P13 B13 (2008)13 Binocular13 energy13 responses13 to13 natural13 images13 Vision13 Research13 48(12)13 1427ndash

143913 13 Stockman13 A13 amp13 Sharpe13 L13 T13 (2000)13 The13 spectral13 sensitivities13 of13 the13 middle-shy‐13 and13 long-shy‐wavelength-shy‐

sensitive13 cones13 derived13 from13 measurements13 in13 observers13 of13 known13 genotype13 Vision13 Research13 40(13)13 1711ndash173713

Thibos13 L13 N13 Ye13 M13 Zhang13 X13 amp13 Bradley13 A13 (1992)13 The13 chromatic13 eye13 a13 new13 reduced-shy‐eye13 model13 of13 ocular13 chromatic13 aberration13 in13 humans13 Applied13 Optics13 31(19)13 3594ndash360013

Tolhurst13 D13 J13 Movshon13 J13 A13 amp13 Dean13 A13 F13 (1983)13 The13 statistical13 reliability13 of13 signals13 in13 single13 neurons13 in13 cat13 and13 monkey13 visual13 cortex13 Vision13 Research13 23(8)13 775ndash78513

Vlaskamp13 B13 N13 S13 Yoon13 G13 amp13 Banks13 M13 S13 (2011)13 Human13 stereopsis13 is13 not13 limited13 by13 the13 optics13 of13 the13 well-shy‐focused13 eye13 The13 Journal13 of13 Neuroscience13 31(27)13 9814ndash981813

13

813 13

Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD·tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
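The slant-to-gradient relation and the cosine slant distribution are easy to explore numerically. Below is a minimal sketch; the IPD value, the rejection-sampling scheme, and all variable names are our own illustrative choices, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

IPD = 0.065                        # inter-pupillary distance (m); assumed value
d = 0.4                            # viewing distance (m), as in the paper
max_slant = np.deg2rad(71.0)       # truncation used for the training set

def sample_slants(n):
    """Draw surface slants from a cosine density truncated at +/-71 deg
    (rejection sampling; a simple stand-in for the authors' procedure)."""
    slants = np.empty(0)
    while slants.size < n:
        cand = rng.uniform(-max_slant, max_slant, size=2 * n)
        keep = rng.uniform(0.0, 1.0, size=cand.size) < np.cos(cand)
        slants = np.concatenate([slants, cand[keep]])
    return slants[:n]

slants = sample_slants(10000)
gradients = IPD * np.tan(slants) / d        # g ~= IPD * tan(theta) / d
print("median |disparity gradient|:", np.median(np.abs(gradients)))
```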

Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.

Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.
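These two stages are simple to express in code. The sketch below uses random stand-ins for the contrast-normalized stimulus and the binocular filters (placeholders, not the learned AMA filters), and checks that 'on' minus 'off' simple cells recover the linear filter responses.

```python
import numpy as np

rng = np.random.default_rng(1)

n_pix, n_filt = 64, 8
c_norm = rng.standard_normal(n_pix)           # stand-in contrast-normalized binocular signal
F = rng.standard_normal((n_filt, n_pix))      # stand-in binocular filters (one per row)

lin = F @ c_norm                              # linear filter responses

# 'On' and 'off' simple cells: half-wave rectified responses to + and - versions of each filter.
on = np.maximum(F @ c_norm, 0.0)
off = np.maximum(-(F @ c_norm), 0.0)
assert np.allclose(on - off, lin)             # subtracting 'off' from 'on' recovers the linear response

# Model complex cells: sum pairs of filter outputs, then square.
complex_resp = np.array([[(lin[i] + lin[j]) ** 2 for j in range(n_filt)]
                         for i in range(n_filt)])
print(complex_resp.shape)                     # (8, 8); the diagonal holds the squared single-filter responses
```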

Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring: R_max(f_i^T c_norm)² (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant natural image variation.

Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
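For readers unfamiliar with the comparison method, here is a bare-bones, generic normalized cross-correlation estimator for one-dimensional left/right signals. This is our own illustration of the general idea, not the exact procedure used for the figure; the shift range and test signal are arbitrary.

```python
import numpy as np

def xcorr_disparity(left, right, max_shift):
    """Estimate disparity (pixels) as the shift of the right-eye signal that maximizes
    its normalized correlation with the left-eye signal."""
    best_shift, best_r = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        r_shift = np.roll(right, -s)                     # undo a rightward shift of s pixels
        num = np.dot(left - left.mean(), r_shift - r_shift.mean())
        den = left.size * left.std() * r_shift.std()
        r = num / den if den > 0 else -np.inf
        if r > best_r:
            best_r, best_shift = r, s
    return best_shift

rng = np.random.default_rng(2)
signal = rng.standard_normal(128)
left, right = signal, np.roll(signal, 5)                 # right-eye signal displaced by 5 pixels
print(xcorr_disparity(left, right, max_shift=16))        # -> 5
```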

Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.

Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant; values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.
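The ratio plotted in (d) is straightforward to compute for any range patch. The following sketch uses a synthetic slanted patch with a small amount of non-planar depth noise; the patch and all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 'range patch': a slanted plane plus small non-planar depth variation.
N = 32
x, y = np.meshgrid(np.arange(N), np.arange(N))
z = 2.0 + 0.05 * x - 0.02 * y + 0.01 * rng.standard_normal((N, N))

# Least-squares fit of a slanted plane; the best fronto-parallel plane is just the mean.
A = np.column_stack([x.ravel(), y.ravel(), np.ones(N * N)])
coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
z_slanted = (A @ coef).reshape(N, N)
z_fronto = np.full_like(z, z.mean())

sigma_planar = np.std(z_slanted - z_fronto)      # depth variation captured by the slanted plane
sigma_nonplanar = np.std(z - z_fronto)           # total variation about the fronto-parallel plane
print(sigma_planar / sigma_nonplanar)            # near 1.0 when slant captures most of the variation
```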

Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / sqrt(‖c(x)‖² + n·c50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c50 had a value of zero. To examine the effect of different values of c50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R|δ_k) via maximum likelihood estimation. (a) When c50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c50 = 0.1, the conditional response distributions are most Gaussian on average. For c50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
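Fitting a multi-dimensional generalized Gaussian is beyond a short example, so the sketch below uses excess kurtosis as a crude proxy for the tail behavior. The stimuli are random stand-ins whose RMS contrast varies from stimulus to stimulus (a rough surrogate for natural-stimulus contrast variation): with c50 = 0 the normalization removes that variation, while larger c50 leaves it in the responses.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)

n_stim, n_pix = 20000, 64
# Stand-in stimuli with stimulus-to-stimulus RMS-contrast variation.
contrast_scale = rng.lognormal(mean=-1.5, sigma=0.6, size=n_stim)
c = contrast_scale[:, None] * rng.standard_normal((n_stim, n_pix))

f = rng.standard_normal(n_pix)
f /= np.linalg.norm(f)                                   # stand-in (unit-norm) filter

for c50 in (0.0, 0.1, 0.25):
    norm = np.sqrt(np.sum(c ** 2, axis=1) + n_pix * c50 ** 2)
    resp = (c / norm[:, None]) @ f
    # Excess kurtosis: 0 for a Gaussian, negative = lighter tails, positive = heavier tails.
    print(f"c50 = {c50:0.2f}   excess kurtosis = {kurtosis(resp):+0.2f}")
```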


distribution between each training step, corresponding to one interpolated distribution every 3.5 arcsec). Interpolated distributions were obtained by fitting cubic splines through the response distribution mean vectors and covariance matrices. This procedure resulted in 577 LL neurons, which resulted in posterior probability distributions that were defined by 577 discrete points. MAP estimates were obtained by selecting the disparity with the highest posterior probability. Increasing the number of interpolated distributions (i.e., LL neurons) had no effect on performance.
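As a concrete illustration of this readout, here is a minimal sketch that interpolates per-disparity response means and covariances with cubic splines and returns the MAP disparity over 577 interpolated distributions. The response statistics, training-disparity spacing, and all names are synthetic stand-ins, not the authors' data or code.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)

n_filt = 8
disp_train = np.linspace(-16.875, 16.875, 19)            # training disparities (arcmin); illustrative spacing
means = 0.05 * rng.standard_normal((disp_train.size, n_filt))
covs = np.array([np.eye(n_filt) * (0.5 + 0.02 * abs(d)) for d in disp_train])

# Interpolate the mean vectors and covariance matrices between training disparities.
mean_spline = CubicSpline(disp_train, means, axis=0)
cov_spline = CubicSpline(disp_train, covs, axis=0)

disp_fine = np.linspace(disp_train[0], disp_train[-1], 577)   # 577 interpolated distributions / LL neurons

def map_estimate(R):
    """MAP disparity: maximize the log likelihood over the interpolated distributions
    (equivalent to the posterior peak under a flat prior)."""
    ll = np.array([multivariate_normal(mean=mean_spline(d), cov=cov_spline(d),
                                       allow_singular=True).logpdf(R)
                   for d in disp_fine])
    return disp_fine[np.argmax(ll)]

R_obs = means[9] + 0.1 * rng.standard_normal(n_filt)      # a fake filter-response vector near 0 arcmin
print(map_estimate(R_obs))
```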

Keywords: natural scene statistics, perceptual constancy, ideal observer, Bayesian statistics, population code, encoding, decoding, selectivity, invariance, depth perception, stereopsis, hierarchical model, disparity energy model, simple cells, complex cells

Acknowledgments

The authors thank Larry Cormack, Bruno Olshausen, and Jonathan Pillow for comments on an earlier draft. JB was supported by NSF grant IIS-1111328, NIH grant 2R01-EY011747, and NIH Training Grant 1T32-EY021462-01A1. WSG was supported by NIH grant 2R01-EY011747.

Commercial relationships: none.
Corresponding author: Johannes Burge.
Email: jburge@mail.cps.utexas.edu.
Address: Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA.

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Anzai, A., Ohzawa, I., & Freeman, R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82(2), 891–908.
Badcock, D. R., & Schor, C. M. (1985). Depth-increment detection function for individual spatial channels. Journal of the Optical Society of America A, 2(7), 1211–1215.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? Journal of Neuroscience, 24(9), 2077–2089.
Berens, P., Ecker, A. S., Gerwinn, S., Tolias, A. S., & Bethge, M. (2011). Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, USA, 108(11), 4423.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211(3), 599–622.
Burge, J., & Geisler, W. S. (2011). Optimal defocus estimation in individual natural images. Proceedings of the National Academy of Sciences, USA, 108(40), 16849–16854.
Burge, J., & Geisler, W. S. (2012). Optimal defocus estimates from individual images for autofocusing a digital camera. Proceedings of the IS&T/SPIE 47th Annual Meeting, January 2012, Burlingame, CA.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51–62.
Chen, Y., & Qian, N. (2004). A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16(8), 1545–1577.
Cormack, L. K., Stevenson, S. B., & Schor, C. M. (1991). Interocular correlation, luminance contrast and cyclopean processing. Vision Research, 31(12), 2195–2207.
Cumming, B. G. (2002). An unexpected specialization for horizontal disparity in primate primary visual cortex. Nature, 418(6898), 633–636.
Cumming, B. G., & DeAngelis, G. C. (2001). The physiology of stereopsis. Annual Review of Neuroscience, 24, 203–238.
Cumming, B. G., Shapiro, S. E., & Parker, A. J. (1998). Disparity detection in anticorrelated stereograms. Perception, 27(11), 1367–1377.
Curcio, C. A., Sloan, K. R., Kalina, R. E., & Hendrickson, A. E. (1990). Human photoreceptor topography. The Journal of Comparative Neurology, 292(4), 497–523.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.
DeAngelis, G. C., Ohzawa, I., & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352(6331), 156–159.


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17. [PubMed] [Article]
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvarinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19. [PubMed] [Article]
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen uber das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7–8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.


Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5, Fig. 6, Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

w_i^k = C_k^{-1} u_k     (S1a)

w_ii^k = −diag(C_k^{-1}) + 0.5 C_k^{-1} 1     (S1b)

w_ij^k = −0.5 [C_k^{-1}]_ij,  ∀ i, j; j > i     (S1c)

const′_k = −0.5 u_k^T C_k^{-1} u_k + const_k     (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) sets a matrix diagonal to a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., conditional response distributions) (see Fig. 4a).

First, we rewrite Eq. 5 in the main text by expanding its third term:

ln p(R | δ_k) = Σ_{i=1}^{n} w_i^k R_i + Σ_{i=1}^{n} w_ii^k R_i² + Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} w_ij^k (R_i² + 2 R_i R_j + R_j²) + const′     (S2)

Second, we rewrite Eq. 4 in the main text:

ln p(R | δ_k) = −0.5 (R − u_k)^T A_k (R − u_k) + const     (S3)

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo-images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

−0.5 a_ii (R_i − u_i)² = −0.5 a_ii R_i² + a_ii R_i u_i − 0.5 a_ii u_i²     (S4a)

−a_ij (R_i − u_i)(R_j − u_j) = −(a_ij R_i R_j − a_ij R_i u_j − a_ij R_j u_i + a_ij u_j u_i)     (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.

The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case we want to collect terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eqs. S4a–b. Setting the R_i terms equal and canceling gives

w_i = Σ_{j=1}^{n} a_ij u_j     (S5a)

Note that the weights on the linear filter responses w_i are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a, Fig. S2). Thus the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider w_ij, i ≠ j. In this case we want to collect terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2 and −a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_ij = −0.5 a_ij     (S5b)

Next, consider w_ii. In this case the total weight on R_i² in Eq. S2 must be −0.5 a_ii. The weight from the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4 gives w_ii:

w_ii = −a_ii + 0.5 Σ_j a_ij     (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

const′ = −0.5 Σ_i ( a_ii u_i² + Σ_{j≠i} a_ij u_i u_j ) + const     (S5d)

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R²_max, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R²_ij,max = R²_max (2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R²_max and R²_ij,max), the weights on the complex cell responses must be scaled. Thus, the optimal weights on complex cells are given by w′_i = w_i, w′_ii = R_max w_ii, and w′_ij = (R²_ij,max / R_max) w_ij. These scale factors are important because in a neurophysiological experiment one would measure w′, not w. Consequently, w′_ii and w′_ij are the normalized weights plotted in Fig. 7b.
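To make the correspondence between Eqs. S1a–d/S5a–d and the Gaussian log likelihood concrete, the following numerical check (with a random mean vector and covariance matrix standing in for the response statistics of one disparity level) verifies that the weighted sum of linear, squared, and sum-squared responses reproduces the quadratic form. It is only a sanity check of the algebra above, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8

# Random mean vector and positive-definite covariance for one disparity level.
u = rng.standard_normal(n)
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)
A = np.linalg.inv(C)

# Weights per Eqs. S1a-d / S5a-d.
w_lin = A @ u                                    # w_i
W_pair = -0.5 * A                                # w_ij (used for off-diagonal terms, j > i)
w_sq = -np.diag(A) + 0.5 * A.sum(axis=1)         # w_ii
const_prime = -0.5 * u @ A @ u                   # constant (dropping the additive 'const')

def loglike_quadratic(R):
    return -0.5 * (R - u) @ A @ (R - u)

def loglike_weighted_sum(R):
    total = w_lin @ R + w_sq @ R**2 + const_prime
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += W_pair[i, j] * (R[i]**2 + 2 * R[i] * R[j] + R[j]**2)
    return total

R = rng.standard_normal(n)
print(np.isclose(loglike_quadratic(R), loglike_weighted_sum(R)))   # -> True
```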

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent, the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a–c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,

and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f, Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a–c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch, 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/s (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly-viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of

phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

A_B(f, δ_k) = sqrt( A_L(f)² + A_R(f)² − 2 A_L(f) A_R(f) cos(2π f δ_k) )     (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal.

If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

ΔD ≅ δ / IPD     (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

psf_photopic(x, ΔD) = (1/K) Σ_λ psf(x, λ, ΔD) · s_c(λ) · D65(λ)     (S8)

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
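A compact sketch of Eqs. S7 and S8. The single-wavelength PSFs here are Gaussian placeholders (the real computation uses wave optics and the chromatic-eye model), and the sensitivity and illuminant weights are crude stand-ins; only the overall weighting-and-summing structure follows the equations above.

```python
import numpy as np

IPD = 0.065                                    # inter-pupillary distance (m); assumed value

def defocus_from_disparity(disparity_arcmin):
    """Eq. S7: defocus (diopters) of a surface whose image has the given disparity."""
    delta_rad = np.deg2rad(disparity_arcmin / 60.0)
    return delta_rad / IPD

x = np.linspace(-0.25, 0.25, 257)              # retinal position (deg)
wavelengths = np.arange(400, 701, 10)           # nm

def single_wavelength_psf(x, wl, defocus_D):
    """Placeholder PSF: a Gaussian whose width grows with total (defocus + chromatic) blur."""
    chromatic_D = 0.005 * (wl - 555.0)          # crude stand-in for chromatic defocus (D)
    sigma = 0.01 + 0.02 * abs(defocus_D + chromatic_D)
    psf = np.exp(-0.5 * (x / sigma) ** 2)
    return psf / psf.sum()

def polychromatic_psf(x, defocus_D):
    """Eq. S8: sensitivity- and illuminant-weighted sum of single-wavelength PSFs."""
    s_c = np.exp(-0.5 * ((wavelengths - 555.0) / 80.0) ** 2)  # stand-in photopic sensitivity
    d65 = np.ones_like(wavelengths, dtype=float)              # stand-in flat illuminant
    psf = sum(w_s * w_i * single_wavelength_psf(x, wl, defocus_D)
              for wl, w_s, w_i in zip(wavelengths, s_c, d65))
    return psf / psf.sum()                                     # K normalizes the volume to 1.0

dD = defocus_from_disparity(15.0)               # 15 arcmin of disparity
print(f"defocus = {dD:.2f} D;  polychromatic PSF sums to {polychromatic_psf(x, dD).sum():.2f}")
```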

Contrast normalization
Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}(\mathbf{x}) = \left(\mathbf{r}(\mathbf{x}) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast: $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\lVert\mathbf{c}(\mathbf{x})\rVert^{2} + n c_{50}^{2}}$, where $n$ is the dimensionality of the vector, $\lVert\mathbf{c}(\mathbf{x})\rVert^{2}$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{\max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor$$
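A minimal numerical sketch of this front end in Python, assuming a 1-D image patch and a unit-norm receptive field; the toy inputs and the $R_{\max}$ value are illustrative, not the paper's parameters.

```python
import numpy as np

def contrast_signal(r):
    """Weber contrast: subtract and divide by the local mean response."""
    rbar = r.mean()
    return (r - rbar) / rbar

def contrast_normalize(c, c50=0.0):
    """Divide by sqrt(contrast energy + n*c50^2); c50 = 0 is assumed in the paper."""
    n = c.size
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

def simple_cell_response(c_norm, rf, rmax=30.0):
    """Dot product with the receptive field, half-wave rectified, scaled by rmax."""
    return rmax * max(np.dot(c_norm, rf), 0.0)

# Toy usage: a random luminance patch and a random unit-norm receptive field.
rng = np.random.default_rng(0)
r = 100.0 + 10.0 * rng.standard_normal(64)        # photoreceptor-like responses
rf = rng.standard_normal(64); rf /= np.linalg.norm(rf)
print(simple_cell_response(contrast_normalize(contrast_signal(r)), rf))
```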


References
Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters have somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as the one-dimensional filters do to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients encountered when viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan(\theta)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
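As a rough numerical illustration of the gradient formula (with assumed values: a typical 0.065 m IPD, a 45 deg slant, and the 0.4 m viewing distance; these particular numbers are not taken from the figure):

$$g \;\approx\; \frac{\mathrm{IPD}\,\tan\theta}{d} \;=\; \frac{0.065\ \mathrm{m}\times\tan 45^{\circ}}{0.4\ \mathrm{m}} \;\approx\; 0.16$$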


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially for F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities, so the disparities are decoded improperly by a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant; inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is obtained by filtering the contrast-normalized stimulus and then squaring: $R_{\max}\left(\mathbf{f}_i^{\mathsf T}\mathbf{c}_{norm}\right)^{2}$ (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response-invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
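For reference, a minimal sketch of a local cross-correlation baseline in Python. This is a generic windowed normalized cross-correlator over 1-D left/right signals, written to illustrate the idea rather than to reproduce the exact implementation compared against in the paper; the window size, search range, and integer-shift sampling are assumptions.

```python
import numpy as np

def xcorr_disparity(left, right, max_shift=20, window=32):
    """Estimate the disparity (in samples) of the window centered in `left` by
    finding the shift of `right` that maximizes normalized cross-correlation."""
    c0 = left.size // 2 - window // 2
    ref = left[c0:c0 + window]
    ref = (ref - ref.mean()) / (np.linalg.norm(ref - ref.mean()) + 1e-12)
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        lo = c0 + s
        if lo < 0 or lo + window > right.size:
            continue
        seg = right[lo:lo + window]
        seg = (seg - seg.mean()) / (np.linalg.norm(seg - seg.mean()) + 1e-12)
        corr = float(np.dot(ref, seg))
        if corr > best_corr:
            best_corr, best_shift = corr, s
    return best_shift

# Toy usage: a random 1-D "image" shifted by 5 samples between the eyes.
rng = np.random.default_rng(1)
img = rng.standard_normal(256)
print(xcorr_disparity(img, np.roll(img, 5)))   # prints 5 (the imposed shift)
```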


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: the inter-ocular contrast signal is drastically reduced. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as a 4 mm pupil with a 4 diopter difference between the eyes; in that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.
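A small sketch of the binocular-difference amplitude (Eq. S6) and of the vergence-jitter attenuation described in panel (c). The $1/f$ amplitude spectrum and the 2 arcmin jitter value follow the caption; omitting the MTF, using an RMS treatment of the jittered cosine term, and the particular frequency grid are illustrative simplifications.

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, jitter_sd_arcmin=0.0):
    """Eq. S6 with identical left/right optics (A_L = A_R ~ 1/f; MTF omitted).
    Zero-mean Gaussian vergence jitter (SD in arcmin) attenuates the expectation of
    the disparity-dependent cosine term by exp(-2*(pi*f*sigma)^2), so the RMS
    difference amplitude becomes nearly disparity-independent at high frequencies
    (i.e., high frequencies carry little disparity information)."""
    a = 1.0 / f_cpd                                          # 1/f amplitude falloff
    sigma_deg = jitter_sd_arcmin / 60.0
    g = np.exp(-2.0 * (np.pi * f_cpd * sigma_deg) ** 2)      # jitter attenuation factor
    phase = 2.0 * np.pi * f_cpd * (disparity_arcmin / 60.0)  # cycles/deg times deg
    return np.sqrt(np.maximum(2.0 * a**2 * (1.0 - g * np.cos(phase)), 0.0))

f = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])   # spatial frequency (cpd)
for d in (0.0, 5.0, 15.0):                      # disparities (arcmin)
    print(d, np.round(binocular_difference_amplitude(f, d, jitter_sd_arcmin=2.0), 3))
```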


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\lVert\mathbf{c}(\mathbf{x})\rVert^{2} + n c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
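A minimal sketch of the shape-parameter analysis in panel (b), assuming SciPy's generalized normal distribution as a one-dimensional fitting tool. The paper fits multi-dimensional generalized Gaussians; fitting each filter's marginal responses with `scipy.stats.gennorm` is a simplified stand-in. A fitted power of about 2 indicates a Gaussian, larger values lighter tails, smaller values heavier tails.

```python
import numpy as np
from scipy import stats

def fitted_power(responses):
    """Fit a 1-D generalized Gaussian to filter responses and return the
    shape (power) parameter beta; beta = 2 corresponds to a Gaussian."""
    beta, loc, scale = stats.gennorm.fit(responses)
    return beta

rng = np.random.default_rng(0)
gaussian_like = rng.standard_normal(5000)          # should give beta near 2
heavy_tailed = rng.standard_t(df=3, size=5000)     # should give beta below 2
print(round(fitted_power(gaussian_like), 2), round(fitted_power(heavy_tailed), 2))
```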


Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23(1), 441–471.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Fleet, D. J., Wagner, H., & Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12), 1839–1857.
Ganguli, D., & Simoncelli, E. P. (2010). Implicit encoding of prior probabilities in optimal neural populations. Advances in Neural Information Processing Systems (NIPS), 23, 658–666.
Geisler, W. S., Najemnik, J., & Ing, A. D. (2009). Optimal stimulus encoders for natural tasks. Journal of Vision, 9(13):17, 1–16, http://www.journalofvision.org/content/9/13/17, doi:10.1167/9.13.17.
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932.
Haefner, R., & Bethge, M. (2010). Evaluating neuronal codes for inference using Fisher information. Advances in Neural Information Processing Systems, 23, 1–9.
Haefner, R. M., & Cumming, B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57(1), 147–158.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 14(8), 1673–1683.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. The Journal of General Physiology, 25(6), 819–840.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Kaye, M. (1978). Stereopsis without binocular correlation. Vision Research, 18(8), 1013–1022.
Landers, D. D., & Cormack, L. K. (1997). Asymmetries and errors in perception of depth from disparity suggest a multicomponent model of disparity processing. Perception & Psychophysics, 59(2), 219–231.
Lehky, S. R., & Sejnowski, T. J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10(7), 2281–2299.
Levi, D. M., McKee, S. P., & Movshon, J. A. (2011). Visual deficits in anisometropia. Vision Research, 51(1), 48–57.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.
Li, Z., & Atick, J. J. (1994). Efficient stereo coding in the multiscale representation. Network: Computation in Neural Systems, 5, 157–174.
Liu, Y., Bovik, A. C., & Cormack, L. K. (2008). Disparity statistics in natural scenes. Journal of Vision, 8(11):19, 1–14, http://www.journalofvision.org/content/8/11/19, doi:10.1167/8.11.19.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Marcos, S., & Burns, S. A. (2000). On the symmetry between eyes of wavefront aberration and cone directionality. Vision Research, 40(18), 2437–2447.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30(11), 1763–1779.
Nakayama, K., & Shimojo, S. (1990). Da Vinci stereopsis: Depth and subjective occluding contours from unpaired image points. Vision Research, 30(11), 1811–1825.
Nienborg, H., Bridge, H., Parker, A. J., & Cumming, B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. Journal of Neuroscience, 24(9), 2065–2076.
Ogle, K. N. (1952). On the limits of stereoscopic vision. Journal of Experimental Psychology, 44(4), 253–259.
Ohzawa, I. (1998). Mechanisms of stereoscopic vision: The disparity energy model. Current Opinion in Neurobiology, 8(4), 509–515.


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.
Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.
Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.
Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.
Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.
Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.
Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.
Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7-8), 1079–1086.
Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.
Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.
Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.
Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. Journal of Neuroscience, 31(27), 9814–9818.
Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.
Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells
Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5; Fig. 6; Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.
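A minimal sketch, in Python, of how the linear, squared, and pairwise-sum-squared filter responses could be assembled from half-wave rectified ('on'/'off') simple cells and sum-then-square complex cells. The filter matrix and stimulus here are random stand-ins, and the maximum-response scaling discussed later in this supplement is omitted for clarity.

```python
import numpy as np

def relu(x):
    """Half-wave rectification (the 'on' response); the 'off' response is relu(-x)."""
    return np.maximum(x, 0.0)

def simple_and_complex_responses(F, c_norm):
    """F: n_filters x n_pixels matrix of binocular filter weights (AMA filters).
    c_norm: contrast-normalized binocular stimulus vector.
    Returns the linear filter responses ('on' minus 'off' simple cells), their
    squares, and the squared pairwise sums (sum-then-square complex cells)."""
    drive = F @ c_norm                        # linear drive to each filter
    linear = relu(drive) - relu(-drive)       # 'on' minus 'off' simple cells
    squared = linear**2                       # complex cells for single filters
    n = F.shape[0]
    pair_sq = {(i, j): (linear[i] + linear[j])**2
               for i in range(n) for j in range(i + 1, n)}   # pairwise-summed filters, squared
    return linear, squared, pair_sq

# Toy usage with random filters and a random normalized stimulus.
rng = np.random.default_rng(0)
F = rng.standard_normal((8, 60))
c = rng.standard_normal(60); c /= np.linalg.norm(c)
lin, sq, pairs = simple_and_complex_responses(F, c)
```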

Derivation of optimal weights on simple and complex cells
The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filter responses are given by

$$\mathbf{w}_{k} = \mathbf{C}_{k}^{-1}\mathbf{u}_{k} \qquad \text{(S1a)}$$

$$\mathbf{w}_{ii,k} = -\operatorname{diag}\!\left(\mathbf{C}_{k}^{-1}\right) + 0.5\,\mathbf{C}_{k}^{-1}\mathbf{1} \qquad \text{(S1b)}$$

$$w_{ij,k} = -0.5\left[\mathbf{C}_{k}^{-1}\right]_{ij} \quad \forall\, i,j,\; j>i \qquad \text{(S1c)}$$

$$const'_{k} = -0.5\,\mathbf{u}_{k}^{\mathsf T}\mathbf{C}_{k}^{-1}\mathbf{u}_{k} + const_{k} \qquad \text{(S1d)}$$

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) maps the diagonal of a matrix to a vector. Here, we derive these weight equations from the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions; see Fig. 4a).
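A hedged numerical sketch of Eqs. S1a-d: given the class-conditional response mean and covariance for one disparity, it computes the weights and then checks that the weighted sum of linear, squared, and pairwise-sum-squared responses reproduces the Gaussian log likelihood up to the stated constant. The toy mean and covariance are random stand-ins for the statistics measured from natural stereo-images; the algebraic derivation follows in the text below.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
u = rng.standard_normal(n) * 0.1                 # conditional response mean u_k
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)                      # conditional response covariance C_k
A = np.linalg.inv(C)                             # A_k = inverse covariance

# Eqs. S1a-d (element forms S5a-d): weights on linear, squared, and sum-squared responses.
w_lin = A @ u                                    # S1a
w_sq = -np.diag(A) + 0.5 * (A @ np.ones(n))      # S1b
w_pair = {(i, j): -0.5 * A[i, j] for i in range(n) for j in range(i + 1, n)}  # S1c
const_prime = -0.5 * u @ A @ u                   # S1d (dropping the shared const_k)

def quadratic_form(R):
    """Gaussian log likelihood (up to const): -0.5 (R-u)^T C^{-1} (R-u)."""
    d = R - u
    return -0.5 * d @ A @ d

def weighted_sum(R):
    """Weighted linear, squared, and pairwise-sum-squared responses (Eq. S2 below)."""
    total = w_lin @ R + w_sq @ (R**2) + const_prime
    total += sum(w * (R[i] + R[j])**2 for (i, j), w in w_pair.items())
    return total

R = rng.standard_normal(n)
print(np.isclose(quadratic_form(R), weighted_sum(R)))   # -> True
```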


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\!\left(\mathbf{R}\mid\delta_{k}\right) = \sum_{i=1}^{n} w_{i,k} R_{i} + \sum_{i=1}^{n} w_{ii,k} R_{i}^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_{i}^{2} + 2R_{i}R_{j} + R_{j}^{2}\right) + const' \qquad \text{(S2)}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\!\left(\mathbf{R}\mid\delta_{k}\right) = -0.5\left(\mathbf{R}-\mathbf{u}_{k}\right)^{\mathsf T}\mathbf{A}_{k}\left(\mathbf{R}-\mathbf{u}_{k}\right) + const \qquad \text{(S3)}$$

where $\mathbf{A}_{k}$ equals the inverse covariance matrix $\mathbf{C}_{k}^{-1}$. (Recall that the mean vector $\mathbf{u}_{k}$ and covariance matrix $\mathbf{C}_{k}$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_{k}$ onto the filters.) From here forward, the disparity index $k$ is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_{i}-u_{i}\right)^{2} = -0.5\left(a_{ii}R_{i}^{2} - 2a_{ii}R_{i}u_{i} + a_{ii}u_{i}^{2}\right) \qquad \text{(S4a)}$$

$$-a_{ij}\left(R_{i}-u_{i}\right)\left(R_{j}-u_{j}\right) = -\left(a_{ij}R_{i}R_{j} - a_{ij}R_{i}u_{j} - a_{ij}R_{j}u_{i} + a_{ij}u_{j}u_{i}\right) \qquad \text{(S4b)}$$

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_{i}$ are the mean filter responses.

The aim is now to express $w_{i}$, $w_{ii}$, and $w_{ij}$ in terms of $u_{i}$, $a_{ii}$, and $a_{ij}$. First, consider $w_{i}$. In this case, we want to collect the terms containing $R_{i}$: $w_{i}R_{i}$ from Eq. S2, and $a_{ii}R_{i}u_{i}$ and $a_{ij}R_{i}u_{j}$ from Eqs. S4a and S4b. Setting the $R_{i}$ terms equal and canceling gives

$$w_{i} = \sum_{j=1}^{n} a_{ij}u_{j} \qquad \text{(S5a)}$$

Note that the weights on the linear filter responses, $w_{i}$, are zero if the mean linear filter responses $u_{i}$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a; Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \neq j$. In this case, we want to collect the terms containing $R_{i}R_{j}$: $2w_{ij}R_{i}R_{j}$ appears in Eq. S2 and $-a_{ij}R_{i}R_{j}$ appears in Eq. S4b. Setting the two $R_{i}R_{j}$ terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \qquad \text{(S5b)}$$


Next13 consider13 wii 13 In13 this13 case13 the13 total13 weight13 on13 2iR 13 in13 Eq13 S213 must13 be13 05 iiaminus 13 The13 weight13 from13

the13 third13 term13 of13 Eq13 S213 can13 be13 found13 by13 substituting13 wij = minus05aij 13 Subtracting13 that13 weight13 from13 the13

total13 weight13 specified13 by13 Eq13 S413 gives13 wii 13 13

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \qquad \text{(S5c)}$$

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$const' = -0.5\sum_{i}\left(a_{ii}u_{i}^{2} + \sum_{j,\;\forall j\neq i} a_{ij}u_{i}u_{j}\right) + const \qquad \text{(S5d)}$$

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{\max}^{2}$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij,\max}^{2} = R_{\max}^{2}\left(2 + 2\,\mathbf{f}_{i}\cdot\mathbf{f}_{j}\right)$, where $\mathbf{f}_{i}$ and $\mathbf{f}_{j}$ are the weighting functions defining filters $i$ and $j$ (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{\max}$, rather than $R_{\max}^{2}$ and $R_{ij,\max}^{2}$), the weights on the complex cell responses must be scaled. The optimal weights on the complex cells are then given by $w_{i}' = w_{i}$, $w_{ii}' = R_{\max}w_{ii}$, and $w_{ij}' = \left(R_{ij,\max}^{2}/R_{\max}\right)w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w_{ii}'$ and $w_{ij}'$ are the normalized weights plotted in Fig. 7b.

Evaluating the effect of surface slant
Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. We then determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller, octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation, and they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant
Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all of the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
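A hedged sketch of this patch analysis in Python: fit a fronto-parallel plane (the patch mean) and a slanted plane (least squares) to a small range patch, then form the ratio described above. The synthetic patch below is a stand-in for the Riegl range data.

```python
import numpy as np

def slant_ratio(Z):
    """Z: 2-D array of range values for one patch (meters).
    Returns sigma_planar / sigma_nonplanar, where sigma_nonplanar is the SD of the
    raw range about the best-fitting fronto-parallel plane (the patch mean), and
    sigma_planar is the SD of the best-fitting slanted plane about that same
    fronto-parallel plane. Values near 1 mean slant captures nearly all of the
    depth variation; values near 0 mean it captures almost none."""
    h, w = Z.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, Z.ravel(), rcond=None)   # slanted-plane fit
    slanted = (A @ coeffs).reshape(h, w)
    frontoparallel = np.full_like(Z, Z.mean())
    sigma_nonplanar = np.std(Z - frontoparallel)
    sigma_planar = np.std(slanted - frontoparallel)
    return sigma_planar / sigma_nonplanar

# Synthetic patch: a slanted plane plus a little fine depth structure.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:32, 0:32]
Z = 3.0 + 0.002 * x + 0.005 * rng.standard_normal((32, 32))
print(round(slant_ratio(Z), 2))   # a value near 1, since the patch is mostly a slanted plane
```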

External variability and neural noise
We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), the stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), the stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.

Spatial frequency tuning of optimal linear binocular filters
To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly-viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$$A_{B}\!\left(f,\delta_{k}\right) = \sqrt{A_{L}(f)^{2} + A_{R}(f)^{2} - 2A_{L}(f)A_{R}(f)\cos\!\left(2\pi f\delta_{k}\right)} \qquad \text{(S6)}$$

13 where f 13 is13 the13 spatial13 frequency13 δ k 13 is13 a13 particular13 disparity13 AL ( f ) 13 and13 AR( f ) 13 are13 the13 left13 and13


1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13


Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249(4972), 1037–1041.

Ohzawa, I., DeAngelis, G. C., & Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat's visual cortex. Journal of Neurophysiology, 77(6), 2879–2909.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Panum, P. (1858). Physiologische Untersuchungen über das Sehen mit zwei Augen [Translation: Physiological investigations on seeing with two eyes]. Kiel: Schwerssche Buchhandlung.

Poggio, G. F., Gonzalez, F., & Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. Journal of Neuroscience, 8(12), 4531–4550.

Qian, N. (1997). Binocular disparity and the perception of depth. Neuron, 18(3), 359–368.

Qian, N., & Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Research, 37(13), 1811–1827.

Read, J. C. A., & Cumming, B. G. (2003). Testing quantitative models of binocular disparity selectivity in primary visual cortex. Journal of Neurophysiology, 90(5), 2795–2817.

Read, J. C. A., & Cumming, B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10(10), 1322–1328.

Read, J. C. A., Parker, A. J., & Cumming, B. G. (2002). A simple model accounts for the response of disparity-tuned V1 neurons to anticorrelated images. Visual Neuroscience, 19(6), 735–753.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., & Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research, 31(7–8), 1079–1086.

Stevenson, S. B., Cormack, L. K., Schor, C. M., & Tyler, C. W. (1992). Disparity tuning in mechanisms of human stereopsis. Vision Research, 32(9), 1685–1694.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Tanabe, S., & Cumming, B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. Journal of Neuroscience, 28(44), 11304–11314.

Tanabe, S., Haefner, R. M., & Cumming, B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. Journal of Neuroscience, 31(22), 8295–8305.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tsao, D. Y., Conway, B. R., & Livingstone, M. S. (2003). Receptive fields of disparity-tuned simple cells in macaque V1. Neuron, 38(1), 103–114.

Tyler, C. W. (1975). Spatial organization of binocular disparity sensitivity. Vision Research, 15(5), 583–590.

Tyler, C. W., & Julesz, B. (1978). Binocular cross-correlation in time and space. Vision Research, 18(1), 101–105.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.

Williams, D. R. (1985). Visibility of interference fringes near the resolution limit. Journal of the Optical Society of America A, 2(7), 1087–1093.

Wyszecki, G., & Stiles, W. (1982). Color science: Concepts and methods, quantitative data and formulas. New York: John Wiley & Sons.



Optimal disparity estimation in natural stereo-images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells

Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of each AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of each pairwise-summed filter (Fig. S3a, S4).

The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring non-linearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).

Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5; Fig. 6; Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.

Derivation of optimal weights on simple and complex cells

The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to the baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

$$\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\mathbf{u}_k \tag{S1a}$$

$$\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1} \tag{S1b}$$

$$w_{ij,k} = -0.5\left[\mathbf{C}_k^{-1}\right]_{ij} \quad \forall\, i,j:\ j>i \tag{S1c}$$

$$\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{\mathsf{T}}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k \tag{S1d}$$

where $\mathbf{I}$ is the identity matrix, $\mathbf{1}$ is the 'ones' vector, and $\mathrm{diag}(\cdot)$ returns the diagonal of a matrix as a vector. Here we derive these weight equations based on the multi-dimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions; see Fig. 4a).
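Before the derivation, the following is a minimal numerical sketch (ours, not code from the paper) of Eqs. S1a–d: it computes the weights from an assumed class-conditional mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$, and checks that the weighted-sum form (Eq. S2) reproduces the Gaussian form (Eq. S3) up to the additive constant. All names and the made-up statistics are placeholders.

```python
import numpy as np

def ll_weights(u, C):
    """Weights of one LL neuron (Eqs. S1a-d / S5a-d) computed from the
    class-conditional Gaussian statistics of the filter responses:
    mean vector u and covariance matrix C for one disparity."""
    A = np.linalg.inv(C)                               # A_k = C_k^{-1}
    w_lin = A @ u                                      # S1a: weights on linear responses
    w_sq = -np.diag(A) + 0.5 * (A @ np.ones(len(u)))   # S1b: weights on squared responses
    W_cross = -0.5 * A                                 # S1c: weights on sum-squared responses (use j > i)
    const = -0.5 * u @ A @ u                           # S1d, up to the additive const_k
    return w_lin, w_sq, W_cross, const

def ll_weighted_sum(R, w_lin, w_sq, W_cross, const):
    """Eq. S2: log likelihood as a weighted sum of linear, squared, and
    pairwise sum-squared filter responses."""
    n = len(R)
    val = w_lin @ R + w_sq @ (R ** 2) + const
    for i in range(n - 1):
        for j in range(i + 1, n):
            val += W_cross[i, j] * (R[i] ** 2 + 2 * R[i] * R[j] + R[j] ** 2)
    return val

def ll_gaussian(R, u, C):
    """Eq. S3: log likelihood from the Gaussian form (same additive constant omitted)."""
    A = np.linalg.inv(C)
    return -0.5 * (R - u) @ A @ (R - u)

# Consistency check with made-up statistics for n = 8 filters.
rng = np.random.default_rng(0)
n = 8
u = rng.normal(scale=0.01, size=n)            # near-zero means, as in Fig. 4a
B = rng.normal(size=(n, n))
C = B @ B.T / n + 0.1 * np.eye(n)             # a valid covariance matrix
R = rng.normal(size=n)                        # one filter-response vector
assert np.isclose(ll_weighted_sum(R, *ll_weights(u, C)), ll_gaussian(R, u, C))
```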


First, we rewrite Eq. 5 in the main text by expanding its third term:

$$\ln p\left(\mathbf{R}\mid\delta_k\right) = \sum_{i=1}^{n} w_{i,k}R_i + \sum_{i=1}^{n} w_{ii,k}R_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^2 + 2R_iR_j + R_j^2\right) + \mathrm{const}' \tag{S2}$$

Second, we rewrite Eq. 4 in the main text:

$$\ln p\left(\mathbf{R}\mid\delta_k\right) = -0.5\left(\mathbf{R}-\mathbf{u}_k\right)^{\mathsf{T}}\mathbf{A}_k\left(\mathbf{R}-\mathbf{u}_k\right) + \mathrm{const} \tag{S3}$$

where $\mathbf{A}_k$ equals the inverse covariance matrix $\mathbf{C}_k^{-1}$. (Recall that the mean vector $\mathbf{u}_k$ and covariance matrix $\mathbf{C}_k$ are obtained by projecting a collection of natural stereo-images having disparity $\delta_k$ onto the filters.) From here forward, the disparity index $k$ is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

$$-0.5\,a_{ii}\left(R_i-u_i\right)^2 = -0.5\left(a_{ii}R_i^2 - 2a_{ii}R_iu_i + a_{ii}u_i^2\right) \tag{S4a}$$

$$-a_{ij}\left(R_i-u_i\right)\left(R_j-u_j\right) = -\left(a_{ij}R_iR_j - a_{ij}R_iu_j - a_{ij}R_ju_i + a_{ij}u_ju_i\right) \tag{S4b}$$

where $a_{ii}$ are the on-diagonal elements of $\mathbf{A}$, $a_{ij}$ are the off-diagonal elements, and $u_i$ are the mean filter responses.

The aim is now to express $w_i$, $w_{ii}$, and $w_{ij}$ in terms of $u_i$, $a_{ii}$, and $a_{ij}$. First, consider $w_i$. In this case we want to collect the terms containing $R_i$: $w_iR_i$ from Eq. S2, and $a_{ii}R_iu_i$ and $a_{ij}R_iu_j$ from Eqs. S4a and S4b. Setting the $R_i$ terms equal and canceling gives

$$w_i = \sum_{j=1}^{n} a_{ij}u_j \tag{S5a}$$

Note that the weights on the linear filter responses, $w_i$, are zero if the mean linear filter responses $u_i$ are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a; Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).

Next, consider $w_{ij}$, $i \neq j$. In this case we want to collect the terms containing $R_iR_j$: $2w_{ij}R_iR_j$ appears in Eq. S2 and $-a_{ij}R_iR_j$ appears in Eq. S4b. Setting the two $R_iR_j$ terms equal to each other and canceling gives

$$w_{ij} = -0.5\,a_{ij} \tag{S5b}$$


Next, consider $w_{ii}$. In this case the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight contributed by the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4a gives $w_{ii}$:

$$w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij} \tag{S5c}$$

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

$$\mathrm{const}' = -0.5\sum_i\left(a_{ii}u_i^2 + \sum_{j\neq i} a_{ij}u_iu_j\right) + \mathrm{const} \tag{S5d}$$

Eqs. S1a–d are obtained by rewriting Eqs. S5a–d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{\max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij\max}^2 = R_{\max}^2\left(2 + 2\,\mathbf{f}_i\cdot\mathbf{f}_j\right)$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters $i$ and $j$ (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{\max}$, rather than $R_{\max}^2$ and $R_{ij\max}^2$), the weights on the complex cell responses would have to be scaled. The optimal weights on the complex cells are then given by $w_i' = w_i$, $w_{ii}' = R_{\max}w_{ii}$, and $w_{ij}' = \left(R_{ij\max}^2/R_{\max}\right)w_{ij}$.

These scale factors are important because, in a neurophysiological experiment, one would measure $w'$, not $w$. Consequently, $w_{ii}'$ and $w_{ij}'$ are the normalized weights plotted in Fig. 7b.
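A short sketch (ours, not the paper's code) of the scale factors just described, under our reconstruction of $R_{ij\max}^2$ for unit-norm filters and an assumed $R_{\max}$ of 30 spk/s; variable names are placeholders.

```python
import numpy as np

def rescale_quadratic_weights(w_sq, W_cross, filters, R_max=30.0):
    """Rescale quadratic weights for a cortical implementation in which complex
    cells saturate at the same maximum response R_max as simple cells:
    w'_ii = R_max * w_ii and w'_ij = (R^2_{ij,max} / R_max) * w_ij."""
    n = filters.shape[0]
    w_sq_prime = R_max * w_sq
    W_cross_prime = np.zeros_like(W_cross)
    for i in range(n - 1):
        for j in range(i + 1, n):
            # assumed form: R^2_{ij,max} = R_max^2 * (2 + 2 f_i . f_j) for unit-norm filters
            R2_ij_max = R_max ** 2 * (2.0 + 2.0 * filters[i] @ filters[j])
            W_cross_prime[i, j] = (R2_ij_max / R_max) * W_cross[i, j]
    return w_sq_prime, W_cross_prime
```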

Evaluating the effect of surface slant

Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo-images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity that were introduced by these slants are comparable to those in natural viewing (Fig. S1g). Results based on this analysis are thus likely to be representative. Then we determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c) and the LL neurons that are constructed from them (Fig. S1d,e) are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher,


and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, $\sigma_{non\text{-}planar}$, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, $\sigma_{planar}$. The ratio of these standard deviations, $\sigma_{planar}/\sigma_{non\text{-}planar}$, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors (a minimal sketch of this computation follows the next subsection). If the depth variation is due only to surface slant, then the ratio $\sigma_{planar}/\sigma_{non\text{-}planar}$ will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/sec (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spk/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 sec), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
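The sketch below (ours, not the paper's analysis code) illustrates the $\sigma_{planar}/\sigma_{non\text{-}planar}$ partition for a single range patch; it assumes the patch is supplied as visual-field coordinates xy and range values z, and uses an ordinary least-squares plane fit as a stand-in for whatever fitting procedure was actually used.

```python
import numpy as np

def slant_variation_ratio(xy, z):
    """sigma_planar / sigma_nonplanar for one range patch (cf. Fig. S8d).
    xy: (N, 2) patch coordinates; z: (N,) range values."""
    # Best-fitting fronto-parallel plane: constant range equal to the patch mean.
    z_fronto = np.full_like(z, z.mean())
    # Best-fitting slanted plane: least-squares fit z = a*x + b*y + c.
    X = np.column_stack([xy, np.ones(len(z))])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    z_slanted = X @ coef
    # (i) raw range minus fronto-parallel plane; (ii) slanted minus fronto-parallel plane.
    sigma_nonplanar = np.std(z - z_fronto)
    sigma_planar = np.std(z_slanted - z_fronto)
    return sigma_planar / sigma_nonplanar
```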


Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left- and right-eye signals) is sinusoidal, with contrast amplitude given by

$$A_B\left(f,\delta_k\right) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos\left(2\pi f\delta_k\right)} \tag{S6}$$

where $f$ is the spatial frequency, $\delta_k$ is a particular disparity, $A_L(f)$ and $A_R(f)$ are the left- and right-eye retinal amplitudes, and $A_B\left(f,\delta_k\right)$ is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the $1/f$ falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may simply be explained by the statistics of natural images.
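As an illustration of Eq. S6, the following sketch (ours) computes the binocular difference-signal amplitude under simplifying assumptions: identical eyes, a pure 1/f amplitude falloff, and the optical MTF omitted; the numbers and names are placeholders.

```python
import numpy as np

def binocular_difference_amplitude(f, disparity_deg, A_L, A_R):
    """Eq. S6: amplitude of the left-minus-right contrast signal at spatial
    frequency f (cyc/deg) for a given disparity (deg); the radical follows
    from the standard difference-of-sinusoids identity."""
    phase = 2.0 * np.pi * f * disparity_deg
    return np.sqrt(A_L ** 2 + A_R ** 2 - 2.0 * A_L * A_R * np.cos(phase))

f = np.linspace(0.5, 30.0, 200)              # spatial frequency (cpd)
A = 1.0 / f                                  # A_L(f) = A_R(f): 1/f falloff, optics omitted
amplitudes = {d: binocular_difference_amplitude(f, d / 60.0, A, A)   # arcmin -> deg
              for d in (3.75, 7.5, 15.0)}
```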

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$$\Delta D \cong \delta\,/\,\mathrm{IPD} \tag{S7}$$

where $\delta$ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and $\Delta D$ is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:


$$\mathrm{psf}_{photopic}\left(x,\Delta D\right) = \frac{1}{K}\sum_{\lambda}\mathrm{psf}\left(x,\lambda,\Delta D\right)\,s_c\!\left(\lambda\right)\,\mathrm{D65}\!\left(\lambda\right) \tag{S8}$$

where $s_c(\lambda)$ is the human photopic sensitivity function and $K$ is a normalizing constant that sets the volume of $\mathrm{psf}_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
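A sketch of the Eq. S8 weighting (ours): psf_mono, s_c, and d65 are hypothetical callables standing in for the single-wavelength PSFs (with wavelength-dependent defocus), the photopic sensitivity function, and the D65 spectrum; the discrete sum over samples stands in for the 1/K volume normalization.

```python
import numpy as np

def polychromatic_psf(x, delta_D, psf_mono, s_c, d65,
                      wavelengths_nm=np.arange(400, 701, 10)):
    """Eq. S8: weight single-wavelength PSFs by the photopic sensitivity and the
    D65 illuminant, sum over wavelength, and normalize to unit volume (1/K)."""
    psf = np.zeros_like(np.asarray(x, dtype=float))
    for lam in wavelengths_nm:
        psf += psf_mono(x, lam, delta_D) * s_c(lam) * d65(lam)
    return psf / psf.sum()

# The MTF for a given disparity (defocus) is then the amplitude of the
# Fourier transform of the polychromatic PSF, e.g. np.abs(np.fft.fft(psf)).
```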

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}(\mathbf{x}) = \left(\mathbf{r}(\mathbf{x}) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast: $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\left\|\mathbf{c}(\mathbf{x})\right\|^2 + n\,c_{50}^2}$, where $n$ is the dimensionality of the vector, $\left\|\mathbf{c}(\mathbf{x})\right\|^2$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$$R_i = R_{\max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x})\cdot\mathbf{f}_i(\mathbf{x})\right\rfloor$$

where $\lfloor\cdot\rfloor$ denotes half-wave rectification.
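The sketch below (ours) strings the pieces of this section together: Weber contrast, contrast normalization with the paper's $c_{50}=0$ default, and the half-wave rectified binocular simple-cell response. Computing Weber contrast per eye and normalizing the concatenated binocular vector is our assumption for this sketch; $R_{\max}=30$ spk/s is the value assumed in the noise analysis above.

```python
import numpy as np

def contrast_signal(r):
    """Weber contrast c(x) = (r(x) - rbar) / rbar for a local patch of sensor responses."""
    rbar = r.mean()
    return (r - rbar) / rbar

def contrast_normalize(c, c50=0.0):
    """Divide the contrast signal by the local contrast; n is the dimensionality
    of c, and c50 = 0 reproduces the assumption used throughout the paper."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def binocular_simple_cell_response(r_left, r_right, f_left, f_right, R_max=30.0):
    """Half-wave rectified dot product of the contrast-normalized binocular
    signal with the binocular receptive field f = [f_left, f_right]."""
    c = np.concatenate([contrast_signal(r_left), contrast_signal(r_right)])
    c_norm = contrast_normalize(c)
    f = np.concatenate([f_left, f_right])
    return R_max * max(c_norm @ f, 0.0)
```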


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: A new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.



Figure S1. Optimal filters and disparity estimation performance for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters have somewhat smaller spatial extents and slightly higher octave bandwidths (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan(\theta)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f,g,h,i) Same as (a,b,c,e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts of Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is given by filtering the contrast-normalized stimulus and then squaring: $R_{\max}\left(\mathbf{f}_i^{\mathsf{T}}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, the RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d,e) Same as (b,c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes. Such a difference drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as would a 4 mm pupil with a 4 diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

Optimal disparity estimation in natural stereo images (Supplement)

Johannes Burge & Wilson S. Geisler

Computing the log likelihood with populations of simple and complex cells
Simple cells in visual cortex produce only positive responses, whereas AMA filters produce both positive and negative responses. The linear AMA filter responses (see the first term in Eq. 5) and their pairwise sums (see the sum in the third term's parentheses) can be obtained from a population of simple cells. We define simple cells as units whose outputs result from linear filtering followed by half-wave rectification (i.e., thresholding at zero). Each AMA filter response in the first term can be obtained from two simple cells that are 'on' and 'off' versions of that AMA filter (Fig. S3a; see Fig. 3). Each pairwise-summed AMA filter response in the third term can be obtained from two simple cells that are 'on' and 'off' versions of that pairwise-summed filter (Fig. S3a, S4).
The squared AMA filter responses (see the second and third terms in Eq. 5) can be obtained from a population of complex cells. We define complex cells as units whose responses result from linear filtering followed by a squaring nonlinearity, as is often the case in cortex. Complex cells can be implemented by summing and then squaring (or squaring and then summing) the outputs of 'on' and 'off' simple cells (Fig. S3b; the disparity tuning curves of these complex cells are given in Fig. S5).
Finally, neurons whose response rates represent the log likelihood of the AMA filter population responses can be constructed via a weighted sum of simple and complex cell population responses (Eq. 5; Fig. 6; Fig. 7). (Note that the likelihood can be obtained by exponentiating the log-likelihood responses.) In other words, a large collection of log-likelihood (LL) neurons, each with a different preferred disparity, can be constructed from a fixed set of simple and complex cells simply by changing the weights.
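The following minimal sketch (Python/NumPy, with placeholder filters, stimulus, and weights rather than the trained values from the paper) illustrates the mapping: 'on' and 'off' half-wave-rectified units recover the signed AMA filter responses, squared units play the role of complex cells, and one LL neuron is a weighted sum of both populations plus a constant.

import numpy as np

rng = np.random.default_rng(0)
n_filters, n_pix = 8, 60                                 # hypothetical sizes
F = rng.standard_normal((n_filters, n_pix))              # rows: binocular filter weighting functions f_i
stim = rng.standard_normal(n_pix)                        # one contrast-normalized binocular stimulus

lin = F @ stim                                           # signed linear AMA filter responses R_i
on, off = np.maximum(lin, 0.0), np.maximum(-lin, 0.0)    # 'on' and 'off' simple cells
R = on - off                                             # linear responses recovered from simple cells

R_sq = (on + off) ** 2                                   # complex cells: R_i^2
i, j = np.triu_indices(n_filters, k=1)
R_pair_sq = (R[i] + R[j]) ** 2                           # complex cells for pairwise-summed filters

# One LL neuron: a weighted sum of simple and complex cell responses plus a constant.
# The weights below are placeholders; Eqs. S1a-d give the optimal values for a preferred disparity.
w_lin = rng.standard_normal(n_filters)
w_sq = rng.standard_normal(n_filters)
w_pair = rng.standard_normal(i.size)
log_likelihood = w_lin @ R + w_sq @ R_sq + w_pair @ R_pair_sq - 1.0
print(log_likelihood)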

Derivation of optimal weights on simple and complex cells
The first term in Eq. 5 (i.e., the linear term) can be conceptualized as a weighted sum of simple cell population responses. The second and third terms (i.e., the quadratic terms) can be conceptualized as weighted sums of complex cell population responses. The constant corresponds to a baseline response. The constant and the weights on the linear, squared, and sum-squared filters are given by

\mathbf{w}_{i,k} = \mathbf{C}_k^{-1}\,\mathbf{u}_k    (S1a)
\mathbf{w}_{ii,k} = -\mathrm{diag}\!\left(\mathbf{C}_k^{-1}\right) + 0.5\,\mathbf{C}_k^{-1}\mathbf{1}    (S1b)
w_{ij,k} = -0.5\,\left[\mathbf{C}_k^{-1}\right]_{ij}, \quad \forall\, i,j : j > i    (S1c)
\mathrm{const}'_k = -0.5\,\mathbf{u}_k^{T}\mathbf{C}_k^{-1}\mathbf{u}_k + \mathrm{const}_k    (S1d)

where I is the identity matrix, 1 is the 'ones' vector, and diag(·) returns the matrix diagonal as a vector. Here we derive these weight equations from the multidimensional Gaussian approximations to the empirically determined filter response distributions (i.e., the conditional response distributions; see Fig. 4a).

First, we rewrite Eq. 5 in the main text by expanding its third term:

\ln p\!\left(\mathbf{R} \mid \delta_k\right) = \sum_{i=1}^{n} w_{i,k} R_i + \sum_{i=1}^{n} w_{ii,k} R_i^{2} + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij,k}\left(R_i^{2} + 2 R_i R_j + R_j^{2}\right) + \mathrm{const}'    (S2)

Second, we rewrite Eq. 4 in the main text:

\ln p\!\left(\mathbf{R} \mid \delta_k\right) = -0.5\left(\mathbf{R} - \mathbf{u}_k\right)^{T}\mathbf{A}_k\left(\mathbf{R} - \mathbf{u}_k\right) + \mathrm{const}    (S3)

where A_k equals the inverse covariance matrix C_k^{-1}. (Recall that the mean vector u_k and covariance matrix C_k are obtained by projecting a collection of natural stereo images having disparity δ_k onto the filters.) From here forward, the disparity index k is dropped for notational simplicity. Third, we set Eqs. S2 and S3 equal to one another, multiply through, and collect terms. The terms of the log likelihood (Eq. S3) are found to have the form

-0.5\,a_{ii}\left(R_i - u_i\right)^{2} = -0.5\,a_{ii} R_i^{2} + a_{ii} R_i u_i - 0.5\,a_{ii} u_i^{2}    (S4a)
-a_{ij}\left(R_i - u_i\right)\left(R_j - u_j\right) = -\left(a_{ij} R_i R_j - a_{ij} R_i u_j - a_{ij} R_j u_i + a_{ij} u_j u_i\right)    (S4b)

where a_ii are the on-diagonal elements of A, a_ij are the off-diagonal elements, and u_i are the mean filter responses.
The aim is now to express w_i, w_ii, and w_ij in terms of u_i, a_ii, and a_ij. First, consider w_i. In this case we want to collect the terms containing R_i: w_i R_i from Eq. S2, and a_ii R_i u_i and a_ij R_i u_j from Eqs. S4a and S4b. Setting the R_i terms equal and canceling gives

w_i = \sum_{j=1}^{n} a_{ij} u_j    (S5a)

Note that the weights on the linear filter responses, w_i, are zero if the mean linear filter responses u_i are zero. In our case, the filter response distribution means (i.e., the means of the conditional response distributions) are all near zero (see Fig. 4a; Fig. S2). Thus, the weights on the simple cell responses (Fig. 6a) contribute little to the units (LL neurons) that respond according to the log likelihood of the linear filter population response (see Discussion).
Next, consider w_ij, i ≠ j. In this case we want to collect the terms containing R_i R_j: 2 w_ij R_i R_j appears in Eq. S2, and −a_ij R_i R_j appears in Eq. S4b. Setting the two R_i R_j terms equal to each other and canceling gives

w_{ij} = -0.5\,a_{ij}    (S5b)

Next, consider w_ii. In this case the total weight on R_i^2 in Eq. S2 must be −0.5 a_ii. The weight contributed by the third term of Eq. S2 can be found by substituting w_ij = −0.5 a_ij. Subtracting that weight from the total weight specified by Eq. S4a gives w_ii:

w_{ii} = -a_{ii} + 0.5\sum_{j} a_{ij}    (S5c)

Last, by grouping the terms from Eq. S4 that are independent of the filter responses, we find that the constant in Eq. S2 is given by

\mathrm{const}' = -0.5\sum_{i}\left(a_{ii} u_i^{2} + \sum_{j \neq i} a_{ij} u_i u_j\right) + \mathrm{const}    (S5d)

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.
In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is R_max^2, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by R_{ij,max}^2 = R_max^2 (2 + 2 f_i · f_j), where f_i and f_j are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex—that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, R_max, rather than R_max^2 and R_{ij,max}^2—the weights on the complex cell responses would have to be scaled. The appropriately scaled weights are w'_i = w_i (simple cells), w'_ii = R_max · w_ii, and w'_ij = (R_{ij,max}^2 / R_max) · w_ij (complex cells).
These scale factors are important because in a neurophysiological experiment one would measure w', not w. Consequently, w'_ii and w'_ij are the normalized weights plotted in Fig. 7b.
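As a numerical sanity check on the derivation (a sketch with an arbitrary mean vector and covariance matrix, not values estimated from natural stereo images), the weights of Eqs. S1a-d can be computed directly from u_k and C_k and verified to reproduce the Gaussian log likelihood of Eq. S3 up to its additive constant:

import numpy as np

rng = np.random.default_rng(1)
n = 4
u = rng.standard_normal(n)                       # mean filter responses u_k
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)                      # a valid covariance matrix C_k
A = np.linalg.inv(C)                             # A_k = C_k^{-1}

w_lin = A @ u                                    # Eq. S1a
w_sq = -np.diag(A) + 0.5 * (A @ np.ones(n))      # Eq. S1b
i, j = np.triu_indices(n, k=1)
w_pair = -0.5 * A[i, j]                          # Eq. S1c
const_p = -0.5 * u @ A @ u                       # Eq. S1d (dropping const_k)

R = rng.standard_normal(n)                       # an arbitrary filter response vector
lhs = w_lin @ R + w_sq @ R**2 + w_pair @ (R[i] + R[j])**2 + const_p   # Eq. S2
rhs = -0.5 * (R - u) @ A @ (R - u)               # Eq. S3 without const
assert np.isclose(lhs, rhs)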

Evaluating the effect of surface slant
Slant causes disparity to change across the patch. We evaluated the effect of surface slant by determining the optimal filters and estimation performance for a training set with a distribution of slants. If we assume (plausibly) that objects are, on average, equally likely to be viewed from any given angle, then for a given small angular retinal extent the probability of a given slant about the vertical axis should fall as a half-cosine function. Slant was calculated for the cyclopean eye. To evaluate the effect of surface slant variation, we generated a training set of stereo images from a set of surfaces with slants that were randomly drawn from a truncated (±71 deg) cosine probability density (more extreme slants were not practical to include because of the surface size required to produce a projected patch width of 1 deg). The within-patch changes in disparity introduced by these slants are comparable to those in natural viewing (Fig. S1g); results based on this analysis are thus likely to be representative. We then determined the optimal binocular filters for this new training set. The filters that were trained on surfaces having different slants (Fig. S1a-c), and the LL neurons that are constructed from them (Fig. S1d,e), are quite similar to those that were trained only on fronto-parallel surfaces. The main (although slight) differences are that the new filters have spatial extents that are slightly smaller and octave bandwidths that are slightly higher, and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, but precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation, and they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.
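A brief illustrative sketch of the slant sampling (the 0.065-m interpupillary distance is an assumption; the 0.4-m viewing distance is the one used in the paper): slants are drawn from the half-cosine density truncated at ±71 deg by inverse-CDF sampling, and each slant is converted to a disparity gradient using the relation g ≈ IPD·tan(θ)/d given in the caption of Fig. S1.

import numpy as np

rng = np.random.default_rng(2)
theta_max = np.deg2rad(71.0)
u = rng.uniform(size=10_000)
theta = np.arcsin((2.0 * u - 1.0) * np.sin(theta_max))   # p(theta) proportional to cos(theta) on [-71, +71] deg

ipd, d = 0.065, 0.40                                     # interpupillary distance (m, assumed) and viewing distance (m)
g = ipd * np.tan(theta) / d                              # disparity gradients implied by the sampled slants
print(np.percentile(np.abs(g), [50, 90]))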

Evaluating the effect of depth variations other than slant
Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares with that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar / σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, the ratio σ_planar / σ_non-planar will equal 1.0; if all of the depth variation is due to factors other than slant, the ratio will equal 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all of the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
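A minimal sketch of this ratio for a single range patch (synthetic range values standing in for the scanned data; the analysis pipeline itself is not reproduced here): the slanted plane is fit by least squares, the fronto-parallel plane is the patch mean, and the two standard deviations are formed exactly as defined above.

import numpy as np

def slant_ratio(Z, x, y):
    # Z: range values over a patch; x, y: pixel coordinates with the same shape as Z.
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(Z.size)])
    coef, *_ = np.linalg.lstsq(A, Z.ravel(), rcond=None)    # best-fitting slanted plane
    slanted = A @ coef
    fronto = np.full(Z.size, Z.mean())                      # best-fitting fronto-parallel plane
    sigma_nonplanar = np.std(Z.ravel() - fronto)            # raw range vs. fronto-parallel plane
    sigma_planar = np.std(slanted - fronto)                 # slanted plane vs. fronto-parallel plane
    return sigma_planar / sigma_nonplanar

y, x = np.mgrid[0:20, 0:20].astype(float)
Z = 2.0 + 0.05 * x + 0.01 * np.random.default_rng(3).standard_normal(x.shape)  # mostly-slanted test patch
print(slant_ratio(Z, x, y))   # close to 1.0: nearly all variation captured by slant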

External variability and neural noise
We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/s (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time equal to a typical fixation duration (300 ms), the stimulus-induced response variance was ~3x greater than the internal noise; for an integration time equal to a typical stimulus presentation in a psychophysical task (1 s), the stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
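The variance comparison can be illustrated with a toy calculation (the response distribution below is made up; only the 30 spikes/s maximum, the 1.5 Fano factor, and the two integration times come from the text). In this simple model the stimulus-driven variance of spike counts grows with the square of the integration time while the spiking-noise variance grows linearly with it, so the external-to-internal ratio grows roughly threefold from a 300-ms fixation to a 1-s presentation, consistent with the ~3x and ~9x figures above.

import numpy as np

rng = np.random.default_rng(4)
rates = 30.0 * rng.beta(2, 2, size=5_000)        # hypothetical per-stimulus mean rates (spk/s), peak 30 spk/s
fano = 1.5

for T in (0.3, 1.0):                             # integration times: typical fixation vs. 1-s presentation
    counts = rates * T                           # mean spike count for each stimulus
    external_var = counts.var()                  # variance caused by natural-stimulus variation
    internal_var = fano * counts.mean()          # spiking-noise variance at the average count
    print(f"T = {T:.1f} s: external/internal variance = {external_var / internal_var:.1f}")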


Spatial frequency tuning of optimal linear binocular filters
To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

A_B\!\left(f, \delta_k\right) = \sqrt{A_L(f)^{2} + A_R(f)^{2} - 2 A_L(f) A_R(f) \cos\!\left(2\pi f \delta_k\right)}    (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities; Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.
V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011); vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.
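A small sketch of Eq. S6 (with a stand-in exponential MTF and an assumed 1/f stimulus spectrum; the paper instead uses disparity-dependent polychromatic MTFs) shows how the binocular difference amplitude is concentrated at low to mid spatial frequencies:

import numpy as np

f = np.linspace(0.25, 30.0, 200)                 # spatial frequency (cpd)
mtf = np.exp(-f / 8.0)                           # stand-in modulation transfer function
A = mtf / f                                      # A_L(f) = A_R(f): identical optics, 1/f spectrum

for disparity_arcmin in (1.25, 5.0, 15.0):
    delta = disparity_arcmin / 60.0              # disparity in deg, so f * delta is in cycles
    A_B = np.sqrt(2.0 * A**2 * (1.0 - np.cos(2.0 * np.pi * f * delta)))   # Eq. S6 with A_L = A_R
    peak = np.argmax(A_B)
    print(f"{disparity_arcmin:5.2f} arcmin: peak difference signal at {f[peak]:.1f} cpd")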

Optics
The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

\Delta D \cong \delta \,/\, \mathrm{IPD}    (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm; the wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

\mathrm{psf}_{\mathrm{photopic}}\!\left(x; \Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}\!\left(x; \lambda, \Delta D\right)\, s_c(\lambda)\, \mathrm{D65}(\lambda)    (S8)

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
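The weighting-and-summing step of Eq. S8 can be sketched as follows (the single-wavelength PSF, photopic sensitivity, and illuminant below are crude Gaussian/flat stand-ins, not the Thibos et al. chromatic-eye model or the measured Stockman-Sharpe and D65 functions):

import numpy as np

x = np.linspace(-0.25, 0.25, 257)                         # retinal position (deg)
wavelengths = np.arange(400.0, 701.0, 10.0)               # nm, every 10 nm
s_c = np.exp(-0.5 * ((wavelengths - 555.0) / 80.0) ** 2)  # stand-in photopic sensitivity
d65 = np.ones_like(wavelengths)                           # stand-in (flat) illuminant spectrum

def psf_mono(x, lam, delta_D):
    # Placeholder single-wavelength PSF: blur grows with total defocus (disparity-induced plus chromatic).
    sigma = 0.01 + 0.01 * abs(delta_D + (lam - 580.0) / 400.0)
    return np.exp(-0.5 * (x / sigma) ** 2)

def psf_photopic(x, delta_D):
    psf = sum(psf_mono(x, lam, delta_D) * s * w for lam, s, w in zip(wavelengths, s_c, d65))
    return psf / np.trapz(psf, x)                         # K normalizes the PSF area to 1.0

print(psf_photopic(x, 0.25).max())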

Contrast normalization
Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal c(x) = (r(x) − r̄)/r̄, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

\mathbf{c}_{\mathrm{norm}}(\mathbf{x}) = \mathbf{c}(\mathbf{x}) \Big/ \sqrt{\left\lVert \mathbf{c}(\mathbf{x}) \right\rVert^{2} + n\, c_{50}^{2}}

where n is the dimensionality of the vector, ‖c(x)‖² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static nonlinearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

R_i = R_{\max}\left\lfloor \mathbf{c}_{\mathrm{norm}}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor

where ⌊·⌋ denotes half-wave rectification.
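A compact sketch of these two steps (arbitrary sensor responses and a placeholder receptive field; c_50 = 0 as assumed throughout the paper):

import numpy as np

rng = np.random.default_rng(5)
r = 100.0 * (1.0 + 0.2 * rng.standard_normal(64))     # noisy sensor responses to local luminance
c = (r - r.mean()) / r.mean()                         # contrast signal c(x)

def contrast_normalize(c, c50=0.0):
    return c / np.sqrt(np.sum(c**2) + c.size * c50**2)   # c_norm(x)

f_i = rng.standard_normal(64)                         # placeholder binocular receptive field
R_max = 30.0
R_i = R_max * max(contrast_normalize(c) @ f_i, 0.0)   # half-wave rectified simple-cell response
print(R_i)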


References
Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating the disparity of surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters have a somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD·tan(θ)/d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4-m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients encountered in natural viewing.


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms; compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f, g, h, i) Same as (a, b, c, e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially for F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities, so the disparities are decoded improperly by a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant; inaccurate, highly variable estimates result.


Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting the 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells; we label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields; these filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36); irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a); lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of the log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex cell response is obtained by filtering the contrast-normalized stimulus and then squaring, R_max (f_i^T c_norm)² (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity; the gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity, but they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells; LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant natural image variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation; gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. The error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans; vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) A four-diopter difference in refractive power between the two eyes drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes; in that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant; values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant; in the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / √(‖c(x)‖² + n·c_50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multidimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multidimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian; powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.


213 13

First13 we13 rewrite13 Eq13 513 in13 the13 main13 text13 by13 expanding13 its13 third13 term13 13

ln p R δ k( ) = wik Ri

i=1

n

sum + wiik Ri2

i=1

n

sum + wij k Ri2 + 2Ri Rj + Rj

2( )j=i+1

n

sumi=1

nminus1

sum + cons primet 13 (S2)13

13 Second13 we13 rewrite13 Eq13 413 in13 the13 main13 text13

13

ln p R δ k( ) = minus05 R minus uk( )T Ak R minus uk( )+ const 13 13 13 13 13 (S3)13

13 where13 Ak 13 equals13 the13 inverse13 covariance13 matrix13 Ck

minus1 13 (Recall13 that13 the13 mean13 vector13 uk and13 covariance13 matrix13 Ck are13 obtained13 by13 projecting13 a13 collection13 of13 natural13 stereo-shy‐images13 having13 disparity13 δ k 13 onto13 the13 filters)13 From13 here13 forward13 the13 disparity13 index13 k13 is13 dropped13 for13 notational13 simplicity13 Third13 we13 set13 Eqs13 S213 and13 S313 equal13 to13 one13 another13 multiply13 through13 and13 collect13 terms13 The13 terms13 of13 the13 log13 likelihood13 (Eq13 S3)13 are13 found13 to13 have13 the13 form13 13

( ) ( )2 2 205 05 2ii i i ii i ii i i ii ia R u a R a Ru a uminus minus = minus minus + 13 13 13 13 (S4a)13

minusaij Ri minus ui( ) Rj minus u j( ) = minus aijRiRj minus aijRiu j minus aijRjui + aiju jui( ) 13 13 13 (S4b)13

13 where13 iia 13 are13 the13 on-shy‐diagonal13 elements13 of13 A 13 ija are13 the13 off-shy‐diagonal13 elements13 and13 iu 13 the13 are13 the13 mean13 filter13 responses13 13 13 The13 aim13 is13 now13 to13 express13 wi wii 13 and13 wij 13 in13 terms13 of13 iu 13 iia 13 and13 ija 13 First13 consider13 wi 13 In13 this13 case13

we13 want13 to13 collect13 terms13 containing13 iR 13 wiRi 13 from13 Eq13 S213 and13 aiiRiui 13 and13 aijRiu j 13 from13 Eq13 S4a13 Setting13 the13 iR 13 terms13 equal13 and13 canceling13 gives13 13

wi = aiju jj=1

n

sum 13 13 13 13 13 13 13 13 13 (S5a)13

13 Note13 that13 the13 weights13 on13 the13 linear13 filter13 responses13 wi 13 are13 zero13 if13 the13 mean13 linear13 filter13 responses13 ui 13 are13 zero13 In13 our13 case13 the13 filter13 response13 distributions13 means13 (ie13 the13 means13 of13 the13 conditional13 response13 distributions)13 are13 all13 near13 zero13 (see13 Fig13 4a13 Fig13 S2)13 Thus13 the13 weights13 on13 the13 simple13 cell13 responses13 (Fig13 6a)13 contribute13 little13 to13 the13 units13 (LL13 neurons)13 that13 respond13 according13 to13 the13 log13 likelihood13 of13 the13 linear13 filter13 population13 response13 (see13 Discussion)13 13 Next13 consider13 wij 13 i ne j 13 In13 this13 case13 we13 want13 to13 collect13 terms13 containing13 RiRj 13 2wijRiRj 13 appears13

in13 Eq13 S213 and13 minusaijRiRj 13 appears13 in13 Eq13 S4b13 Setting13 the13 two13 RiRj 13 terms13 equal13 to13 each13 other13 and13 canceling13 gives13 13

wij = minus05aij 13 13 13 13 13 13 13 13 13 (S5b)13

313 13

Next13 consider13 wii 13 In13 this13 case13 the13 total13 weight13 on13 2iR 13 in13 Eq13 S213 must13 be13 05 iiaminus 13 The13 weight13 from13

the13 third13 term13 of13 Eq13 S213 can13 be13 found13 by13 substituting13 wij = minus05aij 13 Subtracting13 that13 weight13 from13 the13

total13 weight13 specified13 by13 Eq13 S413 gives13 wii 13 13

wii = minusaii + 05 aijjsum 13 13 13 13 13 13 13 13 13 (S5c)13

13 Last13 by13 grouping13 terms13 from13 Eq13 S413 that13 are13 independent13 of13 the13 filter13 response13 we13 find13 the13 constant13 in13 Eq13 S213 is13 given13 by13 13

cons primet = minus05 aiimicroi2 + aijmicroimicro j

j foralljneisum

⎝⎜⎞

⎠⎟isum + const 13 13 13 13 (S5d)13

13 Eqs13 S1a-shy‐d13 are13 obtained13 by13 rewriting13 Eqs13 S5a-shy‐d13 in13 matrix13 notation13 13 13 In13 visual13 cortex13 simple13 and13 complex13 cells13 have13 the13 same13 maximum13 firing13 rates13 on13 average13 (Albrecht13 amp13 Hamilton13 1982)13 However13 the13 maximum13 possible13 response13 of13 the13 squared13 filters13 (see13 the13 second13 term13 in13 Eq13 5)13 is13 R2max 13 and13 the13 maximum13 possible13 response13 of13 the13 sum-shy‐squared13 filters13 (see13 the13 third13

term13 in13 Eq13 5)13 is13 given13 by13 R2ijmax = Rmax 2+ 2fi sdot f j( )2 13 where13 fi 13 and13 f j 13 are13 the13 weighting13 functions13 defining13 filters13 i13 and13 j13 (see13 Fig13 3a)13 Thus13 if13 these13 computations13 were13 implemented13 in13 cortexmdashthat13 is13 if13 the13 maximum13 response13 of13 the13 complex13 cells13 equaled13 the13 maximum13 response13 of13 the13 simple13 cells13 Rmax 13 rather13 than13 R

2max 13 and13 R

2ijmaxmdashthe13 weights13 on13 the13 complex13 cell13 responses13 must13 be13 scaled13

Thus13 the13 optimal13 weights13 on13 complex13 cells13 are13 given13 by13 primewi = wi 13 primewii = Rmaxwii 13 and13 primewij =Rijmax2

Rmaxwij 13

These13 scale13 factors13 are13 important13 because13 in13 a13 neurophysiological13 experiment13 one13 would13 measure13 primew 13 not13 w 13 Consequently13 primewii 13 and13 primewij 13 are13 the13 normalized13 weights13 plotted13 in13 Fig13 7b13

13 Evaluating13 the13 effect13 of13 surface13 slant13 Slant13 causes13 disparity13 to13 change13 across13 the13 patch13 We13 evaluated13 the13 effect13 of13 surface13 slant13 by13 determining13 the13 optimal13 filters13 and13 estimation13 performance13 for13 a13 training13 set13 with13 a13 distribution13 of13 slants13 If13 we13 assume13 (plausibly)13 that13 objects13 are13 on13 average13 equally13 likely13 to13 be13 viewed13 from13 any13 given13 angle13 then13 for13 a13 given13 small13 angular13 retinal13 extent13 the13 probability13 of13 a13 given13 slant13 about13 the13 vertical13 axis13 should13 fall13 as13 a13 half-shy‐cosine13 function13 Slant13 was13 calculated13 for13 the13 cyclopean13 eye13 To13 evaluate13 the13 effect13 of13 surface13 slant13 variation13 we13 generated13 a13 training13 set13 of13 stereo-shy‐images13 from13 a13 set13 of13 surfaces13 with13 slants13 that13 were13 randomly13 drawn13 from13 a13 truncated13 (+7113 deg)13 cosine13 probability13 density13 (more13 extreme13 slants13 were13 not13 practical13 to13 include13 because13 of13 the13 surface13 size13 required13 to13 produce13 a13 projected13 patch13 width13 of13 113 deg)13 The13 within-shy‐patch13 changes13 in13 disparity13 that13 were13 introduced13 by13 these13 slants13 are13 comparable13 to13 those13 in13 natural13 viewing13 (Fig13 S1g)13 Results13 based13 on13 this13 analysis13 are13 thus13 likely13 to13 be13 representative13 Then13 we13 determined13 the13 optimal13 binocular13 filters13 for13 this13 new13 training13 set13 The13 filters13 that13 were13 trained13 on13 surfaces13 having13 different13 slants13 (Fig13 S1a-shy‐c)13 and13 the13 LL13 neurons13 that13 are13 constructed13 from13 them13 (Fig13 S1de)13 are13 quite13 similar13 to13 those13 that13 were13 trained13 only13 on13 fronto-shy‐parallel13 surfaces13 The13 main13 (although13 slight)13 differences13 are13 that13 the13 new13 filters13 have13 spatial13 extents13 that13 are13 slightly13 smaller13 and13 octave13 bandwidths13 that13 are13 slightly13 higher13

413 13

and13 that13 the13 new13 LL13 neuron13 tuning13 curves13 are13 broader13 Disparity13 estimation13 performance13 on13 slant13 surfaces13 is13 also13 comparable13 but13 precision13 is13 somewhat13 reduced13 (Fig13 S1f13 Fig13 5d)13 When13 disparity13 changes13 across13 the13 patch13 the13 optimal13 encoding13 filters13 must13 be13 small13 enough13 to13 prevent13 the13 disparity13 signal13 from13 being13 lsquosmeared13 outrsquo13 by13 within-shy‐patch13 (ie13 within-shy‐receptive13 field)13 disparity13 variation13 And13 they13 must13 be13 large13 enough13 to13 prevent13 a13 poor13 signal-shy‐to-shy‐noise13 ratio13 which13 would13 lead13 to13 a13 preponderance13 of13 false13 matches13 (ie13 inaccurate13 disparity13 estimates)13 The13 optimal13 filters13 appear13 to13 strike13 this13 balance13 13 13 Evaluating13 the13 effect13 of13 depth13 variations13 other13 than13 slant13 Naturally13 occurring13 fine13 depth13 structure13 and13 occlusions13 (ie13 within-shy‐patch13 depth13 variation13 and13 depth13 discontinuities13 Fig13 S8a-shy‐c)13 were13 not13 present13 in13 either13 of13 our13 training13 sets13 To13 determine13 how13 the13 magnitude13 of13 these13 other13 sources13 of13 depth13 variation13 compare13 to13 that13 due13 to13 slant13 we13 analyzed13 4000013 113 deg13 patches13 from13 3013 outdoor13 range13 images13 obtained13 with13 a13 high-shy‐precision13 Riegl13 VZ40013 range13 scanner13 For13 each13 patch13 we13 computed13 the13 standard13 deviation13 of13 i)13 the13 differences13 between13 the13 raw13 range13 values13 and13 the13 best-shy‐fitting13 fronto-shy‐parallel13 plane13

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1-deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar/σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, then the ratio σ_planar/σ_non-planar will have a value equal to 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
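The ratio just described can be computed for a single range patch by least-squares plane fitting. A minimal sketch, assuming the patch is a 2-D array of range values on a regular grid (array and function names are ours):

```python
import numpy as np

def slant_capture_ratio(z):
    """sigma_planar / sigma_non-planar for one range patch z (2-D array of distances)."""
    h, w = z.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)   # best-fitting slanted plane
    plane = (A @ coef).reshape(h, w)
    fronto = z.mean()                                      # best-fitting fronto-parallel plane
    sigma_nonplanar = np.std(z - fronto)       # raw range vs. fronto-parallel plane
    sigma_planar = np.std(plane - fronto)      # slanted plane vs. fronto-parallel plane
    return sigma_planar / sigma_nonplanar

# A perfectly planar (slanted) patch gives a ratio of 1.0.
zz = 2.0 + 0.01 * np.arange(32)[None, :] + np.zeros((32, 32))
print(round(slant_capture_ratio(zz), 3))   # -> 1.0
```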

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/s (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time of a typical fixation duration (300 ms), stimulus-induced response variance was ~3x greater than the internal noise. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 s), stimulus-induced response variance was ~9x greater than the internal noise. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
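A toy version of this comparison is sketched below; the response array, its spread, and the scaling are illustrative stand-ins (not the paper's data), chosen only to show how the external and internal variances are compared:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical normalized LL-neuron responses to many natural stimuli at the
# preferred disparity; stimulus-to-stimulus spread is the "external" variability.
responses = np.clip(rng.normal(loc=1.0, scale=0.7, size=5000), 0.0, None)

max_rate = 30.0     # spikes/s, assumed maximum average response
fano = 1.5          # assumed Fano factor (noise variance = fano * mean count)

for integration_time in (0.3, 1.0):                      # 300 ms fixation, 1 s presentation
    counts = responses / responses.mean() * max_rate * integration_time
    external_var = counts.var()                          # stimulus-induced variance
    internal_var = fano * counts.mean()                  # Poisson-like neural noise
    print(f"T={integration_time:.1f}s  external/internal variance "
          f"~{external_var / internal_var:.1f}x")
```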

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

$A_B(f, \delta_k) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2 A_L(f) A_R(f) \cos\left(2\pi f \delta_k\right)}$    (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities. V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye-movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011): vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.
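To show how Eq. S6 is evaluated, the sketch below computes the binocular difference amplitude for identical left- and right-eye amplitude spectra with a 1/f falloff. The optical MTF is omitted for simplicity (our assumption for this illustration), so the high-frequency attenuation visible in Fig. S7a is not reproduced here:

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, amp):
    """Eq. S6: amplitude of the left-minus-right contrast signal at frequency f."""
    delta_deg = disparity_arcmin / 60.0            # disparity in degrees
    a_l = a_r = amp(f_cpd)                         # identical optics in the two eyes
    return np.sqrt(a_l**2 + a_r**2 - 2 * a_l * a_r * np.cos(2 * np.pi * f_cpd * delta_deg))

freqs = np.linspace(0.5, 16, 200)                  # cycles per degree
one_over_f = lambda f: 1.0 / f                     # 1/f amplitude falloff (no MTF)

for disparity in (1.875, 3.75, 7.5, 15.0):         # arcmin
    a_b = binocular_difference_amplitude(freqs, disparity, one_over_f)
    print(f"{disparity:6.3f} arcmin: amplitude at 6 cpd = "
          f"{a_b[np.argmin(np.abs(freqs - 6))]:.3f}")
```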

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

$\Delta D \cong \delta / \mathrm{IPD}$    (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

$\mathrm{psf}_{photopic}\left(\mathbf{x}; \Delta D\right) = \frac{1}{K} \sum_{\lambda} \mathrm{psf}\left(\mathbf{x}; \lambda, \Delta D\right)\, s_c(\lambda)\, D65(\lambda)$    (S8)

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
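A minimal sketch of Eqs. S7 and S8. The single-wavelength PSF below is a placeholder (a Gaussian that widens with defocus), and the sensitivity and illumination spectra are crude stand-ins; the paper instead uses PSFs from a human-eye optical model with the chromatic aberration of Thibos et al. (1992):

```python
import numpy as np

def defocus_from_disparity(disparity_arcmin, ipd_m=0.065):
    """Eq. S7: defocus (diopters) of a surface with the given disparity (arcmin)."""
    delta_rad = np.deg2rad(disparity_arcmin / 60.0)
    return delta_rad / ipd_m

def psf_single(x, wavelength_nm, defocus_d):
    """Placeholder single-wavelength PSF (Gaussian widening with defocus); NOT the
    paper's optical model, it only stands in for psf(x; lambda, defocus)."""
    sigma = 0.5 + 2.0 * abs(defocus_d) + 0.001 * abs(wavelength_nm - 555.0)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def psf_photopic(x, defocus_d, s_c, d65, wavelengths=np.arange(400, 701, 10)):
    """Eq. S8: photopic-luminance-weighted sum of single-wavelength PSFs."""
    acc = np.zeros_like(x, dtype=float)
    for lam in wavelengths:
        acc += psf_single(x, lam, defocus_d) * s_c(lam) * d65(lam)
    return acc / acc.sum()                       # K sets the volume (sum) to 1.0

x = np.linspace(-15, 15, 301)                    # arcmin
s_c = lambda lam: np.exp(-0.5 * ((lam - 555.0) / 80.0) ** 2)   # crude stand-in
d65 = lambda lam: 1.0                                          # flat stand-in spectrum
print(defocus_from_disparity(15.0))              # ~0.067 D for a 15-arcmin disparity
print(psf_photopic(x, defocus_from_disparity(15.0), s_c, d65).sum())  # -> 1.0
```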

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}(\mathbf{x}) = \left(\mathbf{r}(\mathbf{x}) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast, $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x}) \big/ \sqrt{\left\|\mathbf{c}(\mathbf{x})\right\|^2 + n c_{50}^2}$, where n is the dimensionality of the vector, $\left\|\mathbf{c}(\mathbf{x})\right\|^2$ is the contrast energy (or, equivalently, n times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

$R_i = R_{\max} \left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor$
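The three steps just described can be written as a short function chain. The sketch below is our own minimal illustration (toy stimulus, random unit-norm receptive field), not the paper's code:

```python
import numpy as np

def contrast_signal(r):
    """Weber contrast: (r - mean) / mean for a local patch of sensor responses r."""
    r_bar = r.mean()
    return (r - r_bar) / r_bar

def contrast_normalize(c, c50=0.0):
    """Divide by the local contrast energy: c / sqrt(||c||^2 + n * c50^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def simple_cell_response(c_norm, rf, r_max=30.0):
    """Binocular simple cell: half-wave rectified dot product with the receptive field."""
    return r_max * max(0.0, float(np.dot(c_norm, rf)))

rng = np.random.default_rng(0)
patch = 100.0 + 20.0 * rng.standard_normal(64)       # toy left+right luminance samples
rf = rng.standard_normal(64); rf /= np.linalg.norm(rf)
resp = simple_cell_response(contrast_normalize(contrast_signal(patch)), rf)
print(round(resp, 3))
```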


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.


Figure S1. Optimal filters and disparity estimation performance for estimating disparity for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters are of somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of filters. (c) Two-dimensional analogs of filters in (a). The two-dimensional filters respond virtually identically to two-dimensional images as do the one-dimensional filters to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance with surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\tan(\theta)/d$, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
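A quick numerical check of the disparity-gradient relation in panel (g) (the particular slant value below is our own example):

```python
import numpy as np

def disparity_gradient(slant_deg, distance_m, ipd_m=0.065):
    """g ~ IPD * tan(slant) / distance (Fig. S1g)."""
    return ipd_m * np.tan(np.deg2rad(slant_deg)) / distance_m

# At the 0.4 m viewing distance used in the paper, a 45-deg slant gives g ~ 0.16;
# the same slant at 3.5 m gives a much smaller gradient.
print(round(disparity_gradient(45.0, 0.4), 3), round(disparity_gradient(45.0, 3.5), 3))
```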


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f,g,h,i) Same as (a,b,c,e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.

Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.
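The operations in Figure S3 can be summarized in a few lines. The sketch below is our own compact illustration of the construction (the filter and stimulus vectors are random stand-ins; it is not the paper's code):

```python
import numpy as np

def ama_filter_response(f, c_norm):
    """'On' and 'off' half-wave rectified simple cells; their difference is the
    signed AMA filter response (Fig. S3a)."""
    drive = float(np.dot(f, c_norm))
    r_on, r_off = max(0.0, drive), max(0.0, -drive)
    return r_on - r_off                          # equals the linear filter response

def model_complex_response(f_i, f_j, c_norm):
    """Model complex cell (Fig. S3b): sum the filter (simple-cell) responses, then square."""
    return (ama_filter_response(f_i, c_norm) + ama_filter_response(f_j, c_norm)) ** 2

rng = np.random.default_rng(0)
f_i, f_j = rng.standard_normal(64), rng.standard_normal(64)
f_i /= np.linalg.norm(f_i); f_j /= np.linalg.norm(f_j)
c = rng.standard_normal(64); c /= np.linalg.norm(c)   # stand-in contrast-normalized stimulus
print(model_complex_response(f_i, f_j, c))
```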


Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts of F_ij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring, $R_{\max}\left(\mathbf{f}_i^{T}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. Gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response-invariant to irrelevant variation in natural image content.

Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d,e) Same as (b,c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
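For reference, a bare-bones local cross-correlation estimator of the kind Figure S6 uses as a comparison. This is a sketch under simplified assumptions (a single pair of 1-D signals, integer-sample shifts, normalized correlation), not the exact implementation tested in the paper:

```python
import numpy as np

def cross_correlation_disparity(left, right, max_shift_px):
    """Estimate disparity (in samples) as the shift of the right-eye signal that
    maximizes the normalized correlation with the left-eye signal."""
    def norm(v):
        v = v - v.mean()
        return v / (np.linalg.norm(v) + 1e-12)
    best_shift, best_corr = 0, -np.inf
    for shift in range(-max_shift_px, max_shift_px + 1):
        corr = float(np.dot(norm(left), norm(np.roll(right, shift))))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift

# Toy check: a right-eye signal shifted by 3 samples relative to the left-eye signal.
x = np.linspace(0, 2 * np.pi, 128)
left = np.sin(3 * x) + 0.3 * np.sin(7 * x)
right = np.roll(left, -3)
print(cross_correlation_disparity(left, right, 10))   # -> 3
```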


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto \mathrm{MTF}_{\delta}(f)/f$, where $\mathrm{MTF}_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: such a difference drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x}) \big/ \sqrt{\left\|\mathbf{c}(\mathbf{x})\right\|^2 + n c_{50}^2}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c50 had a value of zero. To examine the effect of different values of c50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R} \mid \delta_k)$ via maximum likelihood estimation. (a) When c50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c50 = 0.1, the conditional response distributions are most Gaussian on average. For c50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
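The qualitative effect of c50 on the tails of the filter-response distributions can be explored with a toy simulation. The sketch below uses synthetic contrast signals, a random filter, and excess kurtosis as a simple tail-weight summary instead of a full generalized-Gaussian fit; none of the specifics are the paper's:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 64                                                  # stimulus dimensionality
stimuli = 0.2 * rng.standard_normal((20000, n))         # toy contrast signals c(x)
stimuli *= rng.lognormal(0.0, 0.5, size=(20000, 1))     # vary overall contrast across stimuli
f = rng.standard_normal(n); f /= np.linalg.norm(f)      # a stand-in filter

for c50 in (0.0, 0.1, 0.25):
    norm = np.sqrt(np.sum(stimuli ** 2, axis=1) + n * c50 ** 2)
    responses = (stimuli / norm[:, None]) @ f
    # Excess kurtosis: ~0 for a Gaussian, < 0 lighter tails, > 0 heavier tails.
    print(f"c50={c50:4.2f}  excess kurtosis = {kurtosis(responses):+.2f}")
```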


Next, consider $w_{ii}$. In this case, the total weight on $R_i^2$ in Eq. S2 must be $-0.5\,a_{ii}$. The weight from the third term of Eq. S2 can be found by substituting $w_{ij} = -0.5\,a_{ij}$. Subtracting that weight from the total weight specified by Eq. S4 gives $w_{ii}$:

$w_{ii} = -a_{ii} + 0.5 \sum_{j} a_{ij}$    (S5c)

Last, by grouping terms from Eq. S4 that are independent of the filter response, we find that the constant in Eq. S2 is given by

$\mathrm{const}' = -0.5 \sum_{i} \left( a_{ii}\mu_i^2 + \sum_{j,\, \forall j \neq i} a_{ij}\mu_i\mu_j \right) + \mathrm{const}$    (S5d)

Eqs. S1a-d are obtained by rewriting Eqs. S5a-d in matrix notation.

In visual cortex, simple and complex cells have the same maximum firing rates on average (Albrecht & Hamilton, 1982). However, the maximum possible response of the squared filters (see the second term in Eq. 5) is $R_{\max}^2$, and the maximum possible response of the sum-squared filters (see the third term in Eq. 5) is given by $R_{ij\max}^2 = \left(R_{\max}\sqrt{2 + 2\,\mathbf{f}_i \cdot \mathbf{f}_j}\right)^2$, where $\mathbf{f}_i$ and $\mathbf{f}_j$ are the weighting functions defining filters i and j (see Fig. 3a). Thus, if these computations were implemented in cortex (that is, if the maximum response of the complex cells equaled the maximum response of the simple cells, $R_{\max}$, rather than $R_{\max}^2$ and $R_{ij\max}^2$), the weights on the complex cell responses must be scaled. The optimal weights on complex cells are then given by $w_i' = w_i$, $w_{ii}' = R_{\max}\, w_{ii}$, and $w_{ij}' = \left(R_{ij\max}^2 / R_{\max}\right) w_{ij}$. These scale factors are important because, in a neurophysiological experiment, one would measure $\mathbf{w}'$, not $\mathbf{w}$. Consequently, $w_{ii}'$ and $w_{ij}'$ are the normalized weights plotted in Fig. 7b.
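A compact sketch of this weight construction, assuming a symmetric coefficient matrix A and a mean-response vector mu from the quadratic log-likelihood (the function and the orthonormal-filter default are our own illustrative choices, not the paper's code):

```python
import numpy as np

def complex_cell_weights(A, mu, R_max=30.0, F=None):
    """Map quadratic coefficients A (symmetric) and means mu to weights on squared
    (w_ii) and sum-squared (w_ij) model complex cell responses, then rescale them
    as they would appear for cortical complex cells whose maximum response is R_max."""
    A = np.asarray(A, dtype=float)
    w_ij = -0.5 * A                                    # off-diagonal rule quoted above
    np.fill_diagonal(w_ij, 0.0)
    w_ii = -np.diag(A) + 0.5 * A.sum(axis=1)           # Eq. S5c
    const_prime = -0.5 * float(mu @ A @ mu)            # Eq. S5d, up to the additive const
    # Cortical rescaling: w'_ii = R_max * w_ii, w'_ij = (R^2_ij,max / R_max) * w_ij,
    # with R^2_ij,max = R_max^2 (2 + 2 f_i . f_j) for unit-norm filter vectors.
    gram = np.eye(A.shape[0]) if F is None else F @ F.T
    R2_ij_max = R_max**2 * (2.0 + 2.0 * gram)
    w_ii_prime = R_max * w_ii
    w_ij_prime = (R2_ij_max / R_max) * w_ij
    return w_ii_prime, w_ij_prime, const_prime

A = np.array([[2.0, 0.4], [0.4, 1.0]])
mu = np.array([0.3, 0.1])
print(complex_cell_weights(A, mu))
```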


AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 22: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

413 13

and that the new LL neuron tuning curves are broader. Disparity estimation performance on slanted surfaces is also comparable, although precision is somewhat reduced (Fig. S1f; Fig. 5d). When disparity changes across the patch, the optimal encoding filters must be small enough to prevent the disparity signal from being 'smeared out' by within-patch (i.e., within-receptive-field) disparity variation. And they must be large enough to prevent a poor signal-to-noise ratio, which would lead to a preponderance of false matches (i.e., inaccurate disparity estimates). The optimal filters appear to strike this balance.

Evaluating the effect of depth variations other than slant

Naturally occurring fine depth structure and occlusions (i.e., within-patch depth variation and depth discontinuities; Fig. S8a-c) were not present in either of our training sets. To determine how the magnitude of these other sources of depth variation compares to that due to slant, we analyzed 40,000 1 deg patches from 30 outdoor range images obtained with a high-precision Riegl VZ400 range scanner. For each patch, we computed the standard deviation of (i) the differences between the raw range values and the best-fitting fronto-parallel plane, σ_non-planar, and (ii) the differences between the best-fitting slanted and fronto-parallel planes, σ_planar. The ratio of these standard deviations, σ_planar / σ_non-planar, provides a measure of the relative magnitude of depth variation due to slant and variation due to other factors. If the depth variation is due only to surface slant, the ratio will have a value of 1.0. If all of the depth variation is due to factors other than slant, the ratio will have a value of 0.0. Fig. S8d plots the distribution of this ratio for all range patches. In the most common patches, essentially all the variation is captured by slant (i.e., the modal ratio is ~1.0). The median ratio is 0.75, indicating that in the median patch 75% of the depth variation is captured by slant. Given that the optimal binocular filters are robust to variation in surface slant, and given that most natural depth variation is captured by slant, it is unlikely that depth variation other than slant will qualitatively affect our results.
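The ratio above can be computed with ordinary least-squares plane fits. The following is a minimal sketch, not the analysis code used for Fig. S8: the array shapes, helper name, and toy patch are illustrative assumptions.

```python
import numpy as np

def slant_variation_ratio(z):
    """Return sigma_planar / sigma_nonplanar for one range patch.

    z : 2-D array of range values (e.g., one 1 deg x 1 deg patch).
    The fronto-parallel 'plane' is the patch mean; the slanted plane
    is the least-squares plane a*x + b*y + c.
    """
    h, w = z.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
    slanted = (A @ coef).reshape(h, w)          # best-fitting slanted plane
    frontoparallel = np.full_like(z, z.mean())  # best-fitting fronto-parallel plane

    sigma_nonplanar = np.std(z - frontoparallel)     # raw range vs. fronto-parallel plane
    sigma_planar = np.std(slanted - frontoparallel)  # slanted vs. fronto-parallel plane
    return sigma_planar / sigma_nonplanar

# Toy usage: a purely slanted (planar) patch gives a ratio near 1.0.
zz = np.add.outer(np.linspace(0, 0.1, 64), np.linspace(0, 0.05, 64)) + 2.0
print(slant_variation_ratio(zz))   # ~1.0
```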

External variability and neural noise

We examined the relative magnitude of stimulus-induced LL neuron variability and typical neural noise in cortex. Typical cortical neurons have a peak mean response larger than 30 spikes/s (Geisler & Albrecht, 1997) and intrinsic noise variance proportional to the neuron's mean response, with a proportionality constant (i.e., Fano factor) of less than 1.5 (Tolhurst, Movshon, & Dean, 1983). To err on the side of too much neural noise, we assume that each LL neuron has a maximum average response of 30 spikes/s and a Fano factor of 1.5 (because the effect of noise is greater at lower firing rates). We convert the LL neuron responses to spike rates by first obtaining the average response of each LL neuron to its preferred disparity and then scaling that value to the maximum average response. For an integration time equal to a typical fixation duration (300 ms), the stimulus-induced response variance was ~3x greater than the internal noise variance. For an integration time equal to a typical stimulus presentation in a psychophysical task (1 s), the stimulus-induced response variance was ~9x greater than the internal noise variance. Thus, external variability must be considered to understand the influence of neural noise on the behavioral limits of vision under natural conditions. In fact, stimulus-induced variability is likely to be the controlling variability in many natural single-trial tasks.
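A minimal numerical sketch of that comparison, under the stated assumptions (30 spikes/s peak mean rate, Fano factor of 1.5). The response values below are made up to stand in for the LL neuron responses at the preferred disparity, so the printed ratios only illustrate the bookkeeping, not the reported ~3x and ~9x values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LL neuron responses (arbitrary units) to many natural stimuli
# at the neuron's preferred disparity; a stand-in for the real responses.
responses = rng.gamma(shape=4.0, scale=1.0, size=10_000)

peak_rate = 30.0     # spikes/s, assumed maximum average response
fano = 1.5           # assumed Fano factor (noise variance / mean spike count)
rates = responses * (peak_rate / responses.mean())   # scale mean response to 30 spk/s

for T in (0.3, 1.0):                       # integration times: fixation vs. 1 s trial
    mean_count = rates.mean() * T
    internal_var = fano * mean_count       # Poisson-like spiking noise variance
    external_var = np.var(rates * T)       # stimulus-induced variance of spike count
    # The ratio grows linearly with T; its size depends on the response spread.
    print(f"T = {T:.1f} s: external/internal variance = {external_var / internal_var:.1f}")
```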

Spatial frequency tuning of optimal linear binocular filters

To develop an intuition for why spatial frequencies higher than ~6 cpd carry little information about disparity, we examined the disparity signals resulting from a binocularly viewed, high-contrast, cosine-windowed edge. The Fourier decomposition of each eye's stimulus is a set of phase-aligned sinewave gratings with the 1/f contrast fall-off typical of natural scenes (Field, 1987). The binocular difference signal (i.e., the difference of the left and right signals) is sinusoidal, with contrast amplitude given by

\[ A_B(f,\delta_k) = \sqrt{A_L(f)^2 + A_R(f)^2 - 2A_L(f)A_R(f)\cos(2\pi f \delta_k)} \]   (S6)

where f is the spatial frequency, δ_k is a particular disparity, A_L(f) and A_R(f) are the left- and right-eye retinal amplitudes, and A_B(f, δ_k) is the amplitude of the binocular contrast difference signal. If the two eyes have identical optics, the left- and right-eye retinal amplitudes (which include the effects of the optics and the 1/f falloff of natural image spectra) will be the same for a given disparity. Fig. S7a plots the pattern of binocular contrast signals for each of several disparities. Fig. S7b shows the signals after filtering with a bank of log-Gabor filters, each having a 1.5 octave bandwidth. The patterns of binocular contrast barely differ above ~6 cpd. A filter population (for disparity estimation) will be most efficient if each filter signals information about a range of disparities.

V1 binocular receptive fields tuned to the highest useful spatial frequency (~6 cpd) would have a spatial extent of ~8 arcmin, assuming the typical 1.5 octave bandwidth of neurons in cortex. Given that receptive fields cannot signal disparity variation finer than their own width, this analysis may help explain why humans lose the ability to detect disparity modulations (i.e., sinusoidal modulations in disparity-defined depth) at spatial frequencies higher than ~4 cpd (Banks, Gepshtein, & Landy, 2004; Harris, McKee, & Smallman, 1997). Vergence eye movement jitter has previously been proposed as an explanation for why useful binocular information is restricted to low spatial frequencies (Vlaskamp, Yoon, & Banks, 2011): vergence noise can degrade stereo information enough to render high-frequency signals unmeasurable (Fig. S7c,d). The analysis presented here suggests that the frequency at which humans lose the ability to detect disparity modulation may be explained simply by the statistics of natural images.
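A small sketch of Eq. S6, assuming identical left- and right-eye amplitude spectra with a 1/f fall-off (a disparity-dependent optical MTF could be folded into A(f) as well). The optional last argument attenuates the interocular phase term in the way Gaussian vergence jitter would; this is an assumption of the sketch, not the exact simulation used for Fig. S7.

```python
import numpy as np

def binocular_difference_amplitude(f_cpd, disparity_arcmin, vergence_sd_arcmin=0.0):
    """Amplitude of the left-right contrast difference signal (Eq. S6).

    Assumes A_L(f) = A_R(f) = A(f) with a 1/f fall-off.
    """
    f = np.asarray(f_cpd, dtype=float)
    delta = disparity_arcmin / 60.0        # disparity in deg
    sigma = vergence_sd_arcmin / 60.0      # vergence jitter SD in deg
    A = 1.0 / np.maximum(f, 1e-6)          # 1/f amplitude spectrum
    # Gaussian jitter on delta attenuates the expected cosine term:
    # E[cos(2*pi*f*(delta + eps))] = cos(2*pi*f*delta) * exp(-2*(pi*f*sigma)**2)
    atten = np.exp(-2.0 * (np.pi * f * sigma) ** 2)
    return np.sqrt(2.0 * A**2 - 2.0 * A**2 * np.cos(2 * np.pi * f * delta) * atten)

freqs = np.linspace(0.5, 16, 200)          # cycles per degree
for d in (1.25, 2.5, 5.0, 10.0):           # disparities in arcmin
    amp = binocular_difference_amplitude(freqs, d, vergence_sd_arcmin=2.0)
    print(f"{d:5.2f} arcmin: peak difference amplitude {amp.max():.3f}")
```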

Optics

The retinal images used in our analysis were simulated to match the retinal image formation that occurs in the human eye. In humans and non-human primates, the accommodative and vergence eye-movement systems are yoked (Fincham & Walton, 1957). Consistent with this fact, we set the focus distance equal to the fixation distance of 40 cm and defocused the images appropriately for each disparity. The image of a surface displaced from fixation is defocused by

\[ \Delta D \cong \delta / \mathrm{IPD} \]   (S7)

where δ is the disparity of the surface expressed in radians, IPD is the inter-pupillary distance expressed in meters, and ΔD is the defocus expressed in diopters (1/meters). To account for the effect of chromatic aberration, we created a different polychromatic point-spread function (PSF) for each disparity (i.e., defocus; Eq. S7). Single-wavelength PSFs were computed every 10 nm between 400 and 700 nm. The wavelength-dependent change in refractive power of the human eye was taken from the literature (Thibos, Ye, Zhang, & Bradley, 1992). Polychromatic PSFs were obtained by weighting the single-wavelength PSFs by the photopic sensitivity function (Stockman & Sharpe, 2000) and by the D65 daylight illumination spectrum, and then summing:

\[ \mathrm{psf}_{\mathrm{photopic}}(\mathbf{x};\Delta D) = \frac{1}{K}\sum_{\lambda} \mathrm{psf}(\mathbf{x};\lambda,\Delta D)\, s_c(\lambda)\, D65(\lambda) \]   (S8)

where s_c(λ) is the human photopic sensitivity function and K is a normalizing constant that sets the volume of psf_photopic to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
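A sketch of the bookkeeping in Eqs. S7-S8. Everything specific in it is an assumption made for illustration: the 0.065 m IPD, the Gaussian stand-in for the single-wavelength PSF, and the flat stand-ins for the photopic sensitivity and D65 spectra (the actual simulation used the measured ocular chromatic aberration, the Stockman-Sharpe sensitivities, and the D65 spectrum).

```python
import numpy as np

ARCMIN_TO_RAD = np.pi / (180.0 * 60.0)

def defocus_from_disparity(disparity_arcmin, ipd_m=0.065):
    """Eq. S7: defocus (diopters) produced by a disparity, delta / IPD."""
    return (disparity_arcmin * ARCMIN_TO_RAD) / ipd_m

def monochromatic_psf(x, wavelength_nm, defocus_d):
    """Placeholder single-wavelength PSF: a Gaussian whose width grows with
    defocus plus a crude chromatic-defocus offset. A real implementation
    would compute a wave-optics PSF."""
    chromatic_offset = 0.002 * (wavelength_nm - 555.0)       # crude diopter offset
    sigma = 0.5 + 10.0 * abs(defocus_d + chromatic_offset)   # arbitrary width units
    psf = np.exp(-x**2 / (2 * sigma**2))
    return psf / psf.sum()

def polychromatic_psf(x, defocus_d):
    """Eq. S8: weight single-wavelength PSFs by (stand-in) photopic sensitivity
    and D65 spectra, sum over wavelength, and normalize to unit volume."""
    wavelengths = np.arange(400, 701, 10)                    # nm, every 10 nm
    sensitivity = np.ones_like(wavelengths, dtype=float)     # stand-in for s_c(lambda)
    d65 = np.ones_like(wavelengths, dtype=float)             # stand-in for D65(lambda)
    psf = sum(s * e * monochromatic_psf(x, wl, defocus_d)
              for wl, s, e in zip(wavelengths, sensitivity, d65))
    return psf / psf.sum()                                   # the 1/K normalization

x = np.linspace(-30, 30, 601)                                # spatial samples (arcmin)
dD = defocus_from_disparity(10.0)                            # 10 arcmin of disparity
print(f"defocus = {dD:.3f} D")
mtf = np.abs(np.fft.rfft(polychromatic_psf(x, dD)))          # MTF = |FT of the PSF|
```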

Contrast normalization

Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal c(x) = (r(x) − r̄)/r̄, where r(x) are the noisy sensor responses to the luminance image falling on the retina over a local area and r̄ is the local mean. This contrast signal is then normalized by the local contrast:

\[ \mathbf{c}_{\mathrm{norm}}(\mathbf{x}) = \frac{\mathbf{c}(\mathbf{x})}{\sqrt{\lVert \mathbf{c}(\mathbf{x})\rVert^{2} + n\,c_{50}^{2}}} \]

where n is the dimensionality of the vector, ||c(x)||² is the contrast energy (or, equivalently, n times the squared RMS contrast), and c_50 is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying:

\[ R_i = R_{\max}\,\big\lfloor \mathbf{c}_{\mathrm{norm}}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \big\rfloor \]

where ⌊·⌋ denotes half-wave rectification.
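A minimal sketch of the normalization and binocular simple-cell stages just described, with c50 = 0 as assumed throughout the paper; the stimulus and filter below are random stand-ins, not the learned AMA filters.

```python
import numpy as np

def contrast_signal(r):
    """c(x) = (r(x) - rbar) / rbar : Weber contrast of the local sensor responses."""
    rbar = r.mean()
    return (r - rbar) / rbar

def contrast_normalize(c, c50=0.0):
    """c_norm(x) = c(x) / sqrt(||c||^2 + n * c50^2)."""
    n = c.size
    return c / np.sqrt(np.sum(c**2) + n * c50**2)

def binocular_simple_cell(c_norm, rf, r_max=1.0):
    """R_i = R_max * halfwave(c_norm . f_i): dot product, then half-wave rectification."""
    return r_max * max(np.dot(c_norm, rf), 0.0)

rng = np.random.default_rng(1)
stimulus = 100.0 * (1.0 + 0.2 * rng.standard_normal(64))  # stand-in concatenated L/R signal
rf = rng.standard_normal(64)
rf /= np.linalg.norm(rf)                                   # stand-in binocular receptive field

c_norm = contrast_normalize(contrast_signal(stimulus))
print(binocular_simple_cell(c_norm, rf))
```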

References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531-546.
Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217-237.
Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077-2089.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379-2394.
Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488-508.
Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897-919.
Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673-1683.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181-197.
Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427-1439.
Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711-1737.
Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594-3600.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775-785.
Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814-9818.


Figure S1. Optimal filters and disparity estimation performance for surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set. However, these filters have somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3-F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients encountered when viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by g ≈ IPD tan(θ) / d, where θ is the surface slant and d is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
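For a concrete sense of scale (assuming an inter-pupillary distance of 0.065 m, a value not stated in this caption), a 45 deg slant at the 0.4 m viewing distance gives

\[ g \approx \frac{\mathrm{IPD}\,\tan\theta}{d} = \frac{0.065\ \mathrm{m} \times \tan 45^\circ}{0.4\ \mathrm{m}} \approx 0.16 . \]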

Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to the responses to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f,g,h,i) Same as (a,b,c,e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially for F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities, so the disparities are decoded improperly by a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.

Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting the 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.
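A compact sketch of the two combinations described in the caption, taking the (already contrast-normalized and filtered) linear drive as input; the numeric values are placeholders, not model outputs.

```python
import numpy as np

def halfwave(x):
    """Half-wave rectification, the simple-cell output nonlinearity."""
    return np.maximum(x, 0.0)

def ama_filter_response(drive):
    """(a) 'on' minus 'off' simple cells recovers the signed linear filter response."""
    on = halfwave(drive)     # simple cell with the filter as its receptive field
    off = halfwave(-drive)   # simple cell with the sign-inverted receptive field
    return on - off          # equals `drive` exactly

def model_complex_response(drive_i, drive_j):
    """(b) sum the filter (simple-cell) responses, then square."""
    return (ama_filter_response(drive_i) + ama_filter_response(drive_j)) ** 2

# Placeholder linear drives f_i . c_norm and f_j . c_norm for one stimulus.
print(ama_filter_response(-0.3))           # -0.3
print(model_complex_response(-0.3, 0.5))   # (0.2)**2 = 0.04
```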

Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts of Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields; these filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36); irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a-d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves for natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring: R = R_max (f_i^T c_norm)² (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows the response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves for random-dot stereograms (RDSs). In most cases, the RDS tuning curves are similar to the tuning curves for natural stimuli, although the response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity; however, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant variation in natural image content.

Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals; the error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d,e) Same as (b,c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.

Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5 octave bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (e) The effect of a four diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2 mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2 mm pupil, an 8 diopter difference between the eyes produces the same difference in defocus blur as a 4 mm pupil with a 4 diopter difference between the eyes; in that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5 octave bandwidth log-Gabor filters. (g) Same as (e), but with a 4 mm pupil. (h) Same as (f), but with a 4 mm pupil.

Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan-line in (a). (c) Range values within a 1 deg patch, marked by the white box in (a) and the black box in (b). The dots represent the raw range values; the plane is the best-fitting plane to the range data. (d) Depth variation in 1 deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant; values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.

Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / sqrt(||c(x)||² + n c_50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c_50 had a value of zero. To examine the effect of different values of c_50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c_50. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum likelihood estimation. (a) When c_50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c_50 = 0.1, the conditional response distributions are most Gaussian on average. For c_50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c_50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c_50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions; in higher dimensions, however, the results are noisier because stable estimates of kurtosis require significantly more data.
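The shape ('power') parameter in such an analysis can be estimated per filter dimension with an off-the-shelf generalized-Gaussian fit. The sketch below uses scipy's gennorm on made-up one-dimensional samples; it stands in for, but is not, the multi-dimensional fit to the real conditional response distributions.

```python
import numpy as np
from scipy import stats

# Stand-in 1-D filter response samples for one disparity level.
# beta = 2 is Gaussian; beta < 2 heavier-tailed; beta > 2 lighter-tailed.
samples = stats.gennorm.rvs(beta=2.0, loc=0.0, scale=0.1, size=5000, random_state=2)

beta_hat, loc_hat, scale_hat = stats.gennorm.fit(samples)
print(f"fitted power (beta): {beta_hat:.2f}")   # near 2 -> approximately Gaussian
```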

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 23: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

513 13

phase-shy‐aligned13 sinewave13 gratings13 with13 the13 1f13 contrast13 fall-shy‐off13 typical13 of13 natural13 scenes(Field13 1987)13 The13 binocular13 difference13 signal13 (ie13 the13 difference13 of13 the13 left13 and13 right13 signals)13 is13 sinusoidal13 with13 contrast13 amplitude13 given13 by13 13

AB f δ k( ) = AL( f )2 + AR( f )2 minus 2AL( f )AR( f )cos 2π fδ k( ) 13 13 13 (S6)13

13 where f 13 is13 the13 spatial13 frequency13 δ k 13 is13 a13 particular13 disparity13 AL ( f ) 13 and13 AR( f ) 13 are13 the13 left13 and13

right13 eye13 retinal13 amplitudes13 and13 AB f δ k( ) 13 is13 the13 amplitude13 of13 the13 binocular13 contrast13 difference13 signal13 If13 the13 two13 eyes13 have13 identical13 optics13 the13 left-shy‐13 and13 right-shy‐eye13 retinal13 amplitudes13 (which13 include13 the13 effects13 of13 the13 optics13 and13 the13 1 f 13 falloff13 of13 natural13 image13 spectra)13 will13 be13 the13 same13 for13 a13 given13 disparity13 Fig13 S7a13 plots13 pattern13 of13 binocular13 contrast13 signals13 for13 each13 of13 several13 disparities13 Fig13 S7b13 shows13 the13 signals13 after13 filtering13 with13 a13 bank13 of13 log-shy‐Gabor13 filters13 each13 having13 a13 1513 octave13 bandwidth13 The13 patterns13 of13 binocular13 contrast13 barely13 differ13 above13 ~613 cpd13 A13 filter13 population13 (for13 disparity13 estimation)13 will13 be13 most13 efficient13 if13 each13 filter13 signals13 information13 about13 a13 range13 of13 disparities13 13 13 V113 binocular13 receptive13 fields13 tuned13 to13 the13 highest13 useful13 spatial13 frequency13 (~613 cpd)13 would13 have13 a13 spatial13 extent13 of13 ~813 arcmin13 assuming13 the13 typical13 1513 octave13 bandwidth13 of13 neurons13 in13 cortex13 Given13 that13 receptive13 fields13 cannot13 signal13 disparity13 variation13 finer13 than13 their13 own13 width13 this13 analysis13 may13 help13 explain13 why13 humans13 lose13 the13 ability13 to13 detect13 disparity13 modulations13 (ie13 sinusoidal13 modulations13 in13 disparity-shy‐defined13 depth)13 at13 spatial13 frequencies13 higher13 than13 ~413 cpd13 (Banks13 Gepshtein13 amp13 Landy13 200413 Harris13 McKee13 amp13 Smallman13 1997)13 Vergence13 eye13 movement13 jitter13 has13 previously13 been13 proposed13 as13 an13 explanation13 for13 why13 useful13 binocular13 information13 is13 restricted13 to13 low13 spatial13 frequencies13 (Vlaskamp13 Yoon13 amp13 Banks13 2011)13 vergence13 noise13 can13 degrade13 stereo13 information13 enough13 to13 render13 high13 frequency13 signals13 unmeasurable13 (Fig13 S7cd)13 The13 analysis13 presented13 here13 suggests13 that13 the13 frequency13 at13 which13 humans13 lose13 the13 ability13 to13 detect13 disparity13 modulation13 may13 simply13 be13 explained13 by13 the13 statistics13 of13 natural13 images13 13 Optics13 The13 retinal13 images13 used13 in13 our13 analysis13 were13 simulated13 to13 match13 the13 retinal13 image13 formation13 that13 occurs13 in13 the13 human13 eye13 In13 humans13 and13 non-shy‐human13 primates13 accommodative13 and13 vergence13 eye-shy‐movement13 systems13 are13 yoked(Fincham13 amp13 Walton13 1957)13 Consistent13 with13 this13 fact13 we13 set13 the13 focus13 distance13 equal13 to13 the13 fixation13 distance13 of13 4013 cm13 and13 defocused13 the13 images13 appropriate13 for13 each13 disparity13 The13 image13 of13 a13 surface13 displaced13 from13 the13 fixation13 is13 defocused13 by13

13 ΔD cong δ IPD 13 13 13 13 13 13 13 13 13 13 (S7)13

13 where13 δ 13 is13 the13 disparity13 of13 the13 surface13 expressed13 in13 radians13 IPD13 is13 the13 inter-shy‐pupillary13 distance13 expressed13 in13 meters13 and13 ΔD 13 is13 the13 defocus13 expressed13 in13 diopters13 (1meters)13 To13 account13 for13 the13 effect13 of13 chromatic13 aberration13 we13 created13 a13 different13 polychromatic13 point-shy‐spread13 function13 (PSF)13 for13 each13 disparity13 (ie13 defocus13 Eq13 S7)13 Single-shy‐wavelength13 PSFs13 were13 computed13 every13 1013 nm13 between13 40013 and13 70013 nm13 The13 wavelength-shy‐dependent13 change13 in13 refractive13 power13 of13 the13 human13 eye13 was13 taken13 from13 the13 literature13 (Thibos13 Ye13 Zhang13 amp13 Bradley13 1992)13 Polychromatic13 PSFs13 were13 obtained13 by13 weighting13 the13 single-shy‐wavelength13 PSFs13 by13 the13 photopic13 sensitivity13 function13 (Stockman13 amp13 Sharpe13 2000)13 and13 by13 the13 D6513 daylight13 illumination13 spectrum13 and13 then13 summing13 13

613 13

13 psf photopic xΔD( ) = 1

Kpsf xλΔD( )sc

λsum λ( )D65 λ( ) 13 13 13 13 (S8)13

13 where13 sc λ( ) 13 is13 the13 human13 photopic13 sensitivity13 function13 and13 K13 is13 a13 normalizing13 constant13 that13 sets13 the13 volume13 of13 psf photopic 13 to13 1013 The13 modulation13 transfer13 function13 (MTF)13 is13 the13 amplitude13 of13 the13 Fourier13 transform13 of13 the13 PSF13 13 Contrast13 normalization13 Contrast13 normalization13 occurs13 early13 in13 visual13 processing13 The13 standard13 model13 of13 cortical13 neuron13 response13 assumes13 that13 the13 input13 is13 the13 contrast13 signal13

c x( ) = r x( )minus r( ) r 13 where13 r x( ) 13 are13 the13

noisy13 sensor13 responses13 to13 the13 luminance13 image13 falling13 on13 the13 retina13 over13 a13 local13 area13 and13 r 13 is13 the13 local13 mean13 This13 contrast13 signal13 is13 then13 normalized13 by13 the13 local13 contrast13

cnorm x( ) = c x( ) c x( ) 2

+ nc502 13 where13 n13 is13 the13 dimensionality13 of13 the13 vector13 c x( ) 2 13 is13 the13 contrast13

energy13 (or13 equivalently13 n 13 times13 the13 squared13 RMS13 contrast)13 and13 50c 13 is13 the13 half-shy‐saturation13 constant13 (Albrecht13 amp13 Geisler13 199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 (We13 are13 agnostic13 about13 the13 mechanism13 by13 which13 the13 contrast13 normalization13 is13 achieved)13 Finally13 the13 normalized13 contrast13 signal13 is13 weighted13 by13 the13 cortical13 neuronrsquos13 receptive13 field13 and13 passed13 through13 a13 static13 non-shy‐linearity13 For13 example13 in13 the13 case13 of13 a13 binocular13 simple13 cell13 the13 response13 of13 the13 neuron13 is13 obtained13 by13 taking13 the13 dot13 product13 of13 the13 contrast-shy‐normalized13 signal13 with13 the13 receptive13 field13 and13 half-shy‐wave13 rectifying13

Ri = Rmax cnorm x( ) sdot fi x( )⎢⎣ ⎥⎦ 13 13

13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

713 13

13 References13 Albrecht13 D13 G13 amp13 Geisler13 W13 S13 (1991)13 Motion13 selectivity13 and13 the13 contrast-shy‐response13 function13 of13

simple13 cells13 in13 the13 visual13 cortex13 Visual13 Neuroscience13 7(6)13 531ndash54613 Albrecht13 D13 G13 amp13 Hamilton13 D13 B13 (1982)13 Striate13 cortex13 of13 monkey13 and13 cat13 contrast13 response13

function13 Journal13 of13 Neurophysiology13 48(1)13 217ndash23713 Banks13 M13 S13 Gepshtein13 S13 amp13 Landy13 M13 S13 (2004)13 Why13 is13 spatial13 stereoresolution13 so13 low13 The13

Journal13 of13 Neuroscience13 24(9)13 2077ndash208913 13 Field13 D13 J13 (1987)13 Relations13 between13 the13 statistics13 of13 natural13 images13 and13 the13 response13 properties13 of13

cortical13 cells13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 4(12)13 2379ndash239413 Fincham13 E13 amp13 Walton13 J13 (1957)13 The13 reciprocal13 actions13 of13 accommodation13 and13 convergence13 The13

Journal13 of13 Physiology13 137(3)13 488ndash50813 Geisler13 W13 S13 amp13 Albrecht13 D13 G13 (1997)13 Visual13 cortex13 neurons13 in13 monkeys13 and13 cats13 detection13

discrimination13 and13 identification13 Visual13 Neuroscience13 14(5)13 897ndash91913 Harris13 J13 M13 McKee13 S13 P13 amp13 Smallman13 H13 S13 (1997)13 Fine-shy‐scale13 processing13 in13 human13 binocular13

stereopsis13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 14(8)13 1673ndash168313 Heeger13 D13 J13 (1992)13 Normalization13 of13 cell13 responses13 in13 cat13 striate13 cortex13 Visual13 Neuroscience13 9(2)13

181ndash19713 Hibbard13 P13 B13 (2008)13 Binocular13 energy13 responses13 to13 natural13 images13 Vision13 Research13 48(12)13 1427ndash

143913 13 Stockman13 A13 amp13 Sharpe13 L13 T13 (2000)13 The13 spectral13 sensitivities13 of13 the13 middle-shy‐13 and13 long-shy‐wavelength-shy‐

sensitive13 cones13 derived13 from13 measurements13 in13 observers13 of13 known13 genotype13 Vision13 Research13 40(13)13 1711ndash173713

Thibos13 L13 N13 Ye13 M13 Zhang13 X13 amp13 Bradley13 A13 (1992)13 The13 chromatic13 eye13 a13 new13 reduced-shy‐eye13 model13 of13 ocular13 chromatic13 aberration13 in13 humans13 Applied13 Optics13 31(19)13 3594ndash360013

Tolhurst13 D13 J13 Movshon13 J13 A13 amp13 Dean13 A13 F13 (1983)13 The13 statistical13 reliability13 of13 signals13 in13 single13 neurons13 in13 cat13 and13 monkey13 visual13 cortex13 Vision13 Research13 23(8)13 775ndash78513

Vlaskamp13 B13 N13 S13 Yoon13 G13 amp13 Banks13 M13 S13 (2011)13 Human13 stereopsis13 is13 not13 limited13 by13 the13 optics13 of13 the13 well-shy‐focused13 eye13 The13 Journal13 of13 Neuroscience13 31(27)13 9814ndash981813

13

813 13

13 Figure13 S113 Optimal13 filters13 and13 disparity13 estimation13 performance13 for13 estimating13 disparity13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 (a)13 Optimal13 linear13 binocular13 filters13 for13 estimating13 the13 disparity13 (of13 the13 center13 pixel)13 of13 slanted13 surfaces13 The13 filters13 were13 learned13 on13 a13 training13 set13 having13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 The13 optimal13 filter13 shapes13 are13 robust13 to13 the13 presence13 of13 non-shy‐zero13 surface13 slants13 in13 the13 training13 set13 However13 these13 filters13 are13 of13 somewhat13 smaller13 spatial13 extent13 and13 slightly13 higher13 octave13 bandwidth13 (compare13 filters13 F3-shy‐F513 to13 those13 in13 Fig13 3)13 (b)13 Spatial13 frequency13 tuning13 and13 bandwidth13 of13 filters13 (c)13 Two-shy‐dimensional13 analogs13 of13 filters13 in13 (a)13 The13 two-shy‐dimensional13 filters13 respond13 virtually13 identically13 to13 two-shy‐dimensional13 images13 as13 do13 the13 one-shy‐dimensional13 filters13 to13 one-shy‐dimensional13 signals13 (d)13 Tuning13 curves13 of13 LL13 neurons13 trained13 and13 tested13 on13 surfaces13 with13 varying13 surface13 slant13 (e)13 Normalized13 weights13 on13 model13 complex13 cell13 responses13 for13 constructing13 five13 LL13 neurons13 with13 the13 indicated13 preferred13 disparities13 (f)13 Disparity13 estimation13 performance13 with13 surfaces13 having13 a13 cosine13 distribution13 of13 slants13 (black13 line)13 Also13 shown13 are13 disparity13 estimates13 derived13 from13 a13 local13 cross-shy‐correlator13 (gray13 line)13 (g)13 Disparity13 gradients13 from13 viewing13 natural13 scenes(Hibbard13 2008)13 vs13 disparity13 gradients13 in13 our13 training13 set13 The13 relationship13 between13 slant13 and13 disparity13 gradient13 is13 given13 by13

g asymp IPD tanθ( ) d 13 where13 θ 13 is13 the13 surface13 slant13 and13 d13 is13 the13 distance13 to13 the13 surface13 Hibbard13 (2008)13 estimated13 disparity13 gradients13 in13 natural13 viewing13 for13 distances13 nearer13 than13 3513 m13 (dashed13 curve)13 and13 0513 m13 (dotted13 curve)13 At13 the13 04m13 viewing13 distance13 assumed13 throughout13 the13 paper13 our13 manipulation13 of13 surface13 slant13 introduced13 distribution13 of13 disparity13 gradients13 (black13 line)13 that13 is13 comparable13 to13 the13 gradients13 introduced13 in13 natural13 viewing13 13

913 13

13 Figure13 S213 Filter13 responses13 LL13 neuron13 responses13 and13 performance13 with13 random-shy‐dot13 stereograms13 and13 anti-shy‐correlated13 random-shy‐dot13 stereograms13 (a)13 Joint13 response13 distributions13 of13 filters13 F113 and13 F213 when13 stimulated13 with13 random-shy‐dot13 stereograms13 Compare13 to13 natural13 stimuli13 in13 Fig13 4a13 (b)13 Joint13 responses13 of13 filters13 F313 and13 F413 when13 stimulated13 with13 random-shy‐dot13 stereograms13 (c)13 LL13 neuron13 responses13 to13 random13 dot13 stereograms13 (d)13 LL13 neuron13 tuning13 vs13 bandwidth13 when13 stimulated13 with13 natural13 stimuli13 Note13 that13 bandwidths13 are13 elevated13 compared13 to13 when13 the13 LL13 neurons13 are13 stimulated13 with13 natural13 stimuli13 (see13 Fig13 7c)13 (e)13 Disparity13 estimates13 and13 estimate13 precision13 (inset)13 obtained13 with13 random-shy‐dot13 stereograms13 (fghi)13 Same13 as13 abce13 except13 that13 the13 stimuli13 were13 anti-shy‐correlated13 random13 dot13 stereograms13 Filter13 response13 distributions13 still13 segregate13 as13 a13 function13 of13 disparity13 (especially13 with13 F313 and13 F4)13 However13 the13 anti-shy‐correlated13 filter13 response13 distributions13 do13 not13 coincide13 with13 the13 lsquoexpectedrsquo13 pattern13 of13 response13 for13 particular13 disparities13 The13 disparities13 are13 decoded13 improperly13 with13 a13 decoder13 trained13 on13 natural13 stimuli13 LL13 neuron13 tuning13 curves13 with13 anti-shy‐correlated13 stereograms13 are13 neither13 selective13 nor13 invariant13 Inaccurate13 highly13 variable13 estimates13 result13 13 13

13 13

1013 13

13 Figure13 S313 Obtaining13 AMA13 filter13 and13 model13 complex13 cell13 responses13 with13 established13 neural13 operations13 (a)13 Simple13 cell13 responses13 are13 obtained13 by13 projecting13 the13 left-shy‐13 and13 right-shy‐eye13 signals13 onto13 each13 binocular13 filter13 and13 half-shy‐wave13 rectifying13 Subtracting13 lsquoonrsquo13 and13 lsquooffrsquo13 simple13 cell13 responses13 give13 the13 AMA13 filter13 responses13 (see13 Fig13 3a)13 (b)13 Model13 (and13 neurophysiological)13 complex13 cell13 responses13 are13 obtained13 by13 summing13 and13 squaring13 the13 responses13 of13 simple13 cells13 (Model13 complex13 cell13 responses13 can13 also13 be13 obtained13 by13 summing13 the13 responses13 of13 simple13 cells13 that13 have13 both13 a13 half-shy‐wave13 rectifying13 and13 a13 squaring13 nonlinearity)13 Note13 that13 the13 model13 complex13 cells13 differ13 from13 canonical13 disparity13 energy13 binocular13 complex13 cells13 We13 label13 them13 lsquocomplexrsquo13 because13 they13 exhibit13 temporal13 frequency13 doubling13 and13 have13 a13 fundamental13 to13 DC13 response13 ratio13 of13 less13 than13 one13 The13 model13 LL13 neurons13 (see13 Fig13 613 and13 Fig13 7)13 are13 more13 similar13 to13 disparity13 energy13 complex13 cells13 although13 as13 noted13 in13 the13 text13 there13 are13 also13 important13 differences13

Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring: $R_{max}\left(\mathbf{f}_i^{T}\mathbf{c}_{norm}\right)^{2}$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curve to 5 random-dot stereograms (RDSs). In most cases, RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response-invariant to irrelevant variation in natural image content.
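The solid curves and gray regions could be computed along the following lines. A hedged sketch, assuming `responses` holds the squared responses $R_{max}(\mathbf{f}^{T}\mathbf{c}_{norm})^{2}$ for many stimuli and `disparities` holds each stimulus's disparity label:

```python
import numpy as np

def disparity_tuning_curve(responses, disparities):
    """Mean response (curve) and response SD (gray area) at each disparity level."""
    levels = np.unique(disparities)
    mean = np.array([responses[disparities == d].mean() for d in levels])
    sd = np.array([responses[disparities == d].std() for d in levels])
    return levels, mean, sd
```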

Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that the performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
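As an illustration only (not the implementation used in the paper), disparity estimation by local cross-correlation on a single left/right signal pair can be sketched as follows, with candidate shifts expressed in samples:

```python
import numpy as np

def xcorr_disparity(left, right, max_shift):
    """Shift of `right` (in samples) maximizing its normalized correlation with `left`."""
    best_shift, best_rho = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        rho = np.corrcoef(left, np.roll(right, s))[0, 1]
        if rho > best_rho:
            best_rho, best_shift = rho, s
    return best_shift
```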

Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto MTF_{\delta}(f)/f$, where $MTF_{\delta}$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes: it drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.
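The vergence-jitter manipulation in panel (c) amounts to blurring each signal across the disparity dimension with a 2-arcmin Gaussian. A sketch under the assumption that `signals` is an array with disparity varying along its first axis, sampled at `step_arcmin` intervals:

```python
import numpy as np

def apply_vergence_jitter(signals, step_arcmin, sigma_arcmin=2.0):
    """Convolve along the disparity axis with a Gaussian (SD = sigma_arcmin)."""
    half = int(np.ceil(3 * sigma_arcmin / step_arcmin))
    x = np.arange(-half, half + 1) * step_arcmin
    kernel = np.exp(-0.5 * (x / sigma_arcmin) ** 2)
    kernel /= kernel.sum()
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, signals)
```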

Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of depth variation is captured by slant.
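The statistic in panel (d) can be illustrated by fitting a least-squares plane to a patch of range values and asking what fraction of the depth variance it explains. A sketch, assuming `x`, `y`, and `z` are flattened coordinates and range values of one patch:

```python
import numpy as np

def variance_explained_by_plane(x, y, z):
    """Fraction of range variance in a patch captured by the best-fitting plane."""
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)   # plane: z = a*x + b*y + c
    residual = z - A @ coeffs
    return 1.0 - residual.var() / z.var()            # ~1.0: slant captures nearly all variation
```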

Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x}) / \sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
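The power (shape) parameter summarized in panel (b) can be recovered by maximum likelihood. A rough one-dimensional sketch (the paper fits multi-dimensional distributions) in which p = 2 corresponds to a Gaussian:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def gen_gauss_neg_loglik(p, x):
    """Negative log-likelihood of a zero-mean generalized Gaussian with power p."""
    scale = (p * np.mean(np.abs(x) ** p)) ** (1.0 / p)     # ML scale for fixed p
    log_z = np.log(2.0 * scale) + gammaln(1.0 + 1.0 / p)   # log normalizing constant
    return np.sum((np.abs(x) / scale) ** p) + x.size * log_z

def fit_power(x):
    """Best-fitting power: >2 means lighter tails than a Gaussian, <2 heavier tails."""
    return minimize_scalar(gen_gauss_neg_loglik, bounds=(0.5, 8.0),
                           args=(x,), method="bounded").x
```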


$$\mathit{psf}_{photopic}\left(x;\Delta D\right) = \frac{1}{K}\sum_{\lambda} \mathit{psf}\left(x;\lambda,\Delta D\right)\, s_c\!\left(\lambda\right)\, D65\!\left(\lambda\right) \qquad \text{(S8)}$$

where $s_c(\lambda)$ is the human photopic sensitivity function and $K$ is a normalizing constant that sets the volume of $\mathit{psf}_{photopic}$ to 1.0. The modulation transfer function (MTF) is the amplitude of the Fourier transform of the PSF.
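Equation S8 is a weighted sum over wavelength. A hedged sketch in which `psf_mono(x, lam, dD)`, `s_c(lam)`, and `d65(lam)` are assumed callables supplied elsewhere (they are not defined in this supplement):

```python
import numpy as np

def psf_photopic(x, delta_D, wavelengths_nm, psf_mono, s_c, d65):
    """Photopic PSF: sensitivity- and illuminant-weighted sum of monochromatic PSFs (Eq. S8)."""
    psf = sum(s_c(lam) * d65(lam) * psf_mono(x, lam, delta_D) for lam in wavelengths_nm)
    return psf / psf.sum()   # division plays the role of 1/K: the PSF integrates to 1.0
```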

Contrast normalization. Contrast normalization occurs early in visual processing. The standard model of cortical neuron response assumes that the input is the contrast signal $\mathbf{c}(\mathbf{x}) = \left(\mathbf{r}(\mathbf{x}) - \bar{r}\right)/\bar{r}$, where $\mathbf{r}(\mathbf{x})$ are the noisy sensor responses to the luminance image falling on the retina over a local area and $\bar{r}$ is the local mean. This contrast signal is then normalized by the local contrast: $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})/\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n c_{50}^{2}}$, where $n$ is the dimensionality of the vector, $\|\mathbf{c}(\mathbf{x})\|^{2}$ is the contrast energy (or, equivalently, $n$ times the squared RMS contrast), and $c_{50}$ is the half-saturation constant (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). (We are agnostic about the mechanism by which the contrast normalization is achieved.) Finally, the normalized contrast signal is weighted by the cortical neuron's receptive field and passed through a static non-linearity. For example, in the case of a binocular simple cell, the response of the neuron is obtained by taking the dot product of the contrast-normalized signal with the receptive field and half-wave rectifying: $R_i = R_{max}\left\lfloor \mathbf{c}_{norm}(\mathbf{x}) \cdot \mathbf{f}_i(\mathbf{x}) \right\rfloor$.
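A compact sketch of this pipeline, assuming `r` is the vector of sensor responses for one local patch and `f` is a binocular receptive field of matching length (c50 = 0 reproduces the assumption used throughout the paper):

```python
import numpy as np

def contrast_signal(r):
    """Weber contrast about the local mean: c(x) = (r(x) - rbar) / rbar."""
    rbar = r.mean()
    return (r - rbar) / rbar

def contrast_normalize(c, c50=0.0):
    """Divide by the local contrast energy plus the half-saturation term."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def binocular_simple_cell_response(f, r, r_max=1.0, c50=0.0):
    """Dot product with the receptive field, then half-wave rectification."""
    c_norm = contrast_normalize(contrast_signal(r), c50)
    return r_max * max(float(f @ c_norm), 0.0)
```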


References

Albrecht, D. G., & Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7(6), 531–546.

Albrecht, D. G., & Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function. Journal of Neurophysiology, 48(1), 217–237.

Banks, M. S., Gepshtein, S., & Landy, M. S. (2004). Why is spatial stereoresolution so low? The Journal of Neuroscience, 24(9), 2077–2089.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Fincham, E., & Walton, J. (1957). The reciprocal actions of accommodation and convergence. The Journal of Physiology, 137(3), 488–508.

Geisler, W. S., & Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination, and identification. Visual Neuroscience, 14(5), 897–919.

Harris, J. M., McKee, S. P., & Smallman, H. S. (1997). Fine-scale processing in human binocular stereopsis. Journal of the Optical Society of America A, 14(8), 1673–1683.

Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Hibbard, P. B. (2008). Binocular energy responses to natural images. Vision Research, 48(12), 1427–1439.

Stockman, A., & Sharpe, L. T. (2000). The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype. Vision Research, 40(13), 1711–1737.

Thibos, L. N., Ye, M., Zhang, X., & Bradley, A. (1992). The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Applied Optics, 31(19), 3594–3600.

Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.

Vlaskamp, B. N. S., Yoon, G., & Banks, M. S. (2011). Human stereopsis is not limited by the optics of the well-focused eye. The Journal of Neuroscience, 31(27), 9814–9818.

13

813 13

13 Figure13 S113 Optimal13 filters13 and13 disparity13 estimation13 performance13 for13 estimating13 disparity13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 (a)13 Optimal13 linear13 binocular13 filters13 for13 estimating13 the13 disparity13 (of13 the13 center13 pixel)13 of13 slanted13 surfaces13 The13 filters13 were13 learned13 on13 a13 training13 set13 having13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 The13 optimal13 filter13 shapes13 are13 robust13 to13 the13 presence13 of13 non-shy‐zero13 surface13 slants13 in13 the13 training13 set13 However13 these13 filters13 are13 of13 somewhat13 smaller13 spatial13 extent13 and13 slightly13 higher13 octave13 bandwidth13 (compare13 filters13 F3-shy‐F513 to13 those13 in13 Fig13 3)13 (b)13 Spatial13 frequency13 tuning13 and13 bandwidth13 of13 filters13 (c)13 Two-shy‐dimensional13 analogs13 of13 filters13 in13 (a)13 The13 two-shy‐dimensional13 filters13 respond13 virtually13 identically13 to13 two-shy‐dimensional13 images13 as13 do13 the13 one-shy‐dimensional13 filters13 to13 one-shy‐dimensional13 signals13 (d)13 Tuning13 curves13 of13 LL13 neurons13 trained13 and13 tested13 on13 surfaces13 with13 varying13 surface13 slant13 (e)13 Normalized13 weights13 on13 model13 complex13 cell13 responses13 for13 constructing13 five13 LL13 neurons13 with13 the13 indicated13 preferred13 disparities13 (f)13 Disparity13 estimation13 performance13 with13 surfaces13 having13 a13 cosine13 distribution13 of13 slants13 (black13 line)13 Also13 shown13 are13 disparity13 estimates13 derived13 from13 a13 local13 cross-shy‐correlator13 (gray13 line)13 (g)13 Disparity13 gradients13 from13 viewing13 natural13 scenes(Hibbard13 2008)13 vs13 disparity13 gradients13 in13 our13 training13 set13 The13 relationship13 between13 slant13 and13 disparity13 gradient13 is13 given13 by13

g asymp IPD tanθ( ) d 13 where13 θ 13 is13 the13 surface13 slant13 and13 d13 is13 the13 distance13 to13 the13 surface13 Hibbard13 (2008)13 estimated13 disparity13 gradients13 in13 natural13 viewing13 for13 distances13 nearer13 than13 3513 m13 (dashed13 curve)13 and13 0513 m13 (dotted13 curve)13 At13 the13 04m13 viewing13 distance13 assumed13 throughout13 the13 paper13 our13 manipulation13 of13 surface13 slant13 introduced13 distribution13 of13 disparity13 gradients13 (black13 line)13 that13 is13 comparable13 to13 the13 gradients13 introduced13 in13 natural13 viewing13 13

913 13

13 Figure13 S213 Filter13 responses13 LL13 neuron13 responses13 and13 performance13 with13 random-shy‐dot13 stereograms13 and13 anti-shy‐correlated13 random-shy‐dot13 stereograms13 (a)13 Joint13 response13 distributions13 of13 filters13 F113 and13 F213 when13 stimulated13 with13 random-shy‐dot13 stereograms13 Compare13 to13 natural13 stimuli13 in13 Fig13 4a13 (b)13 Joint13 responses13 of13 filters13 F313 and13 F413 when13 stimulated13 with13 random-shy‐dot13 stereograms13 (c)13 LL13 neuron13 responses13 to13 random13 dot13 stereograms13 (d)13 LL13 neuron13 tuning13 vs13 bandwidth13 when13 stimulated13 with13 natural13 stimuli13 Note13 that13 bandwidths13 are13 elevated13 compared13 to13 when13 the13 LL13 neurons13 are13 stimulated13 with13 natural13 stimuli13 (see13 Fig13 7c)13 (e)13 Disparity13 estimates13 and13 estimate13 precision13 (inset)13 obtained13 with13 random-shy‐dot13 stereograms13 (fghi)13 Same13 as13 abce13 except13 that13 the13 stimuli13 were13 anti-shy‐correlated13 random13 dot13 stereograms13 Filter13 response13 distributions13 still13 segregate13 as13 a13 function13 of13 disparity13 (especially13 with13 F313 and13 F4)13 However13 the13 anti-shy‐correlated13 filter13 response13 distributions13 do13 not13 coincide13 with13 the13 lsquoexpectedrsquo13 pattern13 of13 response13 for13 particular13 disparities13 The13 disparities13 are13 decoded13 improperly13 with13 a13 decoder13 trained13 on13 natural13 stimuli13 LL13 neuron13 tuning13 curves13 with13 anti-shy‐correlated13 stereograms13 are13 neither13 selective13 nor13 invariant13 Inaccurate13 highly13 variable13 estimates13 result13 13 13

13 13

1013 13

13 Figure13 S313 Obtaining13 AMA13 filter13 and13 model13 complex13 cell13 responses13 with13 established13 neural13 operations13 (a)13 Simple13 cell13 responses13 are13 obtained13 by13 projecting13 the13 left-shy‐13 and13 right-shy‐eye13 signals13 onto13 each13 binocular13 filter13 and13 half-shy‐wave13 rectifying13 Subtracting13 lsquoonrsquo13 and13 lsquooffrsquo13 simple13 cell13 responses13 give13 the13 AMA13 filter13 responses13 (see13 Fig13 3a)13 (b)13 Model13 (and13 neurophysiological)13 complex13 cell13 responses13 are13 obtained13 by13 summing13 and13 squaring13 the13 responses13 of13 simple13 cells13 (Model13 complex13 cell13 responses13 can13 also13 be13 obtained13 by13 summing13 the13 responses13 of13 simple13 cells13 that13 have13 both13 a13 half-shy‐wave13 rectifying13 and13 a13 squaring13 nonlinearity)13 Note13 that13 the13 model13 complex13 cells13 differ13 from13 canonical13 disparity13 energy13 binocular13 complex13 cells13 We13 label13 them13 lsquocomplexrsquo13 because13 they13 exhibit13 temporal13 frequency13 doubling13 and13 have13 a13 fundamental13 to13 DC13 response13 ratio13 of13 less13 than13 one13 The13 model13 LL13 neurons13 (see13 Fig13 613 and13 Fig13 7)13 are13 more13 similar13 to13 disparity13 energy13 complex13 cells13 although13 as13 noted13 in13 the13 text13 there13 are13 also13 important13 differences13

13

1113 13

13 Figure13 S413 The13 eight13 AMA13 filters13 from13 Fig13 313 (on13 the13 diagonal)13 and13 their13 pairwise13 sums13 (see13 Fig13 S3a)13 Solid13 and13 dashed13 curves13 indicate13 the13 left13 and13 right13 eye13 filter13 components13 respectively13 Filter13 identity13 is13 indicated13 by13 the13 subscripts13 Fij13 Some13 filters13 have13 shapes13 similar13 to13 canonical13 V1-shy‐like13 binocular13 receptive13 fields13 These13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 vary13 with13 disparity13 (Fig13 S5)13 Other13 filters13 are13 irregularly13 shaped13 (eg13 F36)13 Irregularly13 shaped13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 do13 not13 vary13 significantly13 with13 disparity13 (Fig13 S5)13 The13 inset13 shows13 the13 largest13 absolute13 normalized13 weight13 (see13 Supplement13 Eqs13 S1a-shy‐d)13 given13 to13 each13 filter13 in13 constructing13 the13 set13 of13 LL13 neurons13 (Fig13 7a)13 Lighter13 colors13 indicate13 smaller13 weights13 Unsurprisingly13 the13 irregularly13 shaped13 filters13 (eg13 F36)13 play13 a13 generally13 smaller13 role13 in13 shaping13 the13 response13 properties13 of13 log-shy‐likelihood13 (LL)13 neurons13 13 13

1213 13

13 Figure13 S513 Disparity13 tuning13 curves13 of13 the13 model13 complex13 cells13 implied13 by13 the13 AMA13 filters13 (on-shy‐diagonal)13 and13 their13 normalized13 pairwise13 sums13 (off-shy‐diagonal)13 Solid13 curves13 show13 the13 complex13 cell13 tuning13 curves13 to13 natural13 stimuli13 The13

complex13 response13 is13 given13 by13 filtering13 the13 contrast-shy‐normalized13 stimulus13 and13 then13 squaring13 Rmax fi

Tcnorm( )2 13 (see13 Fig13 S4)13 Each13 point13 on13 the13 curve13 represents13 the13 average13 response13 to13 a13 large13 number13 of13 natural13 stimuli13 having13 the13 same13 disparity13 Gray13 area13 shows13 response13 variability13 due13 to13 variation13 in13 natural13 image13 content13 Dashed13 curves13 show13 the13 complex13 cell13 tuning13 curve13 to13 513 random-shy‐dot13 stereograms13 (RDSs)13 In13 most13 cases13 RDS13 tuning13 curves13 are13 similar13 to13 the13 tuning13 curves13 for13 natural13 stimuli13 although13 response13 magnitude13 is13 generally13 somewhat13 attenuated13 These13 individual13 model13 complex13 cells13 have13 some13 selectivity13 for13 disparity13 However13 they13 exhibit13 little13 invariance13 to13 variation13 in13 natural13 image13 content13 Thus13 the13 pattern13 of13 the13 joint13 population13 response13 must13 be13 used13 to13 accurately13 estimate13 disparity13 Neurons13 that13 signal13 the13 log-shy‐likelihood13 of13 disparity13 (LL13 neurons)13 can13 be13 constructed13 by13 appropriately13 combining13 the13 joint13 responses13 of13 model13 complex13 cells13 LL13 neurons13 are13 highly13 selective13 for13 disparity13 and13 are13 also13 strongly13 response13 invariant13 to13 irrelevant13

1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 25: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

713 13

13 References13 Albrecht13 D13 G13 amp13 Geisler13 W13 S13 (1991)13 Motion13 selectivity13 and13 the13 contrast-shy‐response13 function13 of13

simple13 cells13 in13 the13 visual13 cortex13 Visual13 Neuroscience13 7(6)13 531ndash54613 Albrecht13 D13 G13 amp13 Hamilton13 D13 B13 (1982)13 Striate13 cortex13 of13 monkey13 and13 cat13 contrast13 response13

function13 Journal13 of13 Neurophysiology13 48(1)13 217ndash23713 Banks13 M13 S13 Gepshtein13 S13 amp13 Landy13 M13 S13 (2004)13 Why13 is13 spatial13 stereoresolution13 so13 low13 The13

Journal13 of13 Neuroscience13 24(9)13 2077ndash208913 13 Field13 D13 J13 (1987)13 Relations13 between13 the13 statistics13 of13 natural13 images13 and13 the13 response13 properties13 of13

cortical13 cells13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 4(12)13 2379ndash239413 Fincham13 E13 amp13 Walton13 J13 (1957)13 The13 reciprocal13 actions13 of13 accommodation13 and13 convergence13 The13

Journal13 of13 Physiology13 137(3)13 488ndash50813 Geisler13 W13 S13 amp13 Albrecht13 D13 G13 (1997)13 Visual13 cortex13 neurons13 in13 monkeys13 and13 cats13 detection13

discrimination13 and13 identification13 Visual13 Neuroscience13 14(5)13 897ndash91913 Harris13 J13 M13 McKee13 S13 P13 amp13 Smallman13 H13 S13 (1997)13 Fine-shy‐scale13 processing13 in13 human13 binocular13

stereopsis13 Journal13 of13 the13 Optical13 Society13 of13 America13 A13 14(8)13 1673ndash168313 Heeger13 D13 J13 (1992)13 Normalization13 of13 cell13 responses13 in13 cat13 striate13 cortex13 Visual13 Neuroscience13 9(2)13

181ndash19713 Hibbard13 P13 B13 (2008)13 Binocular13 energy13 responses13 to13 natural13 images13 Vision13 Research13 48(12)13 1427ndash

143913 13 Stockman13 A13 amp13 Sharpe13 L13 T13 (2000)13 The13 spectral13 sensitivities13 of13 the13 middle-shy‐13 and13 long-shy‐wavelength-shy‐

sensitive13 cones13 derived13 from13 measurements13 in13 observers13 of13 known13 genotype13 Vision13 Research13 40(13)13 1711ndash173713

Thibos13 L13 N13 Ye13 M13 Zhang13 X13 amp13 Bradley13 A13 (1992)13 The13 chromatic13 eye13 a13 new13 reduced-shy‐eye13 model13 of13 ocular13 chromatic13 aberration13 in13 humans13 Applied13 Optics13 31(19)13 3594ndash360013

Tolhurst13 D13 J13 Movshon13 J13 A13 amp13 Dean13 A13 F13 (1983)13 The13 statistical13 reliability13 of13 signals13 in13 single13 neurons13 in13 cat13 and13 monkey13 visual13 cortex13 Vision13 Research13 23(8)13 775ndash78513

Vlaskamp13 B13 N13 S13 Yoon13 G13 amp13 Banks13 M13 S13 (2011)13 Human13 stereopsis13 is13 not13 limited13 by13 the13 optics13 of13 the13 well-shy‐focused13 eye13 The13 Journal13 of13 Neuroscience13 31(27)13 9814ndash981813

13

813 13

13 Figure13 S113 Optimal13 filters13 and13 disparity13 estimation13 performance13 for13 estimating13 disparity13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 (a)13 Optimal13 linear13 binocular13 filters13 for13 estimating13 the13 disparity13 (of13 the13 center13 pixel)13 of13 slanted13 surfaces13 The13 filters13 were13 learned13 on13 a13 training13 set13 having13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 The13 optimal13 filter13 shapes13 are13 robust13 to13 the13 presence13 of13 non-shy‐zero13 surface13 slants13 in13 the13 training13 set13 However13 these13 filters13 are13 of13 somewhat13 smaller13 spatial13 extent13 and13 slightly13 higher13 octave13 bandwidth13 (compare13 filters13 F3-shy‐F513 to13 those13 in13 Fig13 3)13 (b)13 Spatial13 frequency13 tuning13 and13 bandwidth13 of13 filters13 (c)13 Two-shy‐dimensional13 analogs13 of13 filters13 in13 (a)13 The13 two-shy‐dimensional13 filters13 respond13 virtually13 identically13 to13 two-shy‐dimensional13 images13 as13 do13 the13 one-shy‐dimensional13 filters13 to13 one-shy‐dimensional13 signals13 (d)13 Tuning13 curves13 of13 LL13 neurons13 trained13 and13 tested13 on13 surfaces13 with13 varying13 surface13 slant13 (e)13 Normalized13 weights13 on13 model13 complex13 cell13 responses13 for13 constructing13 five13 LL13 neurons13 with13 the13 indicated13 preferred13 disparities13 (f)13 Disparity13 estimation13 performance13 with13 surfaces13 having13 a13 cosine13 distribution13 of13 slants13 (black13 line)13 Also13 shown13 are13 disparity13 estimates13 derived13 from13 a13 local13 cross-shy‐correlator13 (gray13 line)13 (g)13 Disparity13 gradients13 from13 viewing13 natural13 scenes(Hibbard13 2008)13 vs13 disparity13 gradients13 in13 our13 training13 set13 The13 relationship13 between13 slant13 and13 disparity13 gradient13 is13 given13 by13

g asymp IPD tanθ( ) d 13 where13 θ 13 is13 the13 surface13 slant13 and13 d13 is13 the13 distance13 to13 the13 surface13 Hibbard13 (2008)13 estimated13 disparity13 gradients13 in13 natural13 viewing13 for13 distances13 nearer13 than13 3513 m13 (dashed13 curve)13 and13 0513 m13 (dotted13 curve)13 At13 the13 04m13 viewing13 distance13 assumed13 throughout13 the13 paper13 our13 manipulation13 of13 surface13 slant13 introduced13 distribution13 of13 disparity13 gradients13 (black13 line)13 that13 is13 comparable13 to13 the13 gradients13 introduced13 in13 natural13 viewing13 13

913 13

13 Figure13 S213 Filter13 responses13 LL13 neuron13 responses13 and13 performance13 with13 random-shy‐dot13 stereograms13 and13 anti-shy‐correlated13 random-shy‐dot13 stereograms13 (a)13 Joint13 response13 distributions13 of13 filters13 F113 and13 F213 when13 stimulated13 with13 random-shy‐dot13 stereograms13 Compare13 to13 natural13 stimuli13 in13 Fig13 4a13 (b)13 Joint13 responses13 of13 filters13 F313 and13 F413 when13 stimulated13 with13 random-shy‐dot13 stereograms13 (c)13 LL13 neuron13 responses13 to13 random13 dot13 stereograms13 (d)13 LL13 neuron13 tuning13 vs13 bandwidth13 when13 stimulated13 with13 natural13 stimuli13 Note13 that13 bandwidths13 are13 elevated13 compared13 to13 when13 the13 LL13 neurons13 are13 stimulated13 with13 natural13 stimuli13 (see13 Fig13 7c)13 (e)13 Disparity13 estimates13 and13 estimate13 precision13 (inset)13 obtained13 with13 random-shy‐dot13 stereograms13 (fghi)13 Same13 as13 abce13 except13 that13 the13 stimuli13 were13 anti-shy‐correlated13 random13 dot13 stereograms13 Filter13 response13 distributions13 still13 segregate13 as13 a13 function13 of13 disparity13 (especially13 with13 F313 and13 F4)13 However13 the13 anti-shy‐correlated13 filter13 response13 distributions13 do13 not13 coincide13 with13 the13 lsquoexpectedrsquo13 pattern13 of13 response13 for13 particular13 disparities13 The13 disparities13 are13 decoded13 improperly13 with13 a13 decoder13 trained13 on13 natural13 stimuli13 LL13 neuron13 tuning13 curves13 with13 anti-shy‐correlated13 stereograms13 are13 neither13 selective13 nor13 invariant13 Inaccurate13 highly13 variable13 estimates13 result13 13 13

13 13

1013 13

13 Figure13 S313 Obtaining13 AMA13 filter13 and13 model13 complex13 cell13 responses13 with13 established13 neural13 operations13 (a)13 Simple13 cell13 responses13 are13 obtained13 by13 projecting13 the13 left-shy‐13 and13 right-shy‐eye13 signals13 onto13 each13 binocular13 filter13 and13 half-shy‐wave13 rectifying13 Subtracting13 lsquoonrsquo13 and13 lsquooffrsquo13 simple13 cell13 responses13 give13 the13 AMA13 filter13 responses13 (see13 Fig13 3a)13 (b)13 Model13 (and13 neurophysiological)13 complex13 cell13 responses13 are13 obtained13 by13 summing13 and13 squaring13 the13 responses13 of13 simple13 cells13 (Model13 complex13 cell13 responses13 can13 also13 be13 obtained13 by13 summing13 the13 responses13 of13 simple13 cells13 that13 have13 both13 a13 half-shy‐wave13 rectifying13 and13 a13 squaring13 nonlinearity)13 Note13 that13 the13 model13 complex13 cells13 differ13 from13 canonical13 disparity13 energy13 binocular13 complex13 cells13 We13 label13 them13 lsquocomplexrsquo13 because13 they13 exhibit13 temporal13 frequency13 doubling13 and13 have13 a13 fundamental13 to13 DC13 response13 ratio13 of13 less13 than13 one13 The13 model13 LL13 neurons13 (see13 Fig13 613 and13 Fig13 7)13 are13 more13 similar13 to13 disparity13 energy13 complex13 cells13 although13 as13 noted13 in13 the13 text13 there13 are13 also13 important13 differences13

13

1113 13

13 Figure13 S413 The13 eight13 AMA13 filters13 from13 Fig13 313 (on13 the13 diagonal)13 and13 their13 pairwise13 sums13 (see13 Fig13 S3a)13 Solid13 and13 dashed13 curves13 indicate13 the13 left13 and13 right13 eye13 filter13 components13 respectively13 Filter13 identity13 is13 indicated13 by13 the13 subscripts13 Fij13 Some13 filters13 have13 shapes13 similar13 to13 canonical13 V1-shy‐like13 binocular13 receptive13 fields13 These13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 vary13 with13 disparity13 (Fig13 S5)13 Other13 filters13 are13 irregularly13 shaped13 (eg13 F36)13 Irregularly13 shaped13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 do13 not13 vary13 significantly13 with13 disparity13 (Fig13 S5)13 The13 inset13 shows13 the13 largest13 absolute13 normalized13 weight13 (see13 Supplement13 Eqs13 S1a-shy‐d)13 given13 to13 each13 filter13 in13 constructing13 the13 set13 of13 LL13 neurons13 (Fig13 7a)13 Lighter13 colors13 indicate13 smaller13 weights13 Unsurprisingly13 the13 irregularly13 shaped13 filters13 (eg13 F36)13 play13 a13 generally13 smaller13 role13 in13 shaping13 the13 response13 properties13 of13 log-shy‐likelihood13 (LL)13 neurons13 13 13

1213 13

13 Figure13 S513 Disparity13 tuning13 curves13 of13 the13 model13 complex13 cells13 implied13 by13 the13 AMA13 filters13 (on-shy‐diagonal)13 and13 their13 normalized13 pairwise13 sums13 (off-shy‐diagonal)13 Solid13 curves13 show13 the13 complex13 cell13 tuning13 curves13 to13 natural13 stimuli13 The13

complex13 response13 is13 given13 by13 filtering13 the13 contrast-shy‐normalized13 stimulus13 and13 then13 squaring13 Rmax fi

Tcnorm( )2 13 (see13 Fig13 S4)13 Each13 point13 on13 the13 curve13 represents13 the13 average13 response13 to13 a13 large13 number13 of13 natural13 stimuli13 having13 the13 same13 disparity13 Gray13 area13 shows13 response13 variability13 due13 to13 variation13 in13 natural13 image13 content13 Dashed13 curves13 show13 the13 complex13 cell13 tuning13 curve13 to13 513 random-shy‐dot13 stereograms13 (RDSs)13 In13 most13 cases13 RDS13 tuning13 curves13 are13 similar13 to13 the13 tuning13 curves13 for13 natural13 stimuli13 although13 response13 magnitude13 is13 generally13 somewhat13 attenuated13 These13 individual13 model13 complex13 cells13 have13 some13 selectivity13 for13 disparity13 However13 they13 exhibit13 little13 invariance13 to13 variation13 in13 natural13 image13 content13 Thus13 the13 pattern13 of13 the13 joint13 population13 response13 must13 be13 used13 to13 accurately13 estimate13 disparity13 Neurons13 that13 signal13 the13 log-shy‐likelihood13 of13 disparity13 (LL13 neurons)13 can13 be13 constructed13 by13 appropriately13 combining13 the13 joint13 responses13 of13 model13 complex13 cells13 LL13 neurons13 are13 highly13 selective13 for13 disparity13 and13 are13 also13 strongly13 response13 invariant13 to13 irrelevant13

1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 26: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

813 13

Figure S1. Optimal filters and disparity estimation performance for estimating the disparity of surfaces with a cosine distribution of slants. (a) Optimal linear binocular filters for estimating the disparity (of the center pixel) of slanted surfaces. The filters were learned on a training set having surfaces with a cosine distribution of slants. The optimal filter shapes are robust to the presence of non-zero surface slants in the training set; however, these filters have somewhat smaller spatial extent and slightly higher octave bandwidth (compare filters F3–F5 to those in Fig. 3). (b) Spatial frequency tuning and bandwidth of the filters. (c) Two-dimensional analogs of the filters in (a). The two-dimensional filters respond to two-dimensional images virtually identically to how the one-dimensional filters respond to one-dimensional signals. (d) Tuning curves of LL neurons trained and tested on surfaces with varying surface slant. (e) Normalized weights on model complex cell responses for constructing five LL neurons with the indicated preferred disparities. (f) Disparity estimation performance for surfaces having a cosine distribution of slants (black line). Also shown are disparity estimates derived from a local cross-correlator (gray line). (g) Disparity gradients from viewing natural scenes (Hibbard, 2008) vs. disparity gradients in our training set. The relationship between slant and disparity gradient is given by $g \approx \mathrm{IPD}\,\tan(\theta)/d$, where $\theta$ is the surface slant and $d$ is the distance to the surface. Hibbard (2008) estimated disparity gradients in natural viewing for distances nearer than 3.5 m (dashed curve) and 0.5 m (dotted curve). At the 0.4 m viewing distance assumed throughout the paper, our manipulation of surface slant introduced a distribution of disparity gradients (black line) that is comparable to the gradients introduced in natural viewing.
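For concreteness, a minimal numpy sketch of the disparity-gradient relationship above; the function name and the 0.065 m interpupillary distance are assumptions for illustration, not values taken from the supplement:

```python
import numpy as np

def disparity_gradient(slant_deg, distance_m, ipd_m=0.065):
    """Disparity gradient g ~ IPD * tan(slant) / distance (Figure S1g).

    slant_deg  : surface slant in degrees
    distance_m : viewing distance in meters (0.4 m is assumed in the paper)
    ipd_m      : interpupillary distance in meters (0.065 m is an assumed
                 typical value; the caption does not specify it)
    """
    return ipd_m * np.tan(np.deg2rad(slant_deg)) / distance_m

# Example: a 60 deg slant at the 0.4 m viewing distance assumed in the paper
print(disparity_gradient(60.0, 0.4))  # ~0.28 (dimensionless gradient)
```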


Figure S2. Filter responses, LL neuron responses, and performance with random-dot stereograms and anti-correlated random-dot stereograms. (a) Joint response distributions of filters F1 and F2 when stimulated with random-dot stereograms. Compare to natural stimuli in Fig. 4a. (b) Joint responses of filters F3 and F4 when stimulated with random-dot stereograms. (c) LL neuron responses to random-dot stereograms. (d) LL neuron tuning vs. bandwidth when stimulated with random-dot stereograms. Note that bandwidths are elevated compared to when the LL neurons are stimulated with natural stimuli (see Fig. 7c). (e) Disparity estimates and estimate precision (inset) obtained with random-dot stereograms. (f–i) Same as (a), (b), (c), and (e), except that the stimuli were anti-correlated random-dot stereograms. Filter response distributions still segregate as a function of disparity (especially with F3 and F4). However, the anti-correlated filter response distributions do not coincide with the 'expected' pattern of response for particular disparities. The disparities are decoded improperly with a decoder trained on natural stimuli. LL neuron tuning curves with anti-correlated stereograms are neither selective nor invariant. Inaccurate, highly variable estimates result.



Figure S3. Obtaining AMA filter and model complex cell responses with established neural operations. (a) Simple cell responses are obtained by projecting the left- and right-eye signals onto each binocular filter and half-wave rectifying. Subtracting the 'on' and 'off' simple cell responses gives the AMA filter responses (see Fig. 3a). (b) Model (and neurophysiological) complex cell responses are obtained by summing and squaring the responses of simple cells. (Model complex cell responses can also be obtained by summing the responses of simple cells that have both a half-wave rectifying and a squaring nonlinearity.) Note that the model complex cells differ from canonical disparity-energy binocular complex cells. We label them 'complex' because they exhibit temporal frequency doubling and have a fundamental-to-DC response ratio of less than one. The model LL neurons (see Fig. 6 and Fig. 7) are more similar to disparity-energy complex cells, although, as noted in the text, there are also important differences.
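A minimal sketch of these operations in numpy; the variable names and the use of a single concatenated binocular vector are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ama_filter_response(f, c_norm):
    """AMA filter response built from 'on' and 'off' half-wave rectified
    simple cells (Figure S3a). f and c_norm are the concatenated left/right
    eye filter weights and contrast-normalized stimulus."""
    drive = f @ c_norm
    r_on = np.maximum(drive, 0.0)    # 'on' simple cell
    r_off = np.maximum(-drive, 0.0)  # 'off' simple cell
    return r_on - r_off              # equals the linear response f @ c_norm

def model_complex_response(f, c_norm, r_max=1.0):
    """Model complex cell response (Figure S3b): sum the simple-cell
    responses and square. For a single filter this reduces to squaring the
    linear filter response; for filter pairs, f would be the pairwise sum."""
    return r_max * ama_filter_response(f, c_norm) ** 2
```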



Figure S4. The eight AMA filters from Fig. 3 (on the diagonal) and their pairwise sums (see Fig. S3a). Solid and dashed curves indicate the left- and right-eye filter components, respectively. Filter identity is indicated by the subscripts Fij. Some filters have shapes similar to canonical V1-like binocular receptive fields. These filters yield model complex cell tuning curves that vary with disparity (Fig. S5). Other filters are irregularly shaped (e.g., F36). Irregularly shaped filters yield model complex cell tuning curves that do not vary significantly with disparity (Fig. S5). The inset shows the largest absolute normalized weight (see Supplement Eqs. S1a–d) given to each filter in constructing the set of LL neurons (Fig. 7a). Lighter colors indicate smaller weights. Unsurprisingly, the irregularly shaped filters (e.g., F36) play a generally smaller role in shaping the response properties of log-likelihood (LL) neurons.


Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex cell tuning curves to natural stimuli. The complex response is given by filtering the contrast-normalized stimulus and then squaring: $R_{\max}\left(\mathbf{f}_i^{T}\mathbf{c}_{norm}\right)^2$ (see Fig. S4). Each point on the curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex cell tuning curves to 5 random-dot stereograms (RDSs). In most cases RDS tuning curves are similar to the tuning curves for natural stimuli, although response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity. However, they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to accurately estimate disparity. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly response invariant to irrelevant natural image variation.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d, e) Same as (b) and (c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
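As a point of reference, a minimal sketch of a local cross-correlation disparity estimator of the general kind being compared against; this is an assumed, simplified form for illustration, not the implementation used to generate the black curves:

```python
import numpy as np

def cross_correlation_disparity(left, right, candidate_disparities_px):
    """Estimate disparity by shifting the right-eye signal over candidate
    disparities (in pixels) and picking the shift that maximizes the
    normalized correlation with the left-eye signal."""
    best_d, best_rho = candidate_disparities_px[0], -np.inf
    for d in candidate_disparities_px:
        shifted = np.roll(right, d)  # circular shift, for simplicity only
        rho = np.dot(left, shifted) / (
            np.linalg.norm(left) * np.linalg.norm(shifted) + 1e-12)
        if rho > best_rho:
            best_d, best_rho = d, rho
    return best_d
```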


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Inter-ocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency, for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., $A_L(f) = A_R(f) \propto MTF_\delta(f)/f$, where $MTF_\delta$ is the modulation transfer function associated with a particular disparity (see Methods). (b) Inter-ocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on inter-ocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency inter-ocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) The effect of a four-diopter difference in refractive power between the two eyes, which drastically reduces the inter-ocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as would a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e), but with a 4-mm pupil. (h) Same as (f), but with a 4-mm pupil.
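A minimal sketch of the vergence-jitter simulation described in panel (c), i.e., convolving a disparity signal with a Gaussian of 2 arcmin standard deviation; the 0.5 arcmin sample spacing is an assumption made only for this illustration:

```python
import numpy as np

def gaussian_kernel(sd_arcmin, sample_arcmin):
    """Unit-area Gaussian kernel on a retinal grid with the given spacing."""
    radius = int(np.ceil(4.0 * sd_arcmin / sample_arcmin))
    x = np.arange(-radius, radius + 1) * sample_arcmin
    k = np.exp(-0.5 * (x / sd_arcmin) ** 2)
    return k / k.sum()

def add_vergence_jitter(disparity_signal, sd_arcmin=2.0, sample_arcmin=0.5):
    """Simulate vergence jitter by convolving the disparity signal with a
    Gaussian of 2 arcmin SD (Figure S7c)."""
    return np.convolve(disparity_signal,
                       gaussian_kernel(sd_arcmin, sample_arcmin), mode='same')
```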



Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead plan view of the range along the white scan-line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.
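As a sketch of the statistic summarized in panel (d), assuming a least-squares plane fit to the range values in a patch (the supplement's exact fitting procedure may differ):

```python
import numpy as np

def fraction_of_range_variance_captured_by_plane(X, Y, Z):
    """Fit Z ~ a*X + b*Y + c by least squares over a patch of range data and
    return the fraction of range variance the plane accounts for. Values
    near 1 mean the patch is well described by a slanted plane."""
    A = np.column_stack([X.ravel(), Y.ravel(), np.ones(X.size)])
    z = Z.ravel()
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    residual = z - A @ coeffs
    total_var = np.var(z)
    return 1.0 - np.var(residual) / total_var if total_var > 0 else 1.0
```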


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by $\mathbf{c}_{norm}(\mathbf{x}) = \mathbf{c}(\mathbf{x})\big/\sqrt{\|\mathbf{c}(\mathbf{x})\|^{2} + n\,c_{50}^{2}}$ (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that $c_{50}$ had a value of zero. To examine the effect of different values of $c_{50}$ on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of $c_{50}$. Then, we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multi-dimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution $P(\mathbf{R}\,|\,\delta_k)$ via maximum likelihood estimation. (a) When $c_{50} = 0.0$, the conditional response distributions have tails that tend to be somewhat lighter than Gaussians (i.e., lower kurtosis than a Gaussian). When $c_{50} = 0.1$, the conditional response distributions are most Gaussian on average. For $c_{50} > 0.1$, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multi-dimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of $c_{50}$: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of $c_{50}$ were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, results are noisier because stable estimates of kurtosis require significantly more data.
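A minimal sketch of the normalization step as reconstructed above (numpy; the function and variable names are illustrative):

```python
import numpy as np

def contrast_normalize(c, c50=0.0):
    """Standard contrast-normalization model (Figure S9):
    c_norm = c / sqrt(||c||^2 + n * c50^2),
    where c is the contrast vector of a stimulus patch and n is the number
    of samples in the patch."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

# The conditional filter response distributions P(R | disparity) are then
# obtained by projecting contrast-normalized stimuli onto the filters:
# R = filters @ contrast_normalize(c, c50)
```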

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 27: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

913 13

13 Figure13 S213 Filter13 responses13 LL13 neuron13 responses13 and13 performance13 with13 random-shy‐dot13 stereograms13 and13 anti-shy‐correlated13 random-shy‐dot13 stereograms13 (a)13 Joint13 response13 distributions13 of13 filters13 F113 and13 F213 when13 stimulated13 with13 random-shy‐dot13 stereograms13 Compare13 to13 natural13 stimuli13 in13 Fig13 4a13 (b)13 Joint13 responses13 of13 filters13 F313 and13 F413 when13 stimulated13 with13 random-shy‐dot13 stereograms13 (c)13 LL13 neuron13 responses13 to13 random13 dot13 stereograms13 (d)13 LL13 neuron13 tuning13 vs13 bandwidth13 when13 stimulated13 with13 natural13 stimuli13 Note13 that13 bandwidths13 are13 elevated13 compared13 to13 when13 the13 LL13 neurons13 are13 stimulated13 with13 natural13 stimuli13 (see13 Fig13 7c)13 (e)13 Disparity13 estimates13 and13 estimate13 precision13 (inset)13 obtained13 with13 random-shy‐dot13 stereograms13 (fghi)13 Same13 as13 abce13 except13 that13 the13 stimuli13 were13 anti-shy‐correlated13 random13 dot13 stereograms13 Filter13 response13 distributions13 still13 segregate13 as13 a13 function13 of13 disparity13 (especially13 with13 F313 and13 F4)13 However13 the13 anti-shy‐correlated13 filter13 response13 distributions13 do13 not13 coincide13 with13 the13 lsquoexpectedrsquo13 pattern13 of13 response13 for13 particular13 disparities13 The13 disparities13 are13 decoded13 improperly13 with13 a13 decoder13 trained13 on13 natural13 stimuli13 LL13 neuron13 tuning13 curves13 with13 anti-shy‐correlated13 stereograms13 are13 neither13 selective13 nor13 invariant13 Inaccurate13 highly13 variable13 estimates13 result13 13 13

13 13

1013 13

13 Figure13 S313 Obtaining13 AMA13 filter13 and13 model13 complex13 cell13 responses13 with13 established13 neural13 operations13 (a)13 Simple13 cell13 responses13 are13 obtained13 by13 projecting13 the13 left-shy‐13 and13 right-shy‐eye13 signals13 onto13 each13 binocular13 filter13 and13 half-shy‐wave13 rectifying13 Subtracting13 lsquoonrsquo13 and13 lsquooffrsquo13 simple13 cell13 responses13 give13 the13 AMA13 filter13 responses13 (see13 Fig13 3a)13 (b)13 Model13 (and13 neurophysiological)13 complex13 cell13 responses13 are13 obtained13 by13 summing13 and13 squaring13 the13 responses13 of13 simple13 cells13 (Model13 complex13 cell13 responses13 can13 also13 be13 obtained13 by13 summing13 the13 responses13 of13 simple13 cells13 that13 have13 both13 a13 half-shy‐wave13 rectifying13 and13 a13 squaring13 nonlinearity)13 Note13 that13 the13 model13 complex13 cells13 differ13 from13 canonical13 disparity13 energy13 binocular13 complex13 cells13 We13 label13 them13 lsquocomplexrsquo13 because13 they13 exhibit13 temporal13 frequency13 doubling13 and13 have13 a13 fundamental13 to13 DC13 response13 ratio13 of13 less13 than13 one13 The13 model13 LL13 neurons13 (see13 Fig13 613 and13 Fig13 7)13 are13 more13 similar13 to13 disparity13 energy13 complex13 cells13 although13 as13 noted13 in13 the13 text13 there13 are13 also13 important13 differences13

13

1113 13

13 Figure13 S413 The13 eight13 AMA13 filters13 from13 Fig13 313 (on13 the13 diagonal)13 and13 their13 pairwise13 sums13 (see13 Fig13 S3a)13 Solid13 and13 dashed13 curves13 indicate13 the13 left13 and13 right13 eye13 filter13 components13 respectively13 Filter13 identity13 is13 indicated13 by13 the13 subscripts13 Fij13 Some13 filters13 have13 shapes13 similar13 to13 canonical13 V1-shy‐like13 binocular13 receptive13 fields13 These13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 vary13 with13 disparity13 (Fig13 S5)13 Other13 filters13 are13 irregularly13 shaped13 (eg13 F36)13 Irregularly13 shaped13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 do13 not13 vary13 significantly13 with13 disparity13 (Fig13 S5)13 The13 inset13 shows13 the13 largest13 absolute13 normalized13 weight13 (see13 Supplement13 Eqs13 S1a-shy‐d)13 given13 to13 each13 filter13 in13 constructing13 the13 set13 of13 LL13 neurons13 (Fig13 7a)13 Lighter13 colors13 indicate13 smaller13 weights13 Unsurprisingly13 the13 irregularly13 shaped13 filters13 (eg13 F36)13 play13 a13 generally13 smaller13 role13 in13 shaping13 the13 response13 properties13 of13 log-shy‐likelihood13 (LL)13 neurons13 13 13

1213 13

13 Figure13 S513 Disparity13 tuning13 curves13 of13 the13 model13 complex13 cells13 implied13 by13 the13 AMA13 filters13 (on-shy‐diagonal)13 and13 their13 normalized13 pairwise13 sums13 (off-shy‐diagonal)13 Solid13 curves13 show13 the13 complex13 cell13 tuning13 curves13 to13 natural13 stimuli13 The13

complex13 response13 is13 given13 by13 filtering13 the13 contrast-shy‐normalized13 stimulus13 and13 then13 squaring13 Rmax fi

Tcnorm( )2 13 (see13 Fig13 S4)13 Each13 point13 on13 the13 curve13 represents13 the13 average13 response13 to13 a13 large13 number13 of13 natural13 stimuli13 having13 the13 same13 disparity13 Gray13 area13 shows13 response13 variability13 due13 to13 variation13 in13 natural13 image13 content13 Dashed13 curves13 show13 the13 complex13 cell13 tuning13 curve13 to13 513 random-shy‐dot13 stereograms13 (RDSs)13 In13 most13 cases13 RDS13 tuning13 curves13 are13 similar13 to13 the13 tuning13 curves13 for13 natural13 stimuli13 although13 response13 magnitude13 is13 generally13 somewhat13 attenuated13 These13 individual13 model13 complex13 cells13 have13 some13 selectivity13 for13 disparity13 However13 they13 exhibit13 little13 invariance13 to13 variation13 in13 natural13 image13 content13 Thus13 the13 pattern13 of13 the13 joint13 population13 response13 must13 be13 used13 to13 accurately13 estimate13 disparity13 Neurons13 that13 signal13 the13 log-shy‐likelihood13 of13 disparity13 (LL13 neurons)13 can13 be13 constructed13 by13 appropriately13 combining13 the13 joint13 responses13 of13 model13 complex13 cells13 LL13 neurons13 are13 highly13 selective13 for13 disparity13 and13 are13 also13 strongly13 response13 invariant13 to13 irrelevant13

1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 28: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

1013 13

13 Figure13 S313 Obtaining13 AMA13 filter13 and13 model13 complex13 cell13 responses13 with13 established13 neural13 operations13 (a)13 Simple13 cell13 responses13 are13 obtained13 by13 projecting13 the13 left-shy‐13 and13 right-shy‐eye13 signals13 onto13 each13 binocular13 filter13 and13 half-shy‐wave13 rectifying13 Subtracting13 lsquoonrsquo13 and13 lsquooffrsquo13 simple13 cell13 responses13 give13 the13 AMA13 filter13 responses13 (see13 Fig13 3a)13 (b)13 Model13 (and13 neurophysiological)13 complex13 cell13 responses13 are13 obtained13 by13 summing13 and13 squaring13 the13 responses13 of13 simple13 cells13 (Model13 complex13 cell13 responses13 can13 also13 be13 obtained13 by13 summing13 the13 responses13 of13 simple13 cells13 that13 have13 both13 a13 half-shy‐wave13 rectifying13 and13 a13 squaring13 nonlinearity)13 Note13 that13 the13 model13 complex13 cells13 differ13 from13 canonical13 disparity13 energy13 binocular13 complex13 cells13 We13 label13 them13 lsquocomplexrsquo13 because13 they13 exhibit13 temporal13 frequency13 doubling13 and13 have13 a13 fundamental13 to13 DC13 response13 ratio13 of13 less13 than13 one13 The13 model13 LL13 neurons13 (see13 Fig13 613 and13 Fig13 7)13 are13 more13 similar13 to13 disparity13 energy13 complex13 cells13 although13 as13 noted13 in13 the13 text13 there13 are13 also13 important13 differences13

13

1113 13

13 Figure13 S413 The13 eight13 AMA13 filters13 from13 Fig13 313 (on13 the13 diagonal)13 and13 their13 pairwise13 sums13 (see13 Fig13 S3a)13 Solid13 and13 dashed13 curves13 indicate13 the13 left13 and13 right13 eye13 filter13 components13 respectively13 Filter13 identity13 is13 indicated13 by13 the13 subscripts13 Fij13 Some13 filters13 have13 shapes13 similar13 to13 canonical13 V1-shy‐like13 binocular13 receptive13 fields13 These13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 vary13 with13 disparity13 (Fig13 S5)13 Other13 filters13 are13 irregularly13 shaped13 (eg13 F36)13 Irregularly13 shaped13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 do13 not13 vary13 significantly13 with13 disparity13 (Fig13 S5)13 The13 inset13 shows13 the13 largest13 absolute13 normalized13 weight13 (see13 Supplement13 Eqs13 S1a-shy‐d)13 given13 to13 each13 filter13 in13 constructing13 the13 set13 of13 LL13 neurons13 (Fig13 7a)13 Lighter13 colors13 indicate13 smaller13 weights13 Unsurprisingly13 the13 irregularly13 shaped13 filters13 (eg13 F36)13 play13 a13 generally13 smaller13 role13 in13 shaping13 the13 response13 properties13 of13 log-shy‐likelihood13 (LL)13 neurons13 13 13

1213 13

13 Figure13 S513 Disparity13 tuning13 curves13 of13 the13 model13 complex13 cells13 implied13 by13 the13 AMA13 filters13 (on-shy‐diagonal)13 and13 their13 normalized13 pairwise13 sums13 (off-shy‐diagonal)13 Solid13 curves13 show13 the13 complex13 cell13 tuning13 curves13 to13 natural13 stimuli13 The13

complex13 response13 is13 given13 by13 filtering13 the13 contrast-shy‐normalized13 stimulus13 and13 then13 squaring13 Rmax fi

Tcnorm( )2 13 (see13 Fig13 S4)13 Each13 point13 on13 the13 curve13 represents13 the13 average13 response13 to13 a13 large13 number13 of13 natural13 stimuli13 having13 the13 same13 disparity13 Gray13 area13 shows13 response13 variability13 due13 to13 variation13 in13 natural13 image13 content13 Dashed13 curves13 show13 the13 complex13 cell13 tuning13 curve13 to13 513 random-shy‐dot13 stereograms13 (RDSs)13 In13 most13 cases13 RDS13 tuning13 curves13 are13 similar13 to13 the13 tuning13 curves13 for13 natural13 stimuli13 although13 response13 magnitude13 is13 generally13 somewhat13 attenuated13 These13 individual13 model13 complex13 cells13 have13 some13 selectivity13 for13 disparity13 However13 they13 exhibit13 little13 invariance13 to13 variation13 in13 natural13 image13 content13 Thus13 the13 pattern13 of13 the13 joint13 population13 response13 must13 be13 used13 to13 accurately13 estimate13 disparity13 Neurons13 that13 signal13 the13 log-shy‐likelihood13 of13 disparity13 (LL13 neurons)13 can13 be13 constructed13 by13 appropriately13 combining13 the13 joint13 responses13 of13 model13 complex13 cells13 LL13 neurons13 are13 highly13 selective13 for13 disparity13 and13 are13 also13 strongly13 response13 invariant13 to13 irrelevant13

1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

13 Figure13 S813 Depth13 structure13 not13 attributable13 to13 slant13 in13 natural13 scenes13 (a)13 A13 representative13 natural13 range13 image13 (b)13 Overhead13 plan13 view13 of13 the13 range13 along13 the13 white13 scan-shy‐line13 in13 (a)13 (c)13 Range13 values13 within13 a13 113 deg13 patch13 marked13 by13 the13 white13 box13 in13 (a)13 and13 the13 black13 box13 in13 (b)13 The13 dots13 represent13 the13 raw13 range13 values13 The13 plane13 is13 the13 best13 fitting13 plane13 to13 the13 range13 data13 (d)13 Depth13 variation13 in13 113 deg13 patches13 of13 natural13 scenes13 can13 largely13 be13 captured13 by13 slanted13 planes13 (see13 supplementary13 text)13 Values13 near13 one13 indicate13 that13 all13 range13 variation13 can13 be13 captured13 by13 (planar)13 surface13 slant13 Values13 near13 zero13 indicate13 that13 none13 of13 the13 range13 variation13 can13 be13 captured13 by13 planar13 surface13 slant13 In13 the13 most13 common13 patch13 nearly13 all13 depth13 variation13 is13 attributable13 to13 slant13 In13 the13 median13 patch13 7513 of13 depth13 variation13 is13 captured13 by13 slant13 13 13 13 13 13 13 13 13 13 13

1613 13

13 Figure13 S913 Contrast13 normalization13 and13 the13 Gaussian13 form13 of13 the13 filter13 response13 distributions13 The13 standard13 model13 of13

contrast13 normalization13 in13 retinal13 and13 cortical13 neurons13 is13 given13 by13 cnorm x( ) = c x( ) c x( ) 2

+ nc502 (Albrecht13 amp13 Geisler13

199113 Albrecht13 amp13 Hamilton13 198213 Heeger13 1992)13 Throughout13 the13 paper13 we13 assumed13 that13 c50 13 had13 a13 value13 of13 zero13 To13 examine13 the13 effect13 of13 different13 values13 of13 c50 13 on13 the13 form13 of13 the13 filter13 response13 distributions13 we13 first13 contrast-shy‐normalized13 the13 test13 stimuli13 using13 different13 values13 of13 c50 13 Then13 we13 projected13 the13 contrast-shy‐normalized13 the13 test13 stimuli13 from13 each13 disparity13 level13 onto13 the13 filters13 to13 obtain13 the13 conditional13 filter13 response13 distributions13 Finally13 we13 fit13 a13 multi-shy‐dimensional13 generalized13 Gaussian13 (one-shy‐dimension13 for13 each13 filter)13 to13 each13 conditional13 response13 distribution13

P R |δ k( ) 13 via13 maximum13 likelihood13 estimation13 (a)13 When13 c50 =13 0013 the13 conditional13 response13 distributions13 have13 tails13 that13 tend13 to13 be13 somewhat13 lighter13 than13 Gaussians13 (ie13 lower13 kurtosis13 than13 a13 Gaussian)13 When13 c50 =13 0113 the13 conditional13 response13 distributions13 are13 most13 Gaussian13 on13 average13 For13 c50 0113 the13 distributions13 have13 heavier13 tails13 than13 Gaussians13 (ie13 higher13 kurtosis13 than13 a13 Gaussian)13 (b)13 Powers13 of13 the13 multi-shy‐dimensional13 generalized13 Gaussians13 that13 best13 fit13 a13 subset13 of13 the13 two-shy‐dimensional13 conditional13 filter13 response13 distributions13 A13 power13 of13 two13 (dashed13 line)13 indicates13 that13 the13 best13 fitting13 distribution13 was13 Gaussian13 Powers13 greater13 than13 or13 less13 than13 two13 indicate13 that13 the13 responses13 are13 better13 fit13 by13 distributions13 having13 lighter13 or13 heavier13 tails13 than13 a13 Gaussian13 respectively13 The13 best-shy‐fitting13 powers13 are13 shown13 for13 three13 different13 values13 of13 c50 13 00013 01013 and13 02513 The13 average13 powers13 across13 the13 conditional13 response13 distributions13 for13 the13 three13 tested13 values13 of13 c50 13 were13 27913 20713 and13 11713 respectively13 The13 same13 qualitative13 results13 hold13 for13 the13 other13 two-shy‐dimensional13 response13 distributions13 The13 pattern13 also13 holds13 for13 higher-shy‐dimensional13 filter13 response13 distributions13 In13 higher13 dimensions13 however13 results13 are13 noisier13 because13 stable13 estimates13 of13 kurtosis13 require13 significantly13 more13 data13 13 13 13 13 13

  • jovi-14-02-01_118
    • Introduction
    • f01
    • Methods and results
    • e01
    • f02
    • e02
    • e03
    • e04
    • f03
    • e05
    • f04
    • f05
    • f06
    • f07
    • Discussion
    • Conclusions
    • Methods details
    • e06
    • e07
    • Albrecht1
    • Albrecht2
    • Anzai1
    • Badcock1
    • Banks1
    • Berens1
    • Blakemore1
    • Burge1
    • Burge2
    • Carandini1
    • Chen1
    • Cormack1
    • Cumming1
    • Cumming2
    • Cumming3
    • Curcio1
    • DeValois1
    • DeAngelis1
    • Ferster1
    • Fincham1
    • Fleet1
    • Ganguli1
    • Geisler1
    • Girshick1
    • Haefner1
    • Haefner2
    • Harris1
    • Hecht1
    • Heeger1
    • Hibbard1
    • Hoyer1
    • Jazayeri1
    • Kaye1
    • Landers1
    • Lehky1
    • Levi1
    • Lewicki1
    • Li1
    • Liu1
    • Ma1
    • Marcos1
    • Marr1
    • McKee1
    • Nakayama1
    • Nienborg1
    • Ogle1
    • Ohzawa1
    • Ohzawa2
    • Ohzawa3
    • Olshausen1
    • Panum1
    • Poggio1
    • Qian1
    • Qian2
    • Read1
    • Read2
    • Read3
    • Simoncelli1
    • Skottun1
    • Stevenson1
    • Stockman1
    • Tanabe1
    • Tanabe2
    • Thibos1
    • Tsao1
    • Tyler1
    • Tyler2
    • Vlaskamp1
    • Williams1
    • Wyszecki1
      • JOV-03741-2013-s01
Page 29: Optimal disparity estimation in natural stereo images · The local differences between the retinal images—the binocular disparities— ... vertically oriented receptive fields.The

1113 13

13 Figure13 S413 The13 eight13 AMA13 filters13 from13 Fig13 313 (on13 the13 diagonal)13 and13 their13 pairwise13 sums13 (see13 Fig13 S3a)13 Solid13 and13 dashed13 curves13 indicate13 the13 left13 and13 right13 eye13 filter13 components13 respectively13 Filter13 identity13 is13 indicated13 by13 the13 subscripts13 Fij13 Some13 filters13 have13 shapes13 similar13 to13 canonical13 V1-shy‐like13 binocular13 receptive13 fields13 These13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 vary13 with13 disparity13 (Fig13 S5)13 Other13 filters13 are13 irregularly13 shaped13 (eg13 F36)13 Irregularly13 shaped13 filters13 yield13 model13 complex13 cell13 tuning13 curves13 that13 do13 not13 vary13 significantly13 with13 disparity13 (Fig13 S5)13 The13 inset13 shows13 the13 largest13 absolute13 normalized13 weight13 (see13 Supplement13 Eqs13 S1a-shy‐d)13 given13 to13 each13 filter13 in13 constructing13 the13 set13 of13 LL13 neurons13 (Fig13 7a)13 Lighter13 colors13 indicate13 smaller13 weights13 Unsurprisingly13 the13 irregularly13 shaped13 filters13 (eg13 F36)13 play13 a13 generally13 smaller13 role13 in13 shaping13 the13 response13 properties13 of13 log-shy‐likelihood13 (LL)13 neurons13 13 13

1213 13

13 Figure13 S513 Disparity13 tuning13 curves13 of13 the13 model13 complex13 cells13 implied13 by13 the13 AMA13 filters13 (on-shy‐diagonal)13 and13 their13 normalized13 pairwise13 sums13 (off-shy‐diagonal)13 Solid13 curves13 show13 the13 complex13 cell13 tuning13 curves13 to13 natural13 stimuli13 The13

complex13 response13 is13 given13 by13 filtering13 the13 contrast-shy‐normalized13 stimulus13 and13 then13 squaring13 Rmax fi

Tcnorm( )2 13 (see13 Fig13 S4)13 Each13 point13 on13 the13 curve13 represents13 the13 average13 response13 to13 a13 large13 number13 of13 natural13 stimuli13 having13 the13 same13 disparity13 Gray13 area13 shows13 response13 variability13 due13 to13 variation13 in13 natural13 image13 content13 Dashed13 curves13 show13 the13 complex13 cell13 tuning13 curve13 to13 513 random-shy‐dot13 stereograms13 (RDSs)13 In13 most13 cases13 RDS13 tuning13 curves13 are13 similar13 to13 the13 tuning13 curves13 for13 natural13 stimuli13 although13 response13 magnitude13 is13 generally13 somewhat13 attenuated13 These13 individual13 model13 complex13 cells13 have13 some13 selectivity13 for13 disparity13 However13 they13 exhibit13 little13 invariance13 to13 variation13 in13 natural13 image13 content13 Thus13 the13 pattern13 of13 the13 joint13 population13 response13 must13 be13 used13 to13 accurately13 estimate13 disparity13 Neurons13 that13 signal13 the13 log-shy‐likelihood13 of13 disparity13 (LL13 neurons)13 can13 be13 constructed13 by13 appropriately13 combining13 the13 joint13 responses13 of13 model13 complex13 cells13 LL13 neurons13 are13 highly13 selective13 for13 disparity13 and13 are13 also13 strongly13 response13 invariant13 to13 irrelevant13

1313 13

13 Figure13 S613 Optimal13 disparity13 estimation13 vs13 estimation13 via13 local13 cross-shy‐correlation13 Performance13 measures13 were13 obtained13 by13 testing13 each13 method13 on13 identical13 sets13 of13 left-shy‐13 and13 right-shy‐eye13 signals13 like13 those13 shown13 in13 Fig13 1d13 (Note13 that13 this13 differs13 somewhat13 from13 the13 most13 common13 test13 of13 the13 cross-shy‐correlation13 model13 which13 gives13 the13 algorithm13 windowed13 access13 to13 many13 patches13 in13 each13 eye13 We13 chose13 the13 current13 method13 so13 that13 each13 algorithm13 has13 access13 to13 the13 same13 image13 information13 thereby13 making13 the13 comparison13 meaningful)13 Black13 curves13 indicate13 estimation13 performance13 via13 local13 cross-shy‐correlation13 Gray13 curves13 indicate13 performance13 obtained13 with13 the13 eight13 AMA13 filters13 shown13 in13 the13 main13 text13 (data13 replotted13 from13 Fig13 5)13 (a)13 Median13 MAP13 estimates13 for13 frontoparallel13 surfaces13 only13 Error13 bars13 indicate13 6813 confidence13 intervals13 (b)13 Estimate13 precision13 as13 indicated13 by13 6813 confidence13 intervals13 Error13 bars13 from13 (a)13 are13 replotted13 on13 a13 linear13 axis13 (Confidence13 intervals13 were13 plotted13 on13 a13 semi-shy‐log13 plot13 in13 the13 main13 text)13 (c)13 Sign13 identification13 performance13 with13 frontoparallel13 surfaces13 (de)13 Same13 as13 bc13 except13 that13 the13 performance13 is13 for13 surfaces13 with13 a13 cosine13 distribution13 of13 slants13 Estimation13 performance13 with13 the13 cross-shy‐correlator13 is13 (slightly)13 more13 biased13 and13 is13 significantly13 less13 precise13 at13 disparities13 larger13 than13 7513 arcmin13 13

1413 13

13 Figure13 S713 Information13 for13 disparity13 estimation13 is13 concentrated13 in13 low13 to13 mid13 spatial13 frequencies13 (a)13 Inter-shy‐ocular13 retinal13 contrast13 amplitude13 of13 a13 cosine-shy‐windowed13 high-shy‐contrast13 edge13 as13 a13 function13 of13 spatial13 frequency13 for13 different13 disparities13 (Eq13 S6)13 The13 optics13 in13 the13 two13 eyes13 were13 identical13 ie13

AL f( ) = AR f( )prop MTFδ f( ) f 13 where13 MTFδ is13 the13 modulation13 transfer13 function13 associated13 with13 a13 particular13 disparity13 (see13 Methods)13 (b)13 Inter-shy‐ocular13 contrast13 signals13 that13 have13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 The13 signals13 differ13 little13 above13 ~513 cpd13 (c)13 The13 effect13 of13 vergence13 jitter13 on13 inter-shy‐ocular13 contrast13 signals13 Signals13 were13 corrupted13 by13 213 arcmin13 of13 vergence13 jitter13 a13 typical13 value13 in13 humans13 Vergence13 noise13 was13 simulated13 by13 convolving13 each13 disparity13 signal13 with13 a13 Gaussian13 having13 a13 standard13 deviation13 of13 213 arcmin13 Vergence13 jitter13 further13 attenuates13 high13 frequency13 inter-shy‐ocular13 contrast13 signals13 Natural13 image13 structure13 and13 vergence13 jitter13 both13 contribute13 to13 the13 lack13 of13 disparity13 information13 at13 high13 luminance13 spatial13 frequencies13 (d)13 The13 signals13 in13 (c)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (e)13 The13 effect13 of13 a13 four13 diopter13 difference13 in13 refractive13 power13 between13 the13 two13 eyes13 drastically13 reduces13 the13 inter-shy‐ocular13 contrast13 signal13 A13 four13 diopter13 (or13 larger)13 difference13 between13 the13 left13 and13 right13 optics13 is13 a13 primary13 risk13 factor13 for13 the13 development13 of13 amblyopia13 (Levi13 et13 al13 2011)13 Note13 that13 these13 signals13 were13 simulated13 for13 eyes13 with13 2mm13 pupils13 With13 larger13 pupil13 diameters13 the13 same13 difference13 in13 refractive13 power13 would13 further13 reduce13 available13 disparity13 signals13 For13 example13 with13 a13 2mm13 pupil13 an13 813 diopter13 difference13 between13 the13 eyes13 produces13 the13 same13 difference13 in13 defocus13 blur13 as13 would13 a13 4mm13 pupil13 with13 a13 413 diopters13 difference13 between13 the13 eyes13 In13 that13 case13 there13 are13 no13 measureable13 signals13 above13 113 cpd13 (f)13 The13 signals13 in13 (e)13 after13 having13 been13 passed13 through13 a13 bank13 of13 1513 octave13 bandwidth13 log-shy‐Gabor13 filters13 (g)13 Same13 as13 (e)13 but13 with13 a13 4mm13 pupil13 (h)13 Same13 as13 (f)13 but13 with13 a13 4mm13 pupil13 13

13

1513 13

Figure S5. Disparity tuning curves of the model complex cells implied by the AMA filters (on-diagonal) and their normalized pairwise sums (off-diagonal). Solid curves show the complex-cell tuning curves to natural stimuli. The complex-cell response is obtained by filtering the contrast-normalized stimulus and then squaring, R_max (f_i^T c_norm)² (see Fig. S4). Each point on a curve represents the average response to a large number of natural stimuli having the same disparity. The gray area shows response variability due to variation in natural image content. Dashed curves show the complex-cell tuning curves to random-dot stereograms (RDSs). In most cases, the RDS tuning curves are similar to the tuning curves for natural stimuli, although the response magnitude is generally somewhat attenuated. These individual model complex cells have some selectivity for disparity, but they exhibit little invariance to variation in natural image content. Thus, the pattern of the joint population response must be used to estimate disparity accurately. Neurons that signal the log-likelihood of disparity (LL neurons) can be constructed by appropriately combining the joint responses of model complex cells. LL neurons are highly selective for disparity and are also strongly invariant to irrelevant variation in natural image content.


Figure S6. Optimal disparity estimation vs. estimation via local cross-correlation. Performance measures were obtained by testing each method on identical sets of left- and right-eye signals like those shown in Fig. 1d. (Note that this differs somewhat from the most common test of the cross-correlation model, which gives the algorithm windowed access to many patches in each eye. We chose the current method so that each algorithm has access to the same image information, thereby making the comparison meaningful.) Black curves indicate estimation performance via local cross-correlation. Gray curves indicate performance obtained with the eight AMA filters shown in the main text (data replotted from Fig. 5). (a) Median MAP estimates for frontoparallel surfaces only. Error bars indicate 68% confidence intervals. (b) Estimate precision, as indicated by 68% confidence intervals. Error bars from (a) are replotted on a linear axis. (Confidence intervals were plotted on a semi-log plot in the main text.) (c) Sign-identification performance with frontoparallel surfaces. (d, e) Same as (b, c), except that performance is for surfaces with a cosine distribution of slants. Estimation performance with the cross-correlator is (slightly) more biased and is significantly less precise at disparities larger than 7.5 arcmin.
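For comparison, the sketch below shows one straightforward way to implement the local cross-correlation estimator referred to above: correlate the left-eye signal with shifted copies of the right-eye signal and take the best-correlating shift as the disparity estimate. The windowing, shift range, and circular shifting are our own simplifying assumptions, not the exact procedure used to generate the figure.

```python
import numpy as np

def cross_correlation_disparity(left, right, max_shift_px):
    """Estimate disparity (in pixels) as the shift of the right-eye signal that
    maximizes its normalized correlation with the left-eye signal."""
    best_shift, best_corr = 0, -np.inf
    for shift in range(-max_shift_px, max_shift_px + 1):
        shifted = np.roll(right, shift)          # circular shift (simplifying assumption)
        corr = np.corrcoef(left, shifted)[0, 1]  # normalized cross-correlation
        if corr > best_corr:
            best_corr, best_shift = corr, shift
    return best_shift

# Hypothetical example: the right-eye signal is the left-eye signal offset by 3 pixels.
rng = np.random.default_rng(1)
left = rng.standard_normal(256)
right = np.roll(left, -3) + 0.05 * rng.standard_normal(256)
print(cross_correlation_disparity(left, right, max_shift_px=10))  # recovers 3
```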


Figure S7. Information for disparity estimation is concentrated in low to mid spatial frequencies. (a) Interocular retinal contrast amplitude of a cosine-windowed, high-contrast edge as a function of spatial frequency for different disparities (Eq. S6). The optics in the two eyes were identical, i.e., A_L(f) = A_R(f) ∝ MTF_δ(f)/f, where MTF_δ is the modulation transfer function associated with a particular disparity (see Methods). (b) Interocular contrast signals that have been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. The signals differ little above ~5 cpd. (c) The effect of vergence jitter on interocular contrast signals. Signals were corrupted by 2 arcmin of vergence jitter, a typical value in humans. Vergence noise was simulated by convolving each disparity signal with a Gaussian having a standard deviation of 2 arcmin. Vergence jitter further attenuates high-frequency interocular contrast signals. Natural image structure and vergence jitter both contribute to the lack of disparity information at high luminance spatial frequencies. (d) The signals in (c) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (e) A four-diopter difference in refractive power between the two eyes drastically reduces the interocular contrast signal. A four-diopter (or larger) difference between the left and right optics is a primary risk factor for the development of amblyopia (Levi et al., 2011). Note that these signals were simulated for eyes with 2-mm pupils. With larger pupil diameters, the same difference in refractive power would further reduce the available disparity signals. For example, with a 2-mm pupil, an 8-diopter difference between the eyes produces the same difference in defocus blur as a 4-mm pupil with a 4-diopter difference between the eyes. In that case, there are no measurable signals above 1 cpd. (f) The signals in (e) after having been passed through a bank of 1.5-octave-bandwidth log-Gabor filters. (g) Same as (e) but with a 4-mm pupil. (h) Same as (f) but with a 4-mm pupil.
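The vergence-jitter manipulation in panels (c) and (d) amounts to blurring the signals across the disparity dimension. Below is a minimal sketch of that step, assuming the signals are stored as a disparity-by-frequency array with 0.25-arcmin disparity sampling; the array contents, sampling, and use of scipy.ndimage.gaussian_filter1d are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Hypothetical sampling: rows index disparity, columns index spatial frequency.
disparities = np.arange(-16.0, 16.25, 0.25)   # arcmin (assumed sampling)
freqs = np.linspace(0.5, 16.0, 64)            # cpd (assumed sampling)
signals = np.random.default_rng(2).random((disparities.size, freqs.size))  # stand-in data

# Vergence jitter: blur each fixed-frequency column across disparity with a Gaussian
# whose standard deviation is 2 arcmin, expressed in disparity samples.
jitter_arcmin = 2.0
step_arcmin = disparities[1] - disparities[0]
jittered = gaussian_filter1d(signals, sigma=jitter_arcmin / step_arcmin, axis=0)
print(jittered.shape)
```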


Figure S8. Depth structure not attributable to slant in natural scenes. (a) A representative natural range image. (b) Overhead (plan) view of the range along the white scan line in (a). (c) Range values within a 1-deg patch marked by the white box in (a) and the black box in (b). The dots represent the raw range values. The plane is the best-fitting plane to the range data. (d) Depth variation in 1-deg patches of natural scenes can largely be captured by slanted planes (see supplementary text). Values near one indicate that all range variation can be captured by (planar) surface slant. Values near zero indicate that none of the range variation can be captured by planar surface slant. In the most common patch, nearly all depth variation is attributable to slant. In the median patch, 75% of the depth variation is captured by slant.
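The statistic plotted in panel (d) can be computed by least-squares fitting a plane to the range values in each patch and taking the proportion of range variance the plane explains. A minimal sketch, with a hypothetical patch standing in for the real range data, is shown below.

```python
import numpy as np

def slant_variance_explained(x, y, z):
    """Least-squares fit z = a*x + b*y + c and return the fraction of range
    variance captured by the best-fitting plane (R^2)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    residuals = z - A @ coeffs
    return 1.0 - np.sum(residuals ** 2) / np.sum((z - z.mean()) ** 2)

# Hypothetical 1-deg patch: a slanted plane plus a little surface roughness.
rng = np.random.default_rng(3)
xx, yy = np.meshgrid(np.linspace(-0.5, 0.5, 20), np.linspace(-0.5, 0.5, 20))
zz = 3.0 + 0.8 * xx - 0.3 * yy + 0.02 * rng.standard_normal(xx.shape)
print(slant_variance_explained(xx.ravel(), yy.ravel(), zz.ravel()))  # close to 1.0
```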


Figure S9. Contrast normalization and the Gaussian form of the filter response distributions. The standard model of contrast normalization in retinal and cortical neurons is given by c_norm(x) = c(x) / √(‖c(x)‖² + n·c50²) (Albrecht & Geisler, 1991; Albrecht & Hamilton, 1982; Heeger, 1992). Throughout the paper we assumed that c50 had a value of zero. To examine the effect of different values of c50 on the form of the filter response distributions, we first contrast-normalized the test stimuli using different values of c50. Then, we projected the contrast-normalized test stimuli from each disparity level onto the filters to obtain the conditional filter response distributions. Finally, we fit a multidimensional generalized Gaussian (one dimension for each filter) to each conditional response distribution P(R | δ_k) via maximum-likelihood estimation. (a) When c50 = 0.0, the conditional response distributions have tails that tend to be somewhat lighter than Gaussian (i.e., lower kurtosis than a Gaussian). When c50 = 0.1, the conditional response distributions are most Gaussian on average. For c50 > 0.1, the distributions have heavier tails than Gaussians (i.e., higher kurtosis than a Gaussian). (b) Powers of the multidimensional generalized Gaussians that best fit a subset of the two-dimensional conditional filter response distributions. A power of two (dashed line) indicates that the best-fitting distribution was Gaussian. Powers greater than or less than two indicate that the responses are better fit by distributions having lighter or heavier tails than a Gaussian, respectively. The best-fitting powers are shown for three different values of c50: 0.00, 0.10, and 0.25. The average powers across the conditional response distributions for the three tested values of c50 were 2.79, 2.07, and 1.17, respectively. The same qualitative results hold for the other two-dimensional response distributions. The pattern also holds for higher-dimensional filter response distributions. In higher dimensions, however, the results are noisier because stable estimates of kurtosis require significantly more data.
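For concreteness, the sketch below applies the normalization above to a set of stimuli for several values of c50 and fits a generalized Gaussian to one filter's responses; the fitted shape parameter plays the role of the "power" in panel (b) (a value of 2 corresponds to a Gaussian). The random stimuli and filters are hypothetical placeholders, so the fitted powers will not reproduce the values reported in the caption.

```python
import numpy as np
from scipy.stats import gennorm

def contrast_normalize(c, c50):
    """c_norm(x) = c(x) / sqrt(||c(x)||^2 + n * c50^2), with n the number of pixels."""
    n = c.size
    return c / np.sqrt(np.sum(c ** 2) + n * c50 ** 2)

def filter_responses(stimuli, filters, c50):
    """Project contrast-normalized stimuli (rows) onto a set of filters (rows)."""
    normed = np.array([contrast_normalize(c, c50) for c in stimuli])
    return normed @ filters.T   # shape: (num_stimuli, num_filters)

# Hypothetical stimuli and filters standing in for one disparity level.
rng = np.random.default_rng(4)
stimuli = rng.standard_normal((1000, 128))
filters = rng.standard_normal((8, 128))
for c50 in (0.00, 0.10, 0.25):
    R = filter_responses(stimuli, filters, c50)
    beta, loc, scale = gennorm.fit(R[:, 0])   # beta = 2 corresponds to a Gaussian
    print(f"c50 = {c50:.2f}, fitted power = {beta:.2f}")
```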
