


A New Image Representation Algorithm Inspired by Image Submodality Models, Redundancy Reduction, and Learning in Biological Vision

Nikhil Balakrishnan, Karthik Hariharakrishnan, and Dan Schonfeld, Senior Member, IEEE

Abstract—We develop a new biologically motivated algorithm for representing natural images using successive projections into complementary subspaces. An image is first projected into an edge subspace spanned using an ICA basis adapted to natural images, which captures the sharp features of an image like edges and curves. The residual image obtained after extraction of the sharp image features is approximated using a mixture of probabilistic principal component analyzers (MPPCA) model. The model is consistent with cellular, functional, information theoretic, and learning paradigms in visual pathway modeling. We demonstrate the efficiency of our model for representing different attributes of natural images like color and luminance. We compare the performance of our model in terms of quality of representation against commonly used bases, like the discrete cosine transform (DCT), independent component analysis (ICA), and principal components analysis (PCA), based on their entropies. Chrominance and luminance components of images are represented using codes having lower entropy than DCT, ICA, or PCA for similar visual quality. The model attains considerable simplification for learning from images by using a sparse independent code for representing edges and explicitly evaluating probabilities in the residual subspace.

Index Terms—Computer vision, feature representation, statistical models, clustering algorithms, machine learning, color.

1 INTRODUCTION

The human visual system has developed efficient coding strategies to represent natural images, given the fact that we are able to make sense of millions of bytes of data every day in a seemingly effortless manner [1]. Visual information is transduced by the rods and cones of the retina and carried to the brain by the optic nerve, which comprises the axons of cells in the ganglionic layer. The fibers synapse in the lateral geniculate body of the thalamus before terminating in the primary visual cortex or V1 [2]. Sensory information is represented using the output of an array of neurons. The type of stimulus is inferred from the location and amplitude of neurons that show maximal activity. Encoding information using the activity of a large number of neurons is known as population coding [3].

Mathematical models of the visual system have been approached from various viewpoints. These can be broadly classified into three types—1) single cell models, where the attempt is to model responses of single cells in the visual cortex to various stimuli [4], 2) visual processing models, where the focus is on modeling higher order functions of the visual pathway such as detecting and representing contours, edges, surfaces, etc. [5], and 3) statistical models, which represent the visual pathway as effecting sensory transformations for achieving diverse objectives like redundancy reduction and learning [4], [6], [7], [8]. These models are often interlinked and complementary, as outputs of simple cell models are used for representing higher order structure in a visual scene by visual processing models. It is increasingly clear that sense perception and learning are interlinked and that learning involves evaluating probabilities of different hypotheses from sensory data [8]. In this sense, neural representations need not be seen as merely transformations of stimulus energies, but as approximate estimates of probable truths of hypotheses in the current environment [8].

In this paper, we develop a mathematical model for visual representation which is consistent with these diverse perspectives. The algorithm is formulated by the use of an independent component analysis (ICA) model for edge representation followed by a mixture of probabilistic principal components analyzers (MPPCA) model for surface representation. Section 2 briefly describes the salient mathematical features of the individual models described above. Section 2.1 discusses mathematical models of simple and complex cells, Section 2.2 discusses data streams in image representation, and Section 2.3 discusses neural representation in the framework of redundancy reduction. We describe the salient mathematical features of our algorithm in Section 3. In Section 4, we compare the performance of our model against representations obtained using an ICA basis, a PCA basis, and a discrete cosine transform (DCT) basis. The results of comparisons are presented in Section 5.


. N. Balakrishnan is with the Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, (M/C 063), Chicago, IL 60607. E-mail: [email protected].

. K. Hariharakrishnan is with Motorola India Electronics Limited, # 66/1, Plot No. 5, 6th Main, Hoysala Nagar, Bangalore 560093, Karnataka, India. E-mail: [email protected].

. D. Schonfeld is with the Department of Electrical and Computer Engineering, University of Illinois at Chicago, 851 S. Morgan Street, (M/C 154), Chicago, IL 60607. E-mail: [email protected].

Manuscript received 8 Jan. 2004; revised 30 Sept. 2004; accepted 22 Nov. 2004; published online 14 July 2005.
Recommended for acceptance by E.R. Hancock.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0021-0104.



We discuss our model in light of current trends in neuroscience and learning theory in Section 6. Section 7 presents a summary and suggests directions for future work.

2 BACKGROUND AND NEURAL MOTIVATION

2.1 Structure and Organization of the Visual System

V1 neurons are classified into simple and complex cells. A systematic representation of the visual space is formed on the cortical surface. Neurons that share the same orientation preference are grouped together into columns, and successive columns rotate systematically in preferred angle [4]. Mathematically, the receptive fields of simple cells of the visual cortex are modeled as Gabor wavelets, which are parameterized as

f(x, y) = \exp\{-[(x - x_o)^2/\alpha_o^2 + (y - y_o)^2/\beta_o^2]\} \exp\{-2\pi i[u_o(x - x_o) + v_o(y - y_o)]\},   (1)

where (x_o, y_o) specifies position in the image, (\alpha_o, \beta_o) specifies the filter's effective width, and (u_o, v_o) specifies the filter's frequency. The real and imaginary parts of this complex filter function describe associated pairs of simple cells in quadrature phase [4]. The 2D Fourier transform F(u, v) of a 2D Gabor wavelet is given by

F(u, v) = \exp\{-[(u - u_o)^2 \alpha_o^2 + (v - v_o)^2 \beta_o^2]\} \exp\{2\pi i[x_o(u - u_o) + y_o(v - v_o)]\}.   (2)

The power spectrum of a 2D Gabor wavelet takes the form of a bivariate Gaussian centered at (u_o, v_o), having magnitude \omega_o = (u_o^2 + v_o^2)^{1/2} and phase \theta_o = \tan^{-1}(v_o/u_o). Therefore, the peak response occurs for an orientation of \theta_o and an angular frequency of \omega_o corresponding to the excitatory field [4].
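As a concrete illustration of (1), the following minimal NumPy sketch builds the complex Gabor wavelet on a square grid; the function name, grid convention, and parameter values are our own illustrative choices rather than anything prescribed in the text.

```python
import numpy as np

def gabor_2d(size, x0, y0, alpha0, beta0, u0, v0):
    """Complex 2D Gabor wavelet of (1): a Gaussian envelope at (x0, y0)
    with widths (alpha0, beta0), modulated at spatial frequency (u0, v0)."""
    y, x = np.mgrid[0:size, 0:size]
    envelope = np.exp(-(((x - x0) ** 2) / alpha0 ** 2 + ((y - y0) ** 2) / beta0 ** 2))
    carrier = np.exp(-2j * np.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier

# The real and imaginary parts model a quadrature pair of simple cells.
g = gabor_2d(size=16, x0=8, y0=8, alpha0=3.0, beta0=3.0, u0=0.2, v0=0.1)
even_cell, odd_cell = g.real, g.imag
```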

Complex cells form the other major cell type in V1. Mathematically, they are modeled as summing the squared magnitudes of simple cell inputs, thus enabling phase invariance and limited translation invariance. These cells become important in higher order tasks, like pattern recognition, which take place in adjacent parts of the brain, like V2.

2.2 Data Streams in Visual Processing

This section describes image representation from a functional perspective. Neurophysiology and psychophysics data suggest that visual object representation is organized in parallel interacting data streams which follow different computational strategies. Attributes of an image, such as color, edges, boundaries, luminance, etc., are extracted and processed separately. This is referred to as the boundary contour system/feature contour system (BCS/FCS) [5].

Boundary formation proceeds by linking oriented contrast measures (small edges) along smooth contours. BCS performs edge detection and edge completion, and sends outputs to FCS. FCS performs diffusion of uniform color or brightness in perceptually similar areas and inhibits diffusion of uniform luminance or color across boundaries [5]. The major input to FCS is from BCS. Thus, surfaces are formed from edges. Computational strategies for the generation of perceptual surface qualities generally follow one of three basic strategies [5]—1) filtering and rule-based symbolic interpretations [9], 2) spatial integration, inverse filtering, and labeling models, which follow the operations sequence filtering/differentiation → boundaries/thresholding → integration [10], [11], and 3) filling-in models, which follow the sequence filtering → boundaries → filling-in [12], [13]. Differentiation and subsequent thresholding operations are intended to detect salient changes in the luminance signal which serve to create region boundaries and, subsequently, trigger integration from local luminance ratios. The latter two approaches begin with local luminance ratios estimated along boundaries. Surface properties are computed by propagating local estimates into region interiors to generate a spatially contiguous representation. See Neumann and Mingolla [5] for a review of the topic.

This concept also underlies color processing. Variations in wavelength of light are processed distinct from variations in intensity [14]. The continuous color space is partitioned into discrete color categories [14]. Color is transduced by differentially wavelength-sensitive cones in the retina. Cones are classified as L, M, and S (long, medium, and short) types based on their peak wavelength sensitivity. There are three distinct channels conveying color information from the retina to V1. The M-channel contributes to perception of luminance and motion, but does not convey wavelength coded signals. The P-channel conveys long and medium wavelength information and fine detail. The K-channel conveys information regarding color sensations. K and P channels innervate cytochrome oxidase stained areas called "blobs" in V1. P and M channels innervate the remaining regions, known as "interblobs" [14]. See Kentridge et al. [14] for a review of the topic.

2.3 Information Theory Perspective—The Redundancy Reduction Principle

A close link exists between this observed anatomy and physiology of the visual pathway and data compression principles of information theory. Classical models in computational neuroscience assume efficient coding in the sensory system has been attained through Barlow's principle of redundancy reduction [6], [7]. According to this principle, sensory transformations should reduce redundancies in the input sensory stream [1] with the aim of achieving a more parsimonious representation in the brain. Therefore, visual image processing should serve to transform the input visual data into statistically independent components so that there is no overlap in the information conveyed by different components.

The Karhunen-Loeve (KL) transform has been used extensively in the literature for decorrelating or "whitening" multivariate input data. The KL transform removes redundancies due to linear pairwise correlations among image pixels. However, natural images have an abundance of features like oriented lines, edges, curves, etc., which give rise to higher order statistical dependencies between pixels than just linear pairwise correlations [1]. ICA refers to a broad range of algorithms exploiting different objective criteria to effect a transformation of a vector into statistically independent components, a stronger form of independence between components than the linear independence resulting from PCA. An image patch x is represented as a linear superposition of columns of the mixing matrix A, the columns of A being such that the components of the vector s are statistically independent:

x = As.   (3)



3 HYBRID ICA—MIXTURE OF PPCA ALGORITHM

3.1 ICA Model

We need to estimate a matrix A such that s = A^{-1}x has components that are statistically independent. "Sparseness" is an additional condition imposed on s. A sparse activation pattern is modeled as a Laplacian density characterized by high kurtosis. Kurtosis is defined as the fourth order cumulant of a random variable and is 0 for a Gaussian random variable. Therefore, to determine A, the non-Gaussianity of s can be used as a cost function. Kurtosis, which serves as a measure of the non-Gaussianity of a random variable, can be approximated using negentropy [15]. Negentropy of a random variable y is defined as the difference between the differential entropy of y assuming a Gaussian pdf and its differential entropy estimated using its actual density, for the same covariance [15]:

J(y) = H(y_{gauss}) - H(y).   (4)

Since a Gaussian pdf has the greatest entropy for a given variance, negentropy is always positive and equal to zero if and only if y is Gaussian [15]. Negentropy of a random variable s is approximated as [15]

J(s) \approx \frac{1}{12} E\{s^3\}^2 + \frac{1}{48} \mathrm{kurt}(s)^2,   (5)

where "kurt" denotes the kurtosis of the random variable in the argument. We use the FastICA Matlab package [16] to estimate the independent components of natural images. This is a fast fixed-point algorithm that achieves a sparse, independent components transformation by iteratively maximizing an approximation to the negentropy. The estimated matrix A is projected into the manifold of orthogonal matrices at each step so that (3) is constrained to be an orthogonal transformation.
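The negentropy approximation (5) is straightforward to evaluate from samples. The sketch below is our own illustration (not the FastICA implementation itself); it standardizes a sample and compares the approximation for a Gaussian and a sparse Laplacian source.

```python
import numpy as np

def negentropy_approx(s):
    """Approximation (5): J(s) ~ (1/12) E{s^3}^2 + (1/48) kurt(s)^2,
    with kurt(s) = E{s^4} - 3 for a standardized (zero-mean, unit-variance) variable."""
    s = (s - s.mean()) / s.std()          # standardize, as in ICA preprocessing
    third_moment = np.mean(s ** 3)
    kurt = np.mean(s ** 4) - 3.0
    return third_moment ** 2 / 12.0 + kurt ** 2 / 48.0

rng = np.random.default_rng(0)
print(negentropy_approx(rng.normal(size=100_000)))    # close to 0 for a Gaussian source
print(negentropy_approx(rng.laplace(size=100_000)))   # clearly positive for a sparse source
```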

The resulting filters (columns of A) resemble Gabor functions and wavelets and function as edge detectors which respond maximally in the presence of a corresponding edge in the visual field. The vector s is a population code of corresponding neuronal activations and is obtained by

s = A^T x.   (6)

The elements of s (referred to as sources) with large magnitudes signify the presence of corresponding "sharp features" in an image patch—features such as edges, oriented curves, etc. The components of s with smaller magnitudes represent the summation of a number of weak edges to yield less interesting features in the image patch.

The columns of A and the components of the vector s are determined up to a permutation. Independent subspace analysis (ISA) and topographic independent component analysis (TICA) are algorithms that allow limited dependencies between components of s to allow for local clustering of cells with similar receptive fields [17], [18]. In these models, there is local dependence between the components of a subset of neurons (vector s) and independence of the subset with respect to the rest of the vector. This allows for neurons with similar orientation and frequency selectivity to be grouped together.

Fig. 1a shows the activation profile of a source in response to different image patches. Fig. 1b shows the corresponding histogram of source activity. The histogram gives a nonparametric approximation of the probability density. The peak occurs at 0, with sharp tails at higher values of activation, suggestive of a sparse super-Gaussian activation pattern. The kurtosis was estimated to be 9.11, which confirms the super-Gaussian nature. Fig. 2 shows the same source after a threshold is applied on activation. As seen in Fig. 2, the source is significantly active only a fraction of the time, which, in this case, is 27/256 or 10.5 percent of the time. From trial and error, an absolute value of 1.5 is set for a component of s to be considered significantly active. Let s_sharp denote the result of thresholding the vector s for an image patch x, where all components of s that have an absolute value less than 1.5 are set equal to zero. Denoting the reconstructed image vector by x_sharp, we get

x_sharp = A s_sharp.   (7)

The residual subspace is given by

x_residual = x - x_sharp.   (8)
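A minimal sketch of the decomposition in (6)-(8), assuming an orthogonal mixing matrix A as described above; the threshold of 1.5 follows the text, while the function name and the random stand-in for a trained ICA basis are hypothetical.

```python
import numpy as np

def decompose_patch(x, A, threshold=1.5):
    """Split a vectorized patch into sharp and residual parts, per (6)-(8).
    A is the (orthogonal) ICA mixing matrix, so the sources are s = A^T x."""
    s = A.T @ x
    s_sharp = np.where(np.abs(s) >= threshold, s, 0.0)  # keep only strong edge activations
    x_sharp = A @ s_sharp                                # (7)
    x_residual = x - x_sharp                             # (8)
    return x_sharp, x_residual

# Toy usage with a random orthogonal matrix standing in for a trained ICA basis.
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.normal(size=(160, 160)))
x = rng.normal(size=160)
x_sharp, x_residual = decompose_patch(x, A)
```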

The residual subspace nevertheless plays an important role in attaining a smooth representation and cannot be neglected entirely. Fig. 3 shows the sharp and residual subspaces of two images. The sharp features image would correspond to the BCS discussed in Section 2.2. We use a latent variable model formulation to arrive at an efficient basis for the residual subspace, described in Section 3.2.


Fig. 1. (a) Activity profile of a typical source in response to natural image patches and (b) corresponding histogram. The histogram reveals a peak at 0 and a sharp fall at higher activations, suggesting a super-Gaussian probability density function.


3.2 MPPCA Model

The residual image is assumed to originate by sampling from a limited number of lower dimensional self-similar clusters. Surfaces having a particular orientation originate from a single cluster for which an efficient basis can be constructed. A cluster k is first selected at random, following which an observation is generated using the linear model

x_k = W_k s_k + \mu_k + \epsilon_k,   k = 1, ..., K,   (9)

where x \in R^d is the observation, s \in R^q is the lower dimensional source manifold assumed to be Gaussian distributed with zero mean and identity covariance, d > q, \mu_k is the cluster observation mean, and \epsilon_k ~ N(0, \Psi_k) is Gaussian white noise with zero mean and diagonal covariance. W_k, s_k, \mu_k, and \Psi_k are hidden variables that need to be estimated from data; hence, the term latent variable model [19].
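For illustration, a short sketch of sampling from the generative model (9), using the isotropic noise simplification adopted below; the function name, parameter values, and cluster count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_mppca(W, mu, sigma2, pi, n, rng):
    """Draw n observations from (9): pick cluster k with probability pi[k],
    then x = W_k s + mu_k + eps, with s ~ N(0, I) and eps ~ N(0, sigma2_k I)."""
    K = len(pi)
    d, q = W[0].shape
    X = np.empty((n, d))
    for i in range(n):
        k = rng.choice(K, p=pi)
        s = rng.normal(size=q)
        eps = rng.normal(scale=np.sqrt(sigma2[k]), size=d)
        X[i] = W[k] @ s + mu[k] + eps
    return X

# Toy parameters: 2 clusters, 64-dimensional observations, 4 latent dimensions each.
rng = np.random.default_rng(2)
W = [rng.normal(size=(64, 4)) for _ in range(2)]
mu = [np.zeros(64), np.ones(64)]
X = sample_mppca(W, mu, sigma2=[0.1, 0.2], pi=[0.5, 0.5], n=100, rng=rng)
```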

Tipping and Bishop formulate estimating the parameters of the model in a maximum-likelihood framework [19]. We describe the salient features of their algorithm. The statistical factor analysis model (9) can be simplified by assuming an isotropic noise model \epsilon_k ~ N(0, \sigma_k^2 I). For the one cluster case (K = 1), the conditional density of x given s is given by

p(x|s) = (2\pi\sigma^2)^{-d/2} \exp\{-\frac{1}{2\sigma^2} \|x - Ws - \mu\|^2\},   (10)

where

p(s) = (2\pi)^{-q/2} \exp\{-\frac{1}{2} s^T s\}.   (11)

The marginal distribution of x is obtained by the integral

p(x) = \int p(x|s) p(s) \, ds   (12)
     = (2\pi)^{-d/2} |C|^{-1/2} \exp\{-\frac{1}{2} (x - \mu)^T C^{-1} (x - \mu)\},   (13)

where the model covariance is given by

C = \sigma^2 I + W W^T.

The posterior density of the latent variables s given x is given from Bayes rule by

p(s|x) = \frac{p(x|s) p(s)}{p(x)}   (14)
       = (2\pi)^{-q/2} |\sigma^{-2} M|^{1/2} \exp\{-\frac{1}{2} z^T \sigma^{-2} M z\},   (15)

where

z = s - M^{-1} W^T (x - \mu)

and the posterior covariance matrix is given by

\sigma^2 M^{-1} = \sigma^2 (\sigma^2 I + W^T W)^{-1}.

The Gaussian assumption governing the density of s and the conditional density of x given s is exploited to obtain solutions of (12) and (14) in closed form.


Fig. 2. Source activity following application of a threshold on activation, suggesting a sparse activation profile.

Fig. 3. Image decomposition into (b) edge and (c) residual subspaces. The sharp features image corresponds to the BCS, while the residual subspace corresponds to FCS. Thus, visual information is split into two complementary subspaces, each following different computational strategies. Residual subspaces may be common across diverse images. The residual images (c) of the Lions (top) and Forest (bottom) images (a) are very similar, though the edge images (b) are very different and contain discriminating information.


Further details of parameter estimation for the model are presented in Appendix A.

In the PPCA framework (derived in Appendix A.1), an observation vector x is represented in latent space not by a single observation as in conventional PCA, but by a posterior conditional density which is a Gaussian given by (15). Therefore, the best point estimate for the nth observation in lower dimensional latent space is the posterior mean given by [19]

\langle s_n \rangle = M^{-1} W^T (x_n - \mu).   (16)

The optimal least squares linear reconstruction of an observation from its posterior mean vector \langle s_n \rangle is given by [19]

\hat{x}_n = W (W^T W)^{-1} M \langle s_n \rangle + \mu.   (17)

Extending this to the MPPCA framework (Appendix A.2), each observation has a posterior density associated with each latent space. The corresponding posterior mean for an observation within a cluster i is given by

\langle s_i \rangle = (\sigma_i^2 I + W_i^T W_i)^{-1} W_i^T (x - \mu_i).   (18)

Reconstruction of an observation from its latent space representation can be obtained by substituting the corresponding W_i in (17).
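A sketch of the point estimate (16)/(18) and the reconstruction (17) for a single PPCA cluster, assuming given W, mu, and sigma^2; the function names and toy values are our own.

```python
import numpy as np

def ppca_posterior_mean(x, W, mu, sigma2):
    """Posterior mean (16)/(18): <s> = (sigma2 I + W^T W)^{-1} W^T (x - mu)."""
    q = W.shape[1]
    M = sigma2 * np.eye(q) + W.T @ W
    return np.linalg.solve(M, W.T @ (x - mu))

def ppca_reconstruct(s_mean, W, mu, sigma2):
    """Optimal linear reconstruction (17): x_hat = W (W^T W)^{-1} M <s> + mu."""
    q = W.shape[1]
    M = sigma2 * np.eye(q) + W.T @ W
    return W @ np.linalg.solve(W.T @ W, M @ s_mean) + mu

rng = np.random.default_rng(3)
W, mu, x = rng.normal(size=(64, 4)), np.zeros(64), rng.normal(size=64)
x_hat = ppca_reconstruct(ppca_posterior_mean(x, W, mu, 0.1), W, mu, 0.1)
```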

Therefore, the sparse code obtained by ICA represents edges and sharp discontinuities, while the MPPCA basis codes for smooth manifolds in an image using a latent variable model. Representing the residual subspace involves first performing pattern recognition, followed by coding a pattern using a basis for that cluster. We refer to this method as the Hybrid ICA-Mixture of PPCA algorithm (HIMPA).

4 EXPERIMENTS

Model parameters were estimated using samples from a training set of 13 natural images downloaded with the FastICA package [16]. Each image was first normalized to 0 mean and unit variance. 16 × 16 block samples were drawn randomly from the training set and vectorized, following which dimensionality reduction to 160 × 1 was done using PCA. ICA estimation was then performed on the reduced dimension vectors. Once the ICA mixing matrix was determined, sharp features were extracted and residual image vectors were obtained. During practical implementation, the MPPCA model failed to converge for the original 256 × 1 vectors. Therefore, residual image vectors were converted into 64 × 1 vectors and a model having eight clusters with four principal components in each cluster was estimated using the EM algorithm with parameter updates outlined in Appendix A. We used the assistance of the NETLAB Matlab package for estimating the MPPCA parameters [20].
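A rough sketch of the patch-sampling and PCA-reduction steps described above, with a random array standing in for a normalized training image; the function names and the plain eigendecomposition-based PCA are our own illustrative choices (the paper itself used the FastICA and NETLAB packages).

```python
import numpy as np

def extract_patches(image, patch_size=16, n_patches=5000, rng=np.random.default_rng(4)):
    """Draw random patch_size x patch_size blocks from a normalized image and vectorize them."""
    H, W = image.shape
    rows = rng.integers(0, H - patch_size, size=n_patches)
    cols = rng.integers(0, W - patch_size, size=n_patches)
    return np.stack([image[r:r + patch_size, c:c + patch_size].ravel()
                     for r, c in zip(rows, cols)])

def pca_reduce(X, n_components=160):
    """Project zero-mean patch vectors onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    U = eigvecs[:, ::-1][:, :n_components]      # top eigenvectors, largest eigenvalues first
    return Xc @ U, U

image = np.random.default_rng(4).normal(size=(256, 256))   # stand-in for a normalized training image
X = extract_patches(image)
X_reduced, U = pca_reduce(X)                                # 256-dim patch vectors -> 160-dim vectors
```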

4.1 Nature of Obtained Solutions

Figs. 4a and 4b show the solutions obtained for the different clusters in space and frequency domains, respectively. The figures were obtained by summing the columns of the individual W_i's in ratio of their eigenvalues. In the space domain (Fig. 4a), the filters resemble tiles at various orientations. Fig. 4b shows the corresponding Fourier transform magnitudes. In the frequency domain, the filters possess significant magnitudes only at low frequencies and specific orientations. Functionally, such neurons belong to the FCS discussed in Section 2.2.

4.2 Performance Comparisons with Conventional Basis Systems

Model parameters obtained from training were used to represent images in the test set using the same sequence of operations outlined above. Hard clustering was performed to assign each observation in residual space to the cluster that leads to the least reconstruction error. For each image patch, the relevant parameters to be stored are the 160 × 1 ICA features, the local mean, the MPPCA cluster identifier, and a 4 × 1 vector of probabilistic principal components. Since the different coefficients have very different distributions, the entropies were estimated separately and added to arrive at a measure of bits per pixel. Appendix B gives a brief outline of the method used to compute entropies.

We compare the performance of our neurally inspired model with DCT, ICA, and PCA. The DCT forms the basis of JPEG, one of the most popular image compression algorithms in use [21]. We examine the performance of our algorithm for different attributes of vision—color and luminance. The DCT is a fixed orthogonal basis where the cosine basis functions are not dependent on data. A PCA basis is data dependent (basis vectors are eigenvectors of the data covariance matrix) [21]. We use image coding efficiency measured in bits/pixel as an index of performance. Comparisons are made by compressing the same image using HIMPA, DCT, ICA, and PCA to the same level of visual quality and similar signal to noise ratio (SNR).


Fig. 4. Space and frequency domain characteristics of MPPCA filters. (a) Space domain—surface orientation of MPPCA filters. (b) Frequency domain—Fourier transforms of MPPCA filters. In the space domain, the MPPCA basis resembles tiles at specific orientations. The corresponding Fourier transforms are low pass and oriented.


SNR alone was not found to be a good measure of performance as, in many cases, a high SNR did not correspond to good visual quality and vice versa.

5 RESULTS

We discuss results for color and luminance in the following sections.

5.1 Chrominance Representation

A true color RGB image is converted into a YIQ (luminance-inphase-quadrature) image using the linear transformation [21]

\begin{pmatrix} Y \\ I \\ Q \end{pmatrix} = \begin{pmatrix} 0.30 & 0.59 & 0.11 \\ 0.60 & -0.28 & -0.32 \\ 0.21 & -0.52 & 0.31 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.

R, G, and B denote the intensities of the red, green, and blue colors, respectively. The Y component contains luminance information, while the I and Q components contain color information. We consider color a "surface" attribute of vision, i.e., only smooth approximations are needed in an image for preserving high quality perception. In these experiments, the luminance component was kept constant and the representation of chrominance components using DCT and HIMPA was studied.
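A small sketch applying the RGB-to-YIQ transform above pixel-wise with NumPy; the function name and the random test image are illustrative.

```python
import numpy as np

# RGB -> YIQ matrix from the transformation above.
RGB_TO_YIQ = np.array([[0.30,  0.59,  0.11],
                       [0.60, -0.28, -0.32],
                       [0.21, -0.52,  0.31]])

def rgb_to_yiq(rgb_image):
    """Apply the linear transform pixel-wise; rgb_image has shape (H, W, 3)."""
    return rgb_image @ RGB_TO_YIQ.T

rng = np.random.default_rng(5)
yiq = rgb_to_yiq(rng.random((8, 8, 3)))
Y, I, Q = yiq[..., 0], yiq[..., 1], yiq[..., 2]
```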

Color was represented using a single cluster PPCA model having four or fewer components with little variation within an 8 × 8 image patch. Excellent visual appearance and SNR were obtained for virtually all natural images using this method. We show two images in Fig. 5 covering a broad spectrum of color. Table 1 shows the corresponding entropies and combined SNR of the I and Q components for HIMPA and DCT. This can be extended to an MPPCA model to yield probabilistic partitions of the color space for color discrimination.

5.2 Luminance Representation

Edges are important for this component for preserving quality of representation. Fig. 6 shows examples of the performance obtained on four test images which are very diverse from one another. Fig. 7 shows the representation obtained using ICA alone. SNR (dB) and entropy (bits/pixel) for images in Figs. 6 and 7 are displayed in Table 2. The results show that HIMPA codes have lower entropy for comparable SNR and visual quality. Although ICA attains a marginally higher SNR than HIMPA, it is evident from comparing Figs. 6 and 7 that HIMPA images have a better visual appearance. This is most apparent in the "landscape," "parrot," and "bees" images and less discernible in the "tulips" image. The MPPCA component of HIMPA attains much better representation of relatively smooth areas in an image when compared to ICA codes alone. Images rich in smooth surfaces like grass, sand, etc., and lacking edge information are not represented well using HIMPA. In such images, the residual subspace is the dominating feature and the MPPCA model which encodes this space projects the images onto an overly smooth manifold, leading to blocking. Thus, HIMPA performs best in images involving a blend of edge and surface attributes.


Fig. 5. Color representation using (a) original, (b) DCT, and (c) HIMPA. The figures show that color can be well represented as a surface attribute.

TABLE 1
Comparison of HIMPA and DCT for Chrominance Representation for Images in Fig. 5


6 DISCUSSION

HIMPA achieves decomposition of an image into parallel data streams. This is similar to the BCS/FCS paradigm discussed in Section 2.2 and is shown schematically in Fig. 8. The HIMPA architecture allows higher order data transformations on individual streams to be performed in relative isolation from other streams.


Fig. 6. Natural image representation using (a) original (tulips, landscape, bees, and parrots), (b) HIMPA, and (c) DCT. Representation using an ICA basis only is shown in Fig. 7. Images in this experiment are very different from one another and HIMPA attains comparable visual quality and SNR at lower entropy (Table 2).

Fig. 7. Representation of images shown in Fig. 6 using a nature adapted ICA basis. The "Tulips" image, which is rich in edges, is well represented. Smooth regions of other images are poorly represented, resulting in poor visual quality at similar SNR. See Table 2. (a) Tulips, (b) bees, (c) landscape, and (d) parrots.


This is a significant advantage from the viewpoint of the data processing inequality of information theory. The data processing inequality states that, for a random variable x undergoing successive transformations x → y → z, the mutual information between x and z is bounded from above by the mutual information between x and y [22]. Thus, successive transformations reduce mutual information between an observation and its neural representation. In the series-parallel architecture of HIMPA, successive transformations on one attribute do not degrade mutual information in another stream. Thus, hypothesis testing and learning can be performed on individual data streams and combined to arrive at coherent conclusions given visual observations.

6.1 General Discussion of HIMPA in Relation to DCT, ICA, and PCA

Figs. 5 and 6 show that different attributes of vision require different levels of descriptive precision for high quality perception. HIMPA exploits the strengths of different basis systems to represent different attributes of an image. This is exploited to a limited extent in JPEG by the limited bits allotted for the chrominance components [21]. However, the DCT does not yield a partition of an image into residual surfaces and sharp features for the luminance component. PCA, too, does not achieve this partition and achieves better feature representation with increasing dimensionality. An ICA basis has limitations in representing relatively smooth areas in an image, evident from Fig. 7.

The coding length of an observation x is related to the probability of the observation by

L \propto \log \frac{1}{p(x)}.   (19)

For the residual subspace, under the assumption of Gaussianity, the probability of an observation is p(x) \propto \exp(-\mathrm{error}^2). Therefore, assigning each observation to the cluster with the least reconstruction error maximizes the probability of the observation and, thus, leads to the minimum coding length by (19). The method is in agreement with the minimum description length principle which is believed to govern neural coding strategies [23].
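A sketch of the hard clustering rule described in Section 4.2 and justified by (19): each residual vector is reconstructed under every cluster's PPCA model, per (17)-(18), and assigned to the cluster with the least reconstruction error. The model parameters below are random placeholders, not trained values.

```python
import numpy as np

def assign_cluster(x, models):
    """Hard clustering: reconstruct x under every cluster's PPCA model and keep
    the cluster with the least reconstruction error (maximum probability, by (19))."""
    errors = []
    for W, mu, sigma2 in models:
        q = W.shape[1]
        M = sigma2 * np.eye(q) + W.T @ W
        s_mean = np.linalg.solve(M, W.T @ (x - mu))                  # posterior mean (18)
        x_hat = W @ np.linalg.solve(W.T @ W, M @ s_mean) + mu        # reconstruction (17)
        errors.append(np.sum((x - x_hat) ** 2))
    return int(np.argmin(errors))

rng = np.random.default_rng(6)
models = [(rng.normal(size=(64, 4)), rng.normal(size=64), 0.1) for _ in range(8)]
k = assign_cluster(rng.normal(size=64), models)
```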

6.2 HIMPA and Learning

Sense perception blends imperceptibly with learning. Learning is understood in a Bayesian sense as evaluating probabilities of truths of hypotheses about the current environment to decide upon an optimum course of action [8]. In Appendix C, we derive the following bound on the joint probability of a hypothesis H and an observation x:

P(H, x) \le P(H) \prod_{i \in S_0} P(s^i_{ed} = 0 | H) \prod_{l \in \bar{S}_0} P(s^l_{ed} | H) + \sum_{k=1}^{K} P(H | s^k_{sur,m}) P(s^k_{sur,m} | x_{sur}) P(x_{sur} | k) \pi_k.   (20)

In this equation, x represents the observation having edge component x_{ed} and surface component x_{sur}, s_{ed} is the sparse edge component, and s^k_{sur,m} is the posterior mean of x in the kth cluster; the product indices in the first term of the equation are over components of s_{ed}. S_0 is the set of components of s_{ed} which are inactive for the observation, and \bar{S}_0 denotes the complementary set.

HIMPA explicitly evaluates the terms in the second half of (20) for all values of k using (13) and (15). The P(s_{ed}|H) and P(H|s^k_{sur,m}) terms in (20) are context dependent and depend solely on the hypothesis being evaluated. Equation (20), which relates P(H, x) to P(H, s), achieves great simplification for hypothesis testing. The first term evaluates the joint probability in terms of edge information and the second in terms of residual information. Being sparse, the overwhelming majority of indices in the edge code (around 85 percent) belong to set S_0, the first product in (20). This term can be estimated easily as it evaluates the probability of an edge being inactive under a certain hypothesis without requiring the level of activity to be discerned. Being factorial, a complex decision can be obtained quite simply by a product of simpler decisions.

The residual subspace may contain very little discriminatory information. Fig. 3 shows two very different images and the information subspaces in each of them. The residual subspaces of the "Lions" and "Forest" images are very similar and it is very unlikely that this subspace would contribute much toward evaluating a hypothesis about the environment. The edge subspaces clearly are very different and hold discriminating information. In such a situation, the latter term in (20) can be ignored and Bayesian hypothesis testing can be performed using edge information alone, which possesses all the desirable properties discussed in [8]. In general, weighted combinations of the two terms in (20) are highly desirable as their relative importance is hypothesis dependent. This is perhaps the most important difference between HIMPA and ICA, DCT, or PCA.


TABLE 2
Comparison of HIMPA, DCT, and PCA for Luminance Representation for Images in Fig. 6

Fig. 8. Data transformations and data streams in the HIMPA algorithm. HIMPA channels an incident image into parallel data streams using an ICA basis for edges and an MPPCA basis for surfaces. This resembles the BCS/FCS paradigm discussed in Section 2.2.


HIMPA performs image decomposition and coding in a manner that blends image perception and learning. In comparison, none of the other basis systems yield parallel data streams where the relative contribution of a visual attribute to a task at hand is flexible and hypothesis dependent.

6.3 Recent Viewpoints about Redundancy in Neural Systems

In a recent paper [8], Barlow re-examined the redundancy reduction paradigm advanced in the 1950s and 1960s [6], [7]. The paper is critical of redundancy reduction for several reasons and concluded that redundancy in sensory messages conveys crucial information about the environment and its importance does not lie in compressive coding.

Barlow defines redundancy in image data to be manifest or nonmanifest (also referred to as hidden) [8]. Manifest redundancy, caused by unequal probabilities of primary message elements, is benign. Nonmanifest redundancy, caused by unequal joint probabilities of higher order combinations of elements, is believed to be of critical importance and is considered an important source of knowledge about the environment and vital for estimating probabilities of hypotheses about the environment [8]. Redundancy in living systems is also understood to have a temporal element to it, as those features of sensory stimulation that are accurately predictable from knowledge acquired from experience become redundant [8]. Compressive representations, which reduce redundancy, have high activity ratios of individual neurons in response to a stimulus. Such representations are easily degraded by noise and learning from such a representation is difficult owing to the high degree of overlap in neuronal activities in response to different stimuli [8].

Barlow bases some of his conclusions on empirical evidence revealing greater than expected neuronal activity at higher levels of visual processing. However, the nature of the redundancy that is added is not discussed. Redundancy added in the form of error correcting codes to prevent signal degradation at higher levels of the nervous system is considerably different from redundancy in perceived stimuli. This distinction is vital for understanding the importance of redundancy in high-level processing of visual information. In the following sections, we demonstrate that our proposed algorithm is consistent with the new paradigm proposed by Barlow.

6.3.1 HIMPA Separates Manifest Redundancies in an Image and Makes Nonmanifest Redundancies Easier to Identify

In our model, the residual subspace captures manifest redundancies in a visual scene. This manifold lacks structure and the neural system is able to generate an internal representation for it through knowledge acquired from prior experience using the generative model given in (9). Once these have been identified and removed, the remaining source activations are sparse and contain dominant edge information from which higher order redundancies governing the co-occurrence of edges can be learned. Edge co-occurrence statistics have been shown to be vital for deriving probabilistic rules governing grouping of edges to form longer contours and boundaries (BCS discussed in Section 2.2) [24], [25]. The sparse ICA code no longer has a high activity ratio and two different patterns that need to be distinguished are less likely to overlap in their representations, thus enabling more robust detection and probability evaluation. Therefore, HIMPA codes do not suffer from the limitations of compressed representations in general, as discussed in [8]. HIMPA does not discard nonmanifest redundancies, which is undesirable according to Barlow [8]; rather, it helps extract them from the manifest redundancies in which they are buried. By extracting and preserving manifest redundancies in a separate stream, HIMPA improves the signal to noise ratio of important signals in the environment, improving their detection and identification. This is a desirable goal of sensory coding, as discussed in [8]. This is important in reinforcement learning as well because all reinforcement learning is a response to the statistical structure of sensory signals.

6.3.2 HIMPA Codes Are Well-Suited for Learning in Neural Systems

The advantages of a sparse independent code for learning were discussed in Section 6.2. In addition to this property, HIMPA edge codes span a limited range (between -8 and +8, as shown in Fig. 1b). A small dynamic range confines discrete approximations of neuronal activity to a small alphabet, making probabilistic estimates of distinct activation levels easier, a highly desirable property according to Barlow in [8]. Further, HIMPA uses closed form expressions to evaluate probabilities in residual space which can be evaluated easily.

7 SUMMARY AND FUTURE WORK

We present a hybrid algorithm that uses an ICA basis to represent edges and an MPPCA basis to span the residual subspace. The residual subspace combines a large number of individually insignificant events in a concise summary, which is vital for adapting to changing goals. The method has its origins in current mathematical models of the visual cortex and visual data processing. We demonstrate application of the model in representing chrominance and luminance components of natural images. Further, HIMPA codes greatly simplify estimating probabilities of truths of hypotheses in the environment, a task intimately connected to learning. The model can be extended to data sets other than natural images using a mixture of ICA formulation. The selection of active versus inactive sources can be extended beyond the simple threshold rule employed by us based on experimental data, which would help discern which subset of low source activities is worth preserving for good visual reconstructions. Estimation of the residual subspace parameters can be formulated in a Bayesian framework and solved using approximation methods, like the Monte Carlo method or variational learning, to improve generalization.

APPENDIX A

MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERS IN THE MPPCA MODEL

A.1 One Cluster Case

Assuming independence of the observations, for the one cluster case, the log likelihood of observing the data under the model is given by



L = \sum_{n=1}^{N} \ln\{p(x_n)\} = -\frac{N}{2} \{d \ln(2\pi) + \ln|C| + \mathrm{tr}(C^{-1} S)\},   (A.21)

where

S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T

is the sample covariance matrix. The parameters in the model are determined such that the likelihood (A.21) is maximized. The maximum-likelihood estimator of the parameter \mu is given by the mean of the data [19]:

\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n.   (A.22)

The maximum-likelihood solution of W is given by

W_{ML} = U_q (\Lambda_q - \sigma^2 I)^{1/2} R,

where the columns of U_q are the eigenvectors of S corresponding to the first q eigenvalues, \Lambda_q is a q × q matrix containing the principal eigenvalues of S arranged in descending order, and R is an arbitrary q × q orthogonal rotation matrix [19]. This is the probabilistic principal components analysis (PPCA) framework for modeling a mapping from latent space to the principal subspace of the observed data. The maximum-likelihood estimator for \sigma^2 is given by \sigma^2 = \frac{1}{d-q} \sum_{j=q+1}^{d} \lambda_j, where \lambda_{q+1} to \lambda_d are the smallest eigenvalues of S [19].
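The closed-form estimators above translate directly into code. Below is a minimal sketch of a single-cluster PPCA fit, assuming the data matrix holds one observation per row and taking the rotation R to be the identity; the function name and toy data are our own.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood PPCA fit for one cluster:
    mu = sample mean (A.22), sigma2 = mean of the d - q discarded eigenvalues,
    W = U_q (Lambda_q - sigma2 I)^{1/2}, with R taken to be the identity."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()
    W = eigvecs[:, :q] @ np.diag(np.sqrt(eigvals[:q] - sigma2))
    return W, mu, sigma2

X = np.random.default_rng(7).normal(size=(1000, 64))
W, mu, sigma2 = ppca_ml(X, q=4)
```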

A.2 Multicluster Framework

In this case, the complete data log likelihood is given by

L = \sum_{n=1}^{N} \ln\left[\sum_{k=1}^{K} p(x_n | k)\, \pi_k\right],   (A.23)

where p(x_n | k) is the probability of observing x_n given that it originated from cluster k, and \pi_k are the prior mixing probabilities, subject to the condition \sum_k \pi_k = 1.

An iterative Expectation Maximization (EM) algorithm is used to estimate the model parameters. In a manner analogous to estimating the Gaussian mixture model, closed form expressions can be derived for updating the model parameters [19]. We denote the posterior probability of membership of observation x_n to cluster i, p(i | x_n), by R_{ni}. By application of Bayes rule, R_{ni} is given by

R_{ni} = p(i | x_n) = \frac{p(x_n | i)\, \pi_i}{p(x_n)}.   (A.24)

The maximum-likelihood update equations for the mixing probabilities and cluster means are given by [19]

\tilde{\pi}_i = \frac{1}{N} \sum_{n=1}^{N} R_{ni},   (A.25)

\tilde{\mu}_i = \frac{\sum_{n=1}^{N} R_{ni}\, x_n}{\sum_{n=1}^{N} R_{ni}}.   (A.26)

The updates for W_i and \sigma_i are obtained from the local responsibility weighted covariance matrix [19]

\tilde{S}_i = \frac{1}{\tilde{\pi}_i N} \sum_{n=1}^{N} R_{ni} (x_n - \tilde{\mu}_i)(x_n - \tilde{\mu}_i)^T   (A.27)

using the same eigendecomposition outlined above (Appendix A.1) for the single cluster case. The above equations are updated iteratively until convergence.
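A compact sketch of one EM iteration implementing (A.24)-(A.27) together with the eigendecomposition of Appendix A.1, assuming list-of-arrays parameters; this is our own illustration (the paper used the NETLAB package) and omits the numerical safeguards a practical implementation would need.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, W, mu, sigma2, pi):
    """One EM pass for the MPPCA mixture: responsibilities (A.24), then the
    updates (A.25)-(A.27), followed by the single-cluster eigendecomposition."""
    N, d = X.shape
    K = len(pi)
    # E-step: R[n, k] = p(k | x_n)
    R = np.empty((N, K))
    for k in range(K):
        C = sigma2[k] * np.eye(d) + W[k] @ W[k].T          # model covariance
        R[:, k] = pi[k] * multivariate_normal(mu[k], C).pdf(X)
    R /= R.sum(axis=1, keepdims=True)
    # M-step
    for k in range(K):
        Nk = R[:, k].sum()
        pi[k] = Nk / N                                     # (A.25)
        mu[k] = (R[:, k] @ X) / Nk                         # (A.26)
        diff = X - mu[k]
        S = (R[:, k, None] * diff).T @ diff / Nk           # (A.27)
        eigvals, eigvecs = np.linalg.eigh(S)
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        q = W[k].shape[1]
        sigma2[k] = eigvals[q:].mean()
        W[k] = eigvecs[:, :q] @ np.diag(np.sqrt(np.maximum(eigvals[:q] - sigma2[k], 0.0)))
    return W, mu, sigma2, pi

# Toy run with random initialization.
rng = np.random.default_rng(8)
X = rng.normal(size=(500, 16))
K, q = 4, 2
W = [rng.normal(size=(16, q)) for _ in range(K)]
mu = [X[rng.integers(len(X))].copy() for _ in range(K)]
sigma2, pi = [1.0] * K, [1.0 / K] * K
for _ in range(10):
    W, mu, sigma2, pi = em_step(X, W, mu, sigma2, pi)
```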

APPENDIX B

ENTROPY ESTIMATION

Entropy denotes the average number of bits needed to encode one symbol of the source [15], [22]. Outputs of filters (ICA or MPPCA) are first quantized, thereby mapping a continuous random variable to a discrete set A. The entropy of a discrete random variable is then estimated according to the formula

H = -\sum_{k \in A} p_k \log_2 p_k,

where p_k denotes the probability of observing the symbol in A indexed by k and the sum is over all the symbols in set A. (p_k is determined by dividing the number of observations of the symbol indexed by k by the total number of observations.)

The number of symbols needed to encode an image patch is divided by the total number of pixels in the patch to arrive at a measure of bits/pixel. Let H_{ICA}, H_{mean}, H_{PPCA}, and H_{ClusterID} denote the entropies of the ICA codes, the local means, the MPPCA codes, and the identity of the MPPCA clusters, respectively. Therefore, from Sections 3 and 4, the entropy in bits/pixel for the luminance component is given by

\mathrm{Bits/Pixel} = H_{ICA}\left(\frac{160}{256}\right) + \frac{H_{mean}}{256} + H_{PPCA}\left(\frac{4}{64}\right) + \frac{H_{ClusterID}}{64}.

Entropies for the chrominance components, DCT, and PCA are obtained in a similar manner.
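A short sketch of the entropy estimate and the bits/pixel combination above; the quantizer is not specified here, so the uniform step size, function names, and coefficient counts (160 ICA coefficients and a mean per 256-pixel patch, 4 PPCA coefficients and a cluster label per 64-pixel block, following Sections 3 and 4) are presented as an assumption-laden illustration.

```python
import numpy as np

def entropy_bits(coeffs, step=1.0):
    """Quantize coefficients with a uniform step, then estimate
    H = -sum_k p_k log2 p_k over the observed symbol alphabet."""
    symbols = np.round(np.asarray(coeffs) / step).astype(int)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def bits_per_pixel(H_ica, H_mean, H_ppca, H_cluster_id):
    """Combine the separate entropies as in the expression above."""
    return H_ica * 160 / 256 + H_mean / 256 + H_ppca * 4 / 64 + H_cluster_id / 64
```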

APPENDIX C

EVALUATING JOINT PROBABILITY OF A HYPOTHESIS AND AN OBSERVATION USING HIMPA

Let x represent the actual observation, which is decomposed into an observed edge component x_{ed} and surface component x_{sur}. Let s_{ed} and s_{sur} be the sparse edge component and surface component generated by HIMPA, respectively. Let H be the hypothesis being evaluated. Since HIMPA partitions an observation into its edge and residual surface components, P(H, x), the joint probability of hypothesis H and observation x, is the sum of the joint probabilities of the hypothesis and the corresponding edge and surface components:

P(H, x) = P(H, x_{ed}) + P(H, x_{sur}).   (C.28)

Assuming surface x_{sur} can originate from K clusters, the total probability becomes

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} P(H, x_{sur}, k).   (C.29)



Let s^k_{sur} be the latent space representation of x_{sur} in cluster k. Therefore, the joint probability can be expressed as

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} \sum_{s^k_{sur}} P(H, x_{sur}, s^k_{sur}, k).   (C.30)

Using the chain rule, P(H, x_{sur}, s^k_{sur}, k) can be expressed as

P(H, x_{sur}, s^k_{sur}, k) = P(H | s^k_{sur}, x_{sur}, k)\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, P(k).   (C.31)

P(k) is the prior mixing probability of cluster k, denoted by \pi_k in (A.25). Given s^k_{sur}, P(H | x_{sur}, s^k_{sur}, k) is conditionally independent of x_{sur} and k. Therefore, the joint probability can be simplified as

P(H, x_{sur}, s^k_{sur}, k) = P(H | s^k_{sur})\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.32)

Substituting (C.32) in (C.30), we get

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} \sum_{s^k_{sur}} P(H | s^k_{sur})\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.33)

From Bayes rule, P(H, x_{ed}) can be expressed as

P(H, x_{ed}) = P(x_{ed} | H)\, P(H).   (C.34)

For a linear, invertible transformation, x_{ed} = A s_{ed}, the pdfs are related by [15]

P(x_{ed} | H) = \frac{1}{|\det A|} P(s_{ed} | H).   (C.35)

The mixing matrix A was determined to maximize the negentropy, as discussed in Section 3.1. Since x is assumed to be multivariate Gaussian, from the definition of negentropy in (4) it follows that

H(x) \ge H(s).   (C.36)

For a linear transformation, H(x) and H(s) are related by [15]

H(x) = H(s) + \log|\det A|.   (C.37)

It therefore follows that \log|\det A| \ge 0, or |\det A| \ge 1. Substituting this lower bound for |\det A| in (C.34), we get the following inequality:

P(H, x_{ed}) \le P(s_{ed} | H)\, P(H),   (C.38)

where P(H) denotes the prior probability of the hypothesis under consideration. Substituting (C.38) in (C.33), we get

P(H, x) \le P(s_{ed} | H)\, P(H) + \sum_{k=1}^{K} \sum_{s^k_{sur}} P(H | s^k_{sur})\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.39)

Being independent, the joint probability of activation of individual representational elements in a distributed network can be obtained quite simply by multiplication of individual probabilities, making Bayesian inference about hypotheses much easier. Therefore, the likelihood of s_{ed} conditioned on H can be written as

P(s_{ed} | H) = \prod_i P(s^i_{ed} | H),   (C.40)

where s^i_{ed} denotes the ith component of the edge subspace representation s_{ed}. Substituting (C.40) in (C.39), we obtain an approximation to the total posterior probability as

P(H, x) \le P(H) \prod_i P(s^i_{ed} | H) + \sum_{k=1}^{K} \sum_{s^k_{sur}} P(H | s^k_{sur})\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.41)

We can split the product in (C.41) into two sets—the set S_0 of indices i, where s^i_{ed} = 0, and the complementary set \bar{S}_0, where s^l_{ed} \ne 0. Therefore, (C.41) can be expressed as

P(H, x) \le P(H) \prod_{i \in S_0} P(s^i_{ed} = 0 | H) \prod_{l \in \bar{S}_0} P(s^l_{ed} | H) + \sum_{k=1}^{K} \sum_{s^k_{sur}} P(H | s^k_{sur})\, P(s^k_{sur} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.42)

The inner sum over s^k_{sur} can be dropped if we evaluate the probability of the hypothesis only at the posterior mean of s_{sur} (denoted by s^k_{sur,m} for a particular choice of cluster k) given x_{sur}, using (16). Therefore, (C.42) simplifies to

P(H, x) \le P(H) \prod_{i \in S_0} P(s^i_{ed} = 0 | H) \prod_{l \in \bar{S}_0} P(s^l_{ed} | H) + \sum_{k=1}^{K} P(H | s^k_{sur,m})\, P(s^k_{sur,m} | x_{sur})\, P(x_{sur} | k)\, \pi_k.   (C.43)

Equation (C.43) evaluates the joint probability of a hypothesis and an observation in terms of its neural representation.

ACKNOWLEDGMENTS

The authors express their gratitude to the associate editor and five anonymous reviewers for helpful suggestions and corrections which helped improve the quality of the manuscript in great measure. N. Balakrishnan would like to express his profound gratitude to Dr. L.R. Chary and Mr. S. Easwaran for introducing him to the world of signal processing.

REFERENCES

[1] B.A. Olshausen and D.J. Field, "Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311-3325, 1997.

[2] R.S. Snell, Clinical Neuroanatomy for Medical Students, pp. 702-720. Philadelphia: Lippincott Williams and Wilkins, 1997.

[3] Neural Codes and Distributed Representations: Foundations of Neural Computation, L. Abbott and T.J. Sejnowski, eds. Cambridge, Mass.: MIT Press, 1999.

[4] J. Daugman, "Gabor Wavelets and Statistical Pattern Recognition," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 457-463. Cambridge: MIT Press, 2003.

[5] H. Neumann and E. Mingolla, "Contour and Surface Perception," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 271-276. Cambridge: MIT Press, 2003.

[6] H.B. Barlow, "Possible Principles Underlying the Transformations of Sensory Messages," Sensory Comm., W.A. Rosenblith, ed., pp. 217-234. Cambridge: MIT Press, 1961.

[7] H.B. Barlow, "Unsupervised Learning," Neural Computation, vol. 1, pp. 295-311, 1989.

[8] H.B. Barlow, "Redundancy Reduction Revisited," Network: Computation in Neural Systems, vol. 12, pp. 241-253, 2001.

[9] J.A. McArthur and B. Moulden, "A Two-Dimensional Model of Brightness Perception Based on Spatial Filtering Consistent with Retinal Processing," Vision Research, vol. 39, pp. 1199-1219, 1999.

[10] E.H. Land and J.J. McCann, "Lightness and Retinex Theory," J. Optical Soc. Am., vol. 61, pp. 1-11, 1971.

[11] A. Hurlbert, "Formal Connections between Lightness Algorithms," J. Optical Soc. Am., vol. 3, pp. 1684-1693, 1986.

[12] S. Grossberg and D. Todorović, "Neural Dynamics of 1-D and 2-D Brightness Perception: A Unified Model of Classical and Recent Phenomena," Perception & Psychophysics, vol. 43, pp. 723-742, 1988.

[13] M.A. Paradiso and K. Nakayama, "Brightness Perception and Filling-In," Vision Research, vol. 39, pp. 1221-1236, 1991.

[14] R. Kentridge, C. Heywood, and J. Davidoff, "Color Perception," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 230-233. Cambridge, Mass.: MIT Press, 2003.

[15] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: John Wiley & Sons, 2001.

[16] A. Hyvarinen, "FastICA Matlab Package," http://www.cis.hut.fi/projects/ica/fastlab, Apr. 2003.

[17] A. Hyvarinen and P. Hoyer, "Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces," Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.

[18] A. Hyvarinen, P.O. Hoyer, and M. Inki, "Topographic Independent Component Analysis," Neural Computation, vol. 13, no. 7, pp. 1527-1558, 2001.

[19] M. Tipping and C. Bishop, "Mixtures of Probabilistic Principal Component Analyzers," Neural Computation, vol. 11, no. 2, pp. 443-482, 1999.

[20] I. Nabney and C. Bishop, "Netlab Neural Network Software," http://www.ncrg.aston.ac.uk/netlab, July 2003.

[21] R.C. Gonzalez and R.E. Woods, Digital Image Processing. Reading, Mass.: Addison Wesley, 1993.

[22] T.M. Cover and J.A. Thomas, Elements of Information Theory, pp. 12-49. New York: John Wiley & Sons, 1991.

[23] D.H. Ballard, An Introduction to Natural Computation, pp. 46-48. Cambridge, Mass.: MIT Press, 1999.

[24] N. Kruger, "Collinearity and Parallelism are Statistically Significant Second Order Relations of Complex Cell Responses," Neural Processing Letters, vol. 8, pp. 117-129, 1998.

[25] W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly, "Edge Co-Occurrence in Natural Images Predicts Contour Grouping Performance," Vision Research, vol. 41, pp. 711-724, 2001.

Nikhil Balakrishnan received the Bachelor of Medicine and the Bachelor of Surgery (MBBS) degree from Mahatma Gandhi University, India, in September 1999 and the MS degree in biomedical engineering from the University of Illinois at Chicago in 2004. His research interests include computer vision, neural networks, and machine learning.

Karthik Hariharakrishnan received the BS degree in engineering from the Birla Institute of Technology and Sciences, India, in 2000 and the MS degree in electrical and computer engineering from the University of Illinois at Chicago in 2003. He is currently with the Multimedia Group in Motorola India Electronics Ltd. His research interests are video compression, object tracking, segmentation, and music synthesis.

Dan Schonfeld received the BS degree in electrical engineering and computer science from the University of California, Berkeley, and the MS and PhD degrees in electrical and computer engineering from the Johns Hopkins University, Baltimore, Maryland, in 1986, 1988, and 1990, respectively. In August 1990, he joined the Department of Electrical Engineering and Computer Science at the University of Illinois, Chicago, where he is currently an associate professor in the Departments of Electrical and Computer Engineering, Computer Science, and Bioengineering, and codirector of the Multimedia Communications Laboratory (MCL) and member of the Signal and Image Research Laboratory (SIRL). He has authored more than 60 technical papers in various journals and conferences. He has served as a consultant and technical standards committee member in the areas of multimedia compression, storage, retrieval, communications, and networks. He has also served as an associate editor of the IEEE Transactions on Image Processing on nonlinear filtering as well as an associate editor of the IEEE Transactions on Signal Processing on multidimensional signal processing and multimedia signal processing. He was a member of the organizing committees of the IEEE International Conference on Image Processing and the IEEE Workshop on Nonlinear Signal and Image Processing. He was the plenary speaker at the INPT/ASME International Conference on Communications, Signals, and Systems. He has been elected a senior member of the IEEE. He has previously served as president of Multimedia Systems Corp. and provided consulting and technical services to various corporations, including AOL Time Warner, Chicago Mercantile Exchange, Dell Computer Corp., Getco Corp., EarthLink, Fish & Richardson, IBM, Jones Day, Latham & Watkins, Mirror Image Internet, Motorola, Multimedia Systems Corp., nCUBE, NeoMagic, Nixon & Vanderhye, PrairieComm, Teledyne Systems, Touchtunes Music, Xcelera, and 24/7 Media. His current research interests are in multimedia communication networks, multimedia compression, storage, and retrieval, signal, image, and video processing, image analysis and computer vision, pattern recognition, and medical imaging.

