Using texture to analyze and manage large collections of remote sensed image and video data

Shawn Newsam, Lei Wang, Sitaram Bhagavathy, and Bangalore S. Manjunath

The authors are with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, California 93106. The e-mail address for S. Newsam is [email protected]. Received 20 May 2003; revised manuscript received 7 July 2003.

We describe recent research into using the visual primitive of texture to analyze and manage large collections of remote sensed image and video data. Texture is regarded as the spatial dependence of pixel intensity. It is characterized by the amount of dependence at different scales and orientations, as measured with frequency-selective filters. A homogeneous texture descriptor based on the filter outputs is shown to enable (1) content-based image retrieval in large collections of satellite imagery, (2) semantic labeling and layout retrieval in an aerial video management system, and (3) statistical object modeling in geographic digital libraries. © 2004 Optical Society of America

OCIS codes: 100.2960, 100.5010, 110.2960.

1. Introduction

Remote sensed data, such as satellite imagery or aerial videography, are being acquired at increasing rates because of technological advances in airborne and spaceborne optical sensing systems. The full value of these data sets is not being realized, however, because of the prohibitive cost of manual analysis, both in terms of time and money. It has been hoped that automated analysis methods with computers would help clear this information bottleneck. Although progress has been made, the rate of analysis still lags the rate of acquisition. Further advancements in the automated analysis of remote sensed data sets are urgently needed.

Most research into automating the analysis has focused on the spectral dimension. This has been the case even though the data are intrinsically spatial. Other research has primarily focused on exploiting the spatial dimension to localize the analysis or detect well-structured objects.1,2 The research presented in this paper represents our recent efforts toward using spatial information to help automate the analysis and management of remote sensed image and video data.

The basis of the approach is to consider the spatial relationships of pixel intensities in remote sensed image and video data as the visual primitive of texture. The analysis of image texture has received significant research attention during the past several decades, perhaps second only to color.3–8 Yet developing effective descriptors for such a fundamental image feature remains a challenge.

There are generally two approaches to analyzing texture quantitatively. The first approach explicitly analyzes the spatial dependence of pixel intensity in the texture. Examples of this approach include co-occurrence matrices4 and methods based on Markov random fields (MRFs).5–7 In the second approach the structure in the texture is inferred by analyses of the distribution of frequencies with appropriate filters. The former is more precise and intuitive in the visual characterization of the texture. However, in practice, it fails to accommodate the pixel variations among textures that are considered similar. The latter is more useful for similarity retrieval because it is a global approximation of the image structure. It is the approach taken by the research presented in this paper. In particular, orientation and scale-selective filters are used to characterize what can be considered as the direction and coarseness of a texture.

We begin this paper by describing the Gabor filters that form the basis of the texture analysis. We provide details on how orientation and scale-selective filters are constructed by varying the modulation of Gabor functions. In the remainder of the paper we describe how the outputs of Gabor filter banks enable texture-based analysis and management of remote sensed imagery.

Specifically, a homogeneous texture descriptor recently standardized by the Moving Picture Experts Group (MPEG) is used to perform content-based similarity retrieval in large collections of aerial and satellite imagery. The descriptor also provides a semantic labeling of aerial videography in a video database management system. This labeling enables similarity retrieval based on the semantic layout of the video frames. Finally, the filter outputs enable an object-based representation for remote sensed imagery. In particular, the characteristic textures, or texture motifs, of objects that defy traditional modeling approaches are learned by use of a statistical approach.

2. Texture Analysis with Gabor Filters

Use of filters based on Gabor functions to analyze texture is motivated by several factors. First, the Gabor representation can be shown to be optimal in the sense that it minimizes the joint two-dimensional uncertainty in space and frequency.9 Second, the fact that Gabor functions can be used to model the receptive fields of simple cells in the mammalian visual cortex10 is psychovisual evidence that Gabor-like filtering takes place early on in the human visual system.

A bank of orientation and scale-selective filters is constructed as follows.8 A two-dimensional Gabor function g(x, y) and its Fourier transform G(u, v) can be written as

g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi jWx\right],  (1)

G(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\},  (2)

where \sigma_u = 1/(2\pi\sigma_x) and \sigma_v = 1/(2\pi\sigma_y). A class of self-similar functions referred to as Gabor wavelets is now considered. Let g(x, y) be the mother wavelet. A filter dictionary can be obtained by appropriate dilations and rotations of g(x, y) through the generating function:

g_{rs}(x, y) = a^{-s} g(x', y'),  a > 1,  s = 0, \ldots, S - 1,  r = 1, \ldots, R,
x' = a^{-s}(x\cos\theta + y\sin\theta),
y' = a^{-s}(-x\sin\theta + y\cos\theta),  (3)

where \theta = (r - 1)\pi/R. The indices r and s indicate the orientation and scale of the filter, respectively. R is the total number of orientations and S is the total number of scales in the filter bank. The scale factor a^{-s} in Eqs. (3) is meant to ensure that the energy is independent of s.

Although the size of the filter bank is application dependent, experimentation has shown that a bank of filters tuned to combinations of five scales, at octave intervals, and six orientations, at 30-deg intervals, is sufficient for the analysis of remote sensed imagery. See Ref. 8 for more details on the filter bank construction.
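To make the construction concrete, the following Python sketch builds a spatial-domain filter bank directly from Eqs. (1) and (3). It is a minimal illustration, not the tuned design of Ref. 8 (which chooses the parameters in the frequency domain so that the filters tile the spectrum); the kernel size, σ values, and center frequency W below are illustrative assumptions.

```python
import numpy as np

def gabor_mother(x, y, sigma_x=4.0, sigma_y=4.0, W=0.4):
    """Two-dimensional Gabor function of Eq. (1): a Gaussian envelope
    modulated by a complex exponential with center frequency W."""
    norm = 1.0 / (2.0 * np.pi * sigma_x * sigma_y)
    envelope = np.exp(-0.5 * (x**2 / sigma_x**2 + y**2 / sigma_y**2))
    carrier = np.exp(2j * np.pi * W * x)
    return norm * envelope * carrier

def gabor_bank(size=31, S=5, R=6, a=2.0):
    """Bank of S scales and R orientations per Eqs. (3); a = 2 gives
    octave-spaced scales, R = 6 gives 30-deg orientation steps."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    bank = []
    for r in range(1, R + 1):
        theta = (r - 1) * np.pi / R
        for s in range(S):
            # Rotate by theta and dilate by a**-s; the leading a**-s
            # keeps the filter energy independent of s.
            xp = a**-s * (x * np.cos(theta) + y * np.sin(theta))
            yp = a**-s * (-x * np.sin(theta) + y * np.cos(theta))
            bank.append(a**-s * gabor_mother(xp, yp))
    return bank  # R * S = 30 complex kernels for the default settings
```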

A. Homogeneous Texture Descriptor

The homogeneous texture descriptor is a compact representation of the Gabor filter outputs.11 It is a feature vector formed when the first and second moments of the filter outputs are computed for a spatially homogeneous image or image region. Representing texture as a point in this multidimensional feature space is useful because closeness in the feature space turns out to correspond well with visual similarity. The visual similarity between images with respect to texture can be quantitatively computed with a distance function. These distances can be used to perform content-based similarity retrieval (examples to follow). Furthermore, machine learning techniques can be used to estimate the distributions of texture classes from manually labeled examples. We can label new images using these distributions.

The homogeneous texture descriptor is formed as follows. Suppose f_{11}(x, y), \ldots, f_{RS}(x, y) are the outputs of a filter bank tuned to R orientations and S scales. The feature vector f is then

f = [\mu_{11}, \sigma_{11}, \mu_{12}, \sigma_{12}, \ldots, \mu_{1S}, \sigma_{1S}, \ldots, \mu_{RS}, \sigma_{RS}],  (4)

where \mu_{rs} and \sigma_{rs} are the mean and standard deviation of f_{rs}(x, y), respectively. This descriptor was recently standardized by the International Standards Organization MPEG-7 Multimedia Content Description Interface Standard12 with the minor modification that the mean and standard deviation of the filter outputs are computed in the frequency domain for reasons of efficiency.
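A minimal sketch of the descriptor follows, assuming the spatial-domain bank above and SciPy for the convolutions. Note that the standardized descriptor computes the moments in the frequency domain; the spatial-domain version here is only for illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

def homogeneous_texture_descriptor(tile, bank):
    """Feature vector of Eq. (4): the mean and standard deviation of
    each filter's response magnitude, interleaved as
    [mu_11, sigma_11, ..., mu_RS, sigma_RS]."""
    features = []
    for g in bank:
        response = np.abs(fftconvolve(tile, g, mode="same"))
        features.extend([response.mean(), response.std()])
    return np.asarray(features)  # length 2RS, i.e., 60 for a 5 x 6 bank
```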

B. Similarity Retrieval

Low-level visual primitives have been used for some time to perform content-based similarity retrieval in image data sets.13–16 The homogeneous texture descriptor is particularly effective for similarity retrieval in remote sensed imagery. We compute the visual similarity between images by defining a distance measure in the 2RS-dimensional texture feature space. Euclidean distance is commonly used, so that the similarity between images I^{(1)} and I^{(2)} is computed as

d(I^{(1)}, I^{(2)}) = \|f^{(1)} - f^{(2)}\|_2.  (5)

A query-by-example paradigm can be used to perform content-based similarity retrieval in image data sets. We compute nearest-neighbor queries by returning those images with the least distance to the query. We compute range queries by returning those images whose distances to the query image are smaller than some specified threshold.

To localize the descriptor, large satellite or aerial images are typically divided into tiles measuring 64 × 64 or 128 × 128 pixels. This tiling results in large data sets because of the size of the original images.
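Both query types reduce to a few lines once each tile's descriptor is stored as a row of a matrix. The brute-force sketch below is an assumption for clarity; for data sets of this size an index structure would be used in practice.

```python
import numpy as np

def nearest_neighbor_query(query_f, dataset_f, k=10):
    """Nearest-neighbor query: the k tiles whose descriptors have the
    smallest Euclidean distance (Eq. (5)) to the query."""
    d = np.linalg.norm(dataset_f - query_f, axis=1)
    return np.argsort(d)[:k]

def range_query(query_f, dataset_f, threshold):
    """Range query: all tiles within a fixed distance of the query."""
    d = np.linalg.norm(dataset_f - query_f, axis=1)
    return np.nonzero(d <= threshold)[0]
```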


We compute the homogeneous texture descriptor for each tile by applying a Gabor filter bank tuned to combinations of five scales and six orientations. The output of the 30 filters is summarized by a 60-component feature vector, as described above. Similarity retrieval is performed when we compute the distances between a query tile and the rest of the data set.

Figure 1 shows an example of a range query for a data set of approximately 400,000 64 × 64 pixel tiles from seven satellite images with 1-m resolution. The query tile is the rightmost tile on the freeway (the tiles are indicated with a white border). The other tiles are the results of a range query. Note that high precision and recall are achieved because only the remaining freeway tiles are retrieved. Figure 2 shows an example of a nearest-neighbor query for the same data set. The top-left tile is the query tile. The other tiles are those closest to the query among the 400,000-tile data set.

Fig. 1. Results of a range query in a satellite image. The rightmost tile is the query tile. The other tiles are the results.

Fig. 2. Results of a nearest-neighbor query in a data set of satellite images. The top-left tile is the query tile. The other tiles are the results.

Euclidean distance results in a visual similarity measure that is orientation sensitive. Orientation-invariant similarity measurement is possible by use of the following distance function:

d_{RI}(I^{(1)}, I^{(2)}) = \min_{r \in R} \|f_{(r)}^{(1)} - f^{(2)}\|_2.  (6)

Here, f_{(r)} indicates the feature vector f circularly shifted by r rotations. Conceptually, this distance function computes the best match between rotated versions of the images. Figure 3 shows an example of an orientation-invariant nearest-neighbor search by use of the same query tile as in Fig. 2. Again, the top-left tile is the query tile.

Fig. 3. Results of an orientation-invariant nearest-neighbor query. The top-left tile is the query tile. The other tiles are some of the results.
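The circular shift in Eq. (6) has a direct implementation: because the descriptor of Eq. (4) is ordered orientation-major, rotating the image corresponds to rolling the feature vector along its orientation axis. A sketch, assuming the 6-orientation, 5-scale layout used above:

```python
import numpy as np

def rotation_invariant_distance(f1, f2, R=6, S=5):
    """Distance of Eq. (6): the minimum Euclidean distance over all R
    circular shifts of f1 along the orientation axis."""
    a = f1.reshape(R, S, 2)  # axes: (orientation, scale, [mean, std])
    b = f2.reshape(R, S, 2)
    return min(np.linalg.norm(np.roll(a, r, axis=0) - b)
               for r in range(R))
```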

Similarity retrieval examples such as these demonstrate that the descriptor provides meaningful annotation of remote sensed imagery. In fact, the descriptor was accepted to the MPEG-7 standard after extensive experimentation and testing with an evaluation data set that included a large collection of aerial images. A demonstration of content-based similarity retrieval by use of the MPEG-7-compliant homogeneous texture descriptor is available on line.17

3. Semantic Image Labeling

The above similarity retrieval examples demonstrate that the homogeneous texture descriptor provides an effective low-level perceptual description of remote sensed imagery. Low-level descriptions are limited, however, in their ability to support the kinds of higher-level interaction that are ultimately required.


In this section we describe how statistical machine learning techniques can be used to extend this low-level description to provide a more meaningful semantic level of interaction.

The approach has two salient components. First, the statistical distributions of texture feature vectors corresponding to different semantic classes are learned. This allows novel image regions to be semantically labeled. Second, a MRF model is used to check for inconsistencies in the spatial arrangement of this labeling. The two components are implemented in a video management system that allows users to interact with aerial videography data sets based on the spatial layouts of video frames.

A. Semantic Labeling

The statistical distribution of texture features corresponding to different semantic classes, such as roads, buildings, and fields, can be learned with a labeled training set. In the proposed approach, this training set consists of image or video frame tiles that have been manually assigned semantic labels from a predetermined set of classes. The tiles are also annotated by use of the homogeneous texture descriptor. The premise of the approach is that, because (1) visually similar textures form clusters in the sparse feature space and (2) there is a wide variation in visual appearance within each semantic class, Gaussian mixture models (GMMs) can be used to model the feature distributions conditioned on the class labels. In particular, under the assumption that each semantic class forms K clusters in the feature space, we model the distribution of features y \in R^d (d = 60) for a class as a mixture of K Gaussians using the following density function:

P(y|m) = \sum_{j=1}^{K} \frac{\alpha_{mj}}{(2\pi)^{d/2}|\Sigma_{mj}|^{1/2}} \exp\left[-\frac{1}{2}(y - \mu_{mj})^T \Sigma_{mj}^{-1}(y - \mu_{mj})\right],  (7)

where \Theta_m = \{\alpha_{mj}, \mu_{mj}, \Sigma_{mj}\}_{j=1}^{K} is the parameter set for the semantic class m: \alpha_{mj} are the mixture coefficients, \mu_{mj} \in R^d is the mean of the jth Gaussian, and \Sigma_{mj} is the covariance matrix of the jth Gaussian.
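As a sketch of this step, the per-class mixtures of Eq. (7) can be fit with an off-the-shelf EM implementation such as scikit-learn's GaussianMixture; the data layout, function names, and K = 5 below are illustrative assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_models(features_by_class, K=5):
    """Fit one K-component, full-covariance GMM (Eq. (7)) per semantic
    class from the manually labeled training tiles."""
    return {m: GaussianMixture(n_components=K, random_state=0).fit(F)
            for m, F in features_by_class.items()}

def ml_label(models, y):
    """Maximum-likelihood label for a 60-D texture feature vector y."""
    y = np.asarray(y).reshape(1, -1)
    return max(models, key=lambda m: models[m].score_samples(y)[0])
```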

B. Spatial Layout Consistency

We label the image tiles using the set of trained GMMs and a maximum-likelihood classifier. This labeling is rarely perfect, however. An effective way to improve its accuracy is to impose constraints on the spatial arrangement of the labels. Inconsistencies can be ruled out, such as a tile labeled parking lot being surrounded by tiles labeled water. In this particular approach we use MRFs to model the spatial distribution of the class labels. The label of a block at site s is modeled as a discrete-valued random variable X_s, taking values from the semantic label set M = \{1, 2, \ldots, M\}, and the set of random variables X = \{X_s, s \in S\} constitutes a random field, where S is the lattice of image blocks. The random field X is modeled as a MRF with a Gibbs distribution18:

p(x) = \frac{1}{Z} \exp[-U(x)],  (8)

where x is a realization of X. The Gibbs energy function U(x) can be expressed as the sum of clique potential functions:

U(x) = \sum_{c \in Q} V_c(x),  (9)

where Q is the set of all cliques in a neighborhood. The MRF model is reinforced when we incorporate the class-conditioned feature likelihoods into the energy function, as follows:

U(x) = \sum_{s \in S} \left[\sum_{s' \in N_s} \left(-\beta\, \mathrm{LP}_{s|s'}\right) - \alpha\, \mathrm{LP}_s\right],  (10)

where \mathrm{LP}_{s|s'} = \log p_{s|s'}(x_s, x_{s'}) and \mathrm{LP}_s = \log p(y_s|x_s). N_s is the set of neighbors of the site s, and \alpha and \beta are the weights of \mathrm{LP}_s and \mathrm{LP}_{s|s'}, respectively. \mathrm{LP}_{s|s'} represents the spatial relationship between neighboring sites s and s', where s|s' indicates the direction of the neighborhood. \mathrm{LP}_s represents the conditional probability density of the feature vector y_s given the label x_s. p_{s|s'}(x_s, x_{s'}) is the joint probability of x_s and x_{s'} along the direction s|s' and can be approximated with a co-occurrence matrix from the labeled training set. For each type of clique (s|s'), a co-occurrence matrix is constructed from the joint probabilities P_r(i, j) between pairs of semantic labels i and j in a given direction r.

To simplify the model, a second-order pair-site neighborhood system is used. Each site thus has eight neighbors. Four types of clique are considered, wherein s|s' makes angles of 0, 45, 90, and 135 deg with respect to the x axis. In any neighborhood, cliques along the same direction are considered equivalent. Four co-occurrence matrices are constructed along these four directions of label distribution.
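The paper does not spell out how the energy of Eq. (10) is minimized; a common choice for labeling problems of this kind is iterated conditional modes (ICM), sketched below under that assumption. Treating the two neighbors in each direction symmetrically is a further simplification.

```python
import numpy as np

# Offsets for the second-order (eight-neighbor) system; each pair
# direction (0, 45, 90, 135 deg) has its own co-occurrence matrix.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def icm_relabel(labels, log_lik, log_cooc, alpha=1.0, beta=1.0, sweeps=5):
    """Iterated conditional modes: greedily relabel each tile to
    minimize its local contribution to the energy of Eq. (10).

    labels   -- (H, W) initial maximum-likelihood labeling
    log_lik  -- (H, W, M) log p(y_s | x_s) from the class GMMs
    log_cooc -- four (M, M) log co-occurrence matrices, one per direction
    """
    H, W, M = log_lik.shape
    x = labels.copy()
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                energies = -alpha * log_lik[i, j]  # -alpha * LP_s term
                for d, (di, dj) in enumerate(OFFSETS):
                    for sign in (1, -1):  # both neighbors per direction
                        ni, nj = i + sign * di, j + sign * dj
                        if 0 <= ni < H and 0 <= nj < W:
                            # -beta * LP_{s|s'} for every candidate label
                            energies = energies - beta * log_cooc[d][:, x[ni, nj]]
                x[i, j] = int(np.argmin(energies))
    return x
```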

C. Spatial Layout Retrieval

In this step, a representation termed semantic layout is used to characterize an image or video frame based on the spatial arrangement of its labeled tiles. For retrieval purposes, the similarity between the query image and each stored image can be determined from their semantic layouts. Let the semantic layout of the query image be X^q and that of the stored image be X^I. To improve the retrieval performance, a soft classification scheme is adopted. For a given image tile, the labels with the three largest local conditional probabilities are selected to represent this tile. All candidate labels are stored along with the feature vectors for future retrieval. The modified semantic layout similarity between the query image and each stored image is given by

S_3 = \sum_{s \in S} \left[\sum_{j=1}^{3} a_1 a_j\, \delta(x_{s,j}^q, x_{s,1}^I)\right] + \sum_{s \in S} \left[\sum_{j=1}^{3} a_2 a_j\, \delta(x_{s,j}^q, x_{s,2}^I)\right]\left[1 - \sum_{i=1}^{3} \delta(x_{s,i}^q, x_{s,1}^I)\right] + \sum_{s \in S} \left[\sum_{j=1}^{3} a_3 a_j\, \delta(x_{s,j}^q, x_{s,3}^I)\right]\left[1 - \sum_{i=1}^{3} \delta(x_{s,i}^q, x_{s,1}^I)\right]\left[1 - \sum_{i=1}^{3} \delta(x_{s,i}^q, x_{s,2}^I)\right],  (11)

where a_i = 1/2^{i-1}, i = 1, 2, 3, are the weights for the different label similarities, \delta(\cdot, \cdot) is the Kronecker delta, and x_{s,j}^q is the jth candidate label of the query image tile at site s. In Eq. (11) we compute the similarity measure by comparing each candidate label of a query image tile to all the candidate labels of the corresponding stored image tile. This approach to similarity retrieval is expected to perform better than existing methods that do not consider the underlying semantics.

The similarity measure in Eq. (11) is effective only when all images are of the same size. To compare the layouts of images of any size, a semantic histogram can be computed for each image. This is similar to the image histogram where the inputs are semantic labels instead of image intensities. As long as the semantic label set is the same, two different-sized images can be compared by use of their semantic histograms.
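A sketch of the histogram comparison follows, assuming labels are integers in {0, ..., M-1}. The paper does not specify the comparison function, so histogram intersection is used here as one common choice.

```python
import numpy as np

def semantic_histogram(labels, M):
    """Normalized histogram of semantic labels so that images of
    different sizes can be compared."""
    counts = np.bincount(labels.ravel(), minlength=M)
    return counts / counts.sum()

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical label distributions."""
    return np.minimum(h1, h2).sum()
```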

Examples of semantic layout retrieval in a prototype video management system are shown in Figs. 4 and 5. In each example, the top frame is the query and the remaining frames are the results. The semantic labeling of the query is shown. Observe that the retrieved frames can be visually quite different from the query even though their semantic layouts are similar. These results could not be achieved by a low-level description alone. The combination of GMM and MRF models is thus shown to capture the semantic layout of the frames.

Fig. 4. Example of a spatial layout retrieval in a prototype video management system. The top frame is the query. The lower four frames are the results.

Fig. 5. Example of a spatial layout retrieval in a prototype video management system. The top frame is the query. The lower four frames are the results.

4. Geospatial Object Modeling

So far, the texture descriptor has been used for similarity retrieval in large collections of remote sensed image or video data. In this section we describe how the texture features enable an object-based representation for remote sensed imagery.19 Such a representation allows users to interact with the data at a more intuitive level. In the proposed approach, geospatial objects that consist of multiple characteristic textures are modeled by mixtures of Gaussians. The model parameters are estimated with unsupervised learning techniques (unlike in Section 3, where the parameters are estimated from labeled training sets). The goal is to use the models to locate and catalog new object instances in remote sensed imagery.

The focus is on modeling geospatial objects that have characteristic textures but lack structure or well-defined boundaries. Examples of such objects include harbors, golf courses, and mobile home parks. These objects defy traditional object modeling approaches, such as template matching, so new approaches are needed. With the proposed technique we model the objects by statistically characterizing the common textures, or texture motifs, such as the rows of boats and water in harbors or the grass fairways and trees in golf courses.

The texture motifs are learned by use of the Gabor texture features extracted from sample object instances.


An RS-dimensional feature vector c is extracted for each pixel with a filter bank of R orientations and S scales. (Note that the analysis is done at the pixel level, not at the tile level.) Under the assumptions that (1) the pixels for a class of objects are generated by one of N possible texture motifs and (2) these motifs have Gaussian distributions in the feature space, the probability density function of c can be expressed as a mixture distribution:

p(c) = \sum_{j=1}^{N} P(j)\, p(c|j),  (12)

where p(c|j) is the conditional likelihood of the feature c being generated by motif j, and P(j) is the prior probability of motif j. The number of motifs N, along with the distribution means and covariance matrices, are the parameters that completely specify the class model.


The expectation maximization algorithm20 is used to estimate the GMM parameters from a set of sample object instances, as shown in Fig. 6. It is important to note that the characteristic textures are determined solely from the distribution of the texture features.

Fig. 6. Model parameters are learned from a sample set with the expectation maximization algorithm.

A trained GMM can then be used to label the motifs for an object instance, as shown in Fig. 7. A maximum a posteriori (MAP) classifier assigns label i* to a pixel with feature vector c according to

i^* = \arg\max_{1 \le i \le N} P(i|c).  (13)

The posterior probabilities P(i|c) are computed with the Bayes rule.
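Under the mixture model of Eq. (12), the EM fit and the MAP labeling of Eq. (13) can again be sketched with scikit-learn, whose predict method returns the component with the largest posterior probability. The motif count N = 4 and the function names are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_texture_motifs(pixel_features, N=4):
    """Fit the N-motif mixture of Eq. (12) with the EM algorithm from
    per-pixel Gabor features pooled over sample object instances."""
    return GaussianMixture(n_components=N, random_state=0).fit(pixel_features)

def label_motifs(gmm, pixel_features, shape):
    """MAP motif assignment of Eq. (13): each pixel gets the motif with
    the largest posterior probability, reshaped into an image."""
    return gmm.predict(pixel_features).reshape(shape)
```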

Fig. 7. Motif labeling with a MAP classifier.

Figures 8 and 9 show the motif assignments for two harbor regions. The gray level indicates the motif label. The consistency between the two assignments demonstrates that the proposed approach successfully learns the textures common to harbor objects. Subsequent analysis can use this knowledge to detect novel object instances in new remote sensed imagery to automatically update an object catalog (commonly known as a gazetteer). Such an approach is currently being investigated for the Alexandria Digital Library of geographic data at the University of California at Santa Barbara.21

Fig. 8. Harbor image and its corresponding labeling. The gray level indicates the motif.

Fig. 9. Second harbor image and its corresponding labeling. The gray level indicates the motif.

5. Conclusion

We have described recent research into use of texture to analyze and manage large collections of remote sensed image and video data. A texture descriptor based on the outputs of scale and orientation-selective Gabor filters is shown to enable (1) content-based image retrieval in large collections of satellite imagery, (2) semantic labeling and layout retrieval in an aerial video management system, and (3) statistical object modeling in geographic digital libraries.

This research is supported by the following grants: NASA California Space Grant (S. Newsam), U.S. Department of Transportation Research and Special Programs Administration contract DTRS-00-T-0002 (S. Newsam), U.S. Office of Naval Research grant N00014-01-1-0391 (L. Wang), National Science Foundation (NSF) Digital Library award IIS-9817432 (S. Bhagavathy), and NSF Instrumentation award EIA-9986057.

References

1. A. Huertas and R. Nevatia, "Detecting buildings in aerial images," Comput. Vision Graph. Image Process. 41(2), 131–152 (1988).
2. J. Shufelt and D. M. McKeown, "Fusion of monocular cues to detect man-made structures in aerial imagery," Comput. Vision Graph. Image Process. 57(3), 307–330 (1993).
3. T. Chang and C.-C. J. Kuo, "Texture analysis and classification with tree-structured wavelet transform," IEEE Trans. Image Process. 2(4), 429–441 (1993).
4. R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973).
5. F. Liu and R. W. Picard, "Periodicity, directionality, and randomness: Wold features for image modeling and retrieval," IEEE Trans. Pattern Anal. Mach. Intell. 18(7), 722–733 (1996).
6. J. Mao and A. Jain, "Texture classification and segmentation using multiresolution simultaneous autoregressive models," Pattern Recogn. 25(2), 173–188 (1992).
7. R. Chellappa and S. Chatterjee, "Classification of textures using Gaussian Markov random fields," IEEE Trans. Acoust. Speech Signal Process. 33, 959–963 (1985).
8. B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Trans. Pattern Anal. Mach. Intell. 18, 837–842 (1996).
9. J. Daugman, "Complete discrete 2D Gabor transform by neural networks for image analysis and compression," IEEE Trans. Acoust. Speech Signal Process. 36, 1169–1179 (1988).
10. S. Marcelja, "Mathematical description of the responses of simple cortical cells," J. Opt. Soc. Am. 70, 1297–1300 (1980).
11. P. Wu, B. S. Manjunath, S. Newsam, and H. D. Shin, "A texture descriptor for browsing and similarity retrieval," J. Signal Process. 16(1–2), 33–43 (2000).
12. B. S. Manjunath, P. Salembier, and T. Sikora, eds., Introduction to MPEG-7: Multimedia Content Description Interface (Wiley, New York, 2002).
13. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by image and video content: the QBIC system," IEEE Comput. 28(9), 23–32 (1995).
14. J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. Jain, and C. F. Shu, "The Virage image search engine: an open framework for image management," in Storage and Retrieval for Image and Video Databases IV, I. K. Sethi and R. C. Jain, eds., Proc. SPIE 2670, 76–87 (1996).
15. A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: content-based manipulation of image databases," Int. J. Comput. Vision 18(3), 233–254 (1996).
16. J. R. Smith and S.-F. Chang, "VisualSEEk: a fully automated content-based image query system," in Proceedings of the Fourth ACM International Conference on Multimedia (ACM Multimedia, Berkeley, Calif., 1996), pp. 87–98.
17. S. Newsam, J. Tesic, M. Saban, and B. S. Manjunath, MPEG-7 Homogeneous Texture Descriptor Demo, http://vision.ece.ucsb.edu/texture/mpeg7/instructions.html.
18. S. Z. Li, Markov Random Field Modeling in Image Analysis (Springer-Verlag, Berlin, 2001).
19. S. Bhagavathy, S. Newsam, and B. S. Manjunath, "Modeling object classes in aerial images using texture motifs," in Proceedings of the International Conference on Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 2002), pp. 981–984.
20. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood estimation from incomplete data via the EM algorithm," J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977).
21. The Alexandria Digital Library Project, http://www.alexandria.ucsb.edu.