

An Architecture for and Query Processing in Distributed Content-based Image Retrieval

Venkat N. Gudivada* and Gwang S. Jung†

*Department of Computer Science, University of Missouri, Rolla, MO 65401, USA
†Department of Computer Science, Jackson State University, Jackson, MS 39217, USA

Real-Time Imaging 2, 139–152 (1996)

1077-2014/96/030139 + 14 $18.00 © 1996 Academic Press Limited

Images are being generated at an ever increasing rate by diverse military and civilian sources. A content-based image retrieval system is required to utilize information from the image repositories effectively. Content-based retrieval is characterized by several generic query classes. With the existence of the information superhighway, image repositories are evolving in a decentralized fashion on the Internet. This necessitates network transparent distributed access in addition to the content-based retrieval capability.

Images stored in low-level formats such as vector and raster are referred to as physical images. Constructing interactive responses to user queries using physical images is not practical and robust. To overcome this problem, we introduce the notion of logical features and describe various features to enable content-based query processing in a distributed environment. We describe a tool named SemCap for extracting the logical features semi-automatically. We also propose an architecture and an application level communication protocol for distributed content-based retrieval. We describe the prototype implementation of the architecture and demonstrate its versatility on two distributed image collections.

© 1996 Academic Press Limited

Query Processing in Distributed Image Retrieval Environment

Images are being generated at an ever increasing rate by diverse military and civilian sources. The recent ubiquitous interest in multimedia systems has further accelerated this trend. A content-based image retrieval (CBIR) system is required to utilize information from the image repositories effectively. Content-based retrieval is characterized by the ability of the system to retrieve relevant images based on the contents of an image rather than by using simple attributes or keywords assigned to the image. Content-based retrieval is facilitated by employing several generic query classes. The relevance of the retrieved images may be judged differently by system users for an identically formulated query. That is, the notion of relevance is dynamic, subjective, and is a function of both the user's retrieval need and context. This necessitates a CBIR system to be adaptive and process queries from the view point of the user's interpretation of the images and domain semantics. With the existence of the information superhighway, image repositories are evolving in a decentralized fashion

on the Internet. Thus, there is a need to provide network transparent access to distributed image collections. A CBIR system should also feature a query language/interface that is natural and elegant for specifying queries by naive and casual users.

User queries can be processed using either physical images or logical features of the physical images. Physical image refers to the representation of an image in a format suitable for efficient storage and display (e.g., tiff format). There are several representations for physical images based on raster or vector formats. Since user queries tend to be at a conceptual level, processing them using physical images requires first extracting relevant image abstractions by using image processing and interpretation techniques. Moreover, this task is repeated each time the image is considered as potentially relevant for a query since the image abstractions are not stored persistently. Thus, physical images are rarely used in interactive query processing environments.

Logical features, on the other hand, are domain-independent as well as domain-specific abstractions of an image at various levels. One can perceive logical features as spanning a spectrum with the physical image situated at one end of the spectrum. The farther a logical feature is from the physical image on the spectrum, the higher is the degree of abstraction manifested in the logical feature. Each logical feature abstracts a certain content of the physical image and facilitates efficient processing of one or more generic queries. Therefore, to process various generic queries, pertinent logical features are required. For example, to process color similarity queries, an image color histogram is an appropriate logical feature. For processing spatial similarity queries, logical features such as the spatial orientation graph and QR-string are suitable. Logical features may be derived manually, automatically, or by a combination of both depending on the domain and complexity of images. This task is carried out only at the time of inserting an image into the database. We sometimes refer to logical features as logical images to suit the context.

Our approach to query processing in a distributed image retrieval environment is based on the notion of logical images. We have two distinct databases corresponding to an image repository: physical and logical. For each physical image, there are many possible logical images to facilitate content-based image retrieval. The notion of logical images obviates the need for repeated image understanding. The disk space required to store a logical image compared to its physical counterpart is negligible. These observations have interesting implications for query processing in a distributed image retrieval environment.

In this paper, we propose a design for distributed content-based image retrieval and examine various associated issues. Our distributed image retrieval environment encompasses various image collections on the Internet. Therefore, we inherently have a loosely coupled distributed* image database. We adapt the terminology from conventional distributed database management systems† to describe the features of the proposed architecture. Conventional systems support features along five dimensions: degree of global query optimization, data manipulation language transparency, number of user interfaces, degree of component DBMS control, and the number of component databases that the user can update in a single request [1]. The degree of global query optimization refers to whether or not the system provides replication, location and fragmentation transparency. If the system provides no global query optimization, the user is responsible for identifying all the relevant image repositories (i.e., component databases), formulating and submitting queries to component databases, and integrating the results. In contrast, if the system features location transparency, there is no need for the user to be aware of the location of the images relevant to their query. Likewise, under replication transparency the user is not aware of the existence of more than one copy of a physical or logical image. For effective utilization of the image repositories on the Internet, replication and location transparency are critical, and our design supports this transparency.

A system supports fragmentation transparency if the user is not aware that an image is fragmented. Typically each fragment resides at a different location. Fragmenting physical images is useful in order to reduce the disk I/O time. An image can be divided into several blocks and successive blocks are stored on different disks or servers. Logical images can be grouped and distributed based on their class (e.g., logical images for shape, spatial, color, texture) assuming that each site has a predominant need for one or more classes of logical images. The current version of the architecture does not address the fragmentation issue and we plan to incorporate it in future enhancements.
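For concreteness, the block-level fragmentation of a physical image described above might be sketched as follows. The round-robin placement policy and the 4 KB block size are illustrative assumptions on our part; as noted, the current architecture does not yet implement fragmentation.

```python
def fragment(image_bytes, n_servers, block_size=4096):
    """Split a physical image into fixed-size blocks and store
    successive blocks on different servers (round-robin placement)."""
    blocks = [image_bytes[i:i + block_size]
              for i in range(0, len(image_bytes), block_size)]
    placement = [[] for _ in range(n_servers)]
    for i, block in enumerate(blocks):
        placement[i % n_servers].append(block)
    return placement

def reassemble(placement):
    """Recombine the distributed blocks into the original image."""
    n_servers = len(placement)
    total = sum(len(p) for p in placement)
    return b"".join(placement[i % n_servers][i // n_servers]
                    for i in range(total))
```

Because successive blocks live on different servers, a client can fetch them in parallel, which is the disk I/O benefit the text refers to.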

Data manipulation language transparency and the number of query interfaces fuse into one dimension for distributed image databases. There exist several classes of queries and each query class requires an interface that is both elegant and intuitive. The degree of component DBMS control refers to the ability of the component image databases to enable or disable access to their images. In our proposed design, the component databases have full access control to their image collections. This control is facilitated by an application level communication protocol. The number of component databases that the user can update in a single request is not applicable to the current generation of distributed image databases. This is because the queries only need to retrieve the relevant images and the component databases typically prohibit updates to the images.

* The component databases are possibly heterogeneous since they may feature different image repositories such as satellite images, floor plans and perspective views of homes for sale, and natural scenes.

† Conventional distributed database management systems are those used in commercial and business applications and are based on the relational data model.

Content-Based Image Retrieval

Regardless of the domain, an image can be thought of as a complex object or entity. That is, an image is composed of one or more domain objects. The domain objects themselves can be complex objects. They are constructed using hierarchy, aggregation, and association data modeling principles. A domain object can be considered as a sub-image within an image. Intuitively, a domain object is a semantic entity contained in the image which is meaningful in the application. For example, in an architectural design application, various functional and aesthetic units in the floor plan image constitute domain objects. At the physical representation level, a domain object is defined as a subset of the image pixels. Domain objects are characterized by various logical features. We group features into three categories: objective, subjective, and semantic. The interpretation of an objective feature is the same from one system user to another. For example, the number of bedrooms and the total floor area are two objective features of a residential floor plan image. Compared to subjective features (discussed below), objective features are more precise.

The interpretation of subjective features, on the other hand, may vary significantly from one user to another. The range of values assumed by a subjective feature is best viewed as spanning a spectrum characterized by a left-hand pole (one extreme position) and a right-hand pole (the other extreme position). A user's subjectivity is then associated with a specific position on the spectrum. Semantic features denote deeper domain semantics manifested in the images. Often, a semantic feature corresponds to a group of domain objects. Semantic features are used to capture geometrical and topological properties, aggregation and generalization/specialization relationships, and other semantics among the domain objects. We use the term attribute to refer to the logical features suitable for formulating user queries.

A Taxonomy for Query Classes

Based on the retrieval requirements analysis of a number of image retrieval applications, we have found the following generic query classes important to facilitate CBIR [2]: retrieval by color, texture, sketch, shape, volume, spatial constraints, browsing, objective attributes, subjective attributes, sequences, keywords, natural language text, and domain concepts. Retrieval by color and texture queries enable retrieving images that contain domain objects with specified color and texture. Using retrieval by sketch, a user simply sketches an image of interest and expects the system to retrieve images in the database that are similar to the sketch. Retrieval by sketch can be thought of as retrieving images by matching the dominant edges. Retrieval by shape facilitates a class of queries that are based on the shapes of objects in an image; its counterpart in 3D images is referred to as retrieval by volume.

Retrieval by spatial constraints deals with a class of queries based on spatial and topological relationships among the domain objects. This query is subdivided into two categories: retrieval by spatial similarity and retrieval by topological relationships. Retrieval by spatial similarity requires selecting database images that satisfy the spatial relationships specified in the query to varying degrees. This degree of conformance is used to rank order database images with respect to the query. In contrast, retrieval by topological relationships involves selecting those database images in which the specified topological relationships exist among the domain objects.

Retrieval by browsing is performed when the user is vague about his retrieval needs or unfamiliar with the structure and the types of information available in the image database. In retrieval by objective attributes, a query is formulated using objective image attributes and is similar to retrieval in conventional databases using SQL (structured query language). Query processing is based on an exact match on the attribute values. Subjective attributes are used to specify retrieval by subjective attributes queries. Processing this class of queries requires that the query processor be adaptive by learning from the user interaction at query processing time. Retrieval by sequence queries facilitate the retrieval of spatiotemporal image sequences that depict a domain phenomenon that varies in space or time.

In some applications, images tend to be quite distinct from each other both in structure and semantic content. The notion of keyword or term from the information retrieval [3] area is useful for modeling such images. An image is modeled by a set of keywords which are representative of the content. In applications such as photo journalism, a caption or natural language text typically accompanies a picture (or image). This text usually describes the contents of the image, among other things. Retrieving images from such applications is modeled by the natural language text query class. The above query classes can be used as fundamental, primitive building blocks in formulating a class of complex queries referred to as retrieval by domain concepts.

Logical Features for Distributed Query Processing

In this section, we briefly discuss various logical features suitable for processing the shape, spatial, color, and text query classes.

Geometry Based Logical Features for Spatial Similarity

We discuss the following logical features: minimum bounding rectangle, plane sweep representation, spatial orientation graph, QR-string, and 2D-string. All these features consider the domain objects as point objects located at their centroid.

Minimum Bounding Rectangle

Minimum bounding rectangle (MBR) is the minimum size rectangle that completely bounds a given object. The MBR concept is very useful in dealing with image objects that are arbitrarily complex in terms of their boundary shapes. MBR representation serves as an efficient test (a necessary but not a sufficient condition) to determine whether or not two objects intersect. Figure 1(a) shows an example of MBR approximation of objects in an image.
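A minimal sketch of the MBR feature and its intersection test follows; objects are assumed to be given as 2D point sets, and the function names are our own.

```python
def mbr(points):
    """Minimum bounding rectangle of a point set,
    returned as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a, b):
    """Necessary (but not sufficient) condition for two objects to
    intersect: their MBRs overlap on both axes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1
```

Because the test is only necessary, a positive result must be confirmed against the actual object boundaries; a negative result safely prunes the pair.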

Sweep Line Representation

In computational geometry, there is an operation called sweep that is natural and efficient for solving several geometrical problems [4]. Instantiation of the sweep technique for 3D and 2D geometrical problems is known as space and plane sweep, respectively. The plane sweep technique uses a horizontal line and a vertical line to sweep the image plane from top to bottom (vertical sweep) and from left to right (horizontal sweep). Both vertical and horizontal sweep lines stop at predetermined points called the event points. Event points are selected in a way to capture the spatial extent of domain objects. For each stop position of the sweep line, the image objects intersected by the sweep line are recorded (i.e., the sweep line status). Therefore, the sweep line representation of an image consists of a set of event points, and for each event point, its sweep line status for both horizontal and vertical sweeps. Containment and overlap queries can be efficiently processed using this logical feature. As an example, consider the image shown in Figure 1(b). The spatial extent of each of the five domain objects is represented by their polygonal approximations. The vertices of these polygons constitute the event points. The figure shows a snapshot for one stop of the horizontal (line HH) and vertical (line VV) sweep lines. The sweep line status for the horizontal sweep is: elephant; and for the vertical sweep: antelope, brachiosaurus.

Figure 1. Logical features for spatial similarity.
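The sweep line status computation might be sketched as below for the vertical sweep only (a horizontal line moving through the y event points); the horizontal sweep is symmetric. Objects are assumed to be polygonal approximations given as vertex lists, and an object is taken to intersect the line when the line falls within its vertical extent.

```python
def sweep_representation(objects):
    """objects: dict mapping object name -> list of (x, y) polygon
    vertices.  Returns, for each event point y (a vertex y-coordinate),
    the sweep line status: names of objects whose vertical extent
    intersects a horizontal sweep line at that y."""
    event_points = sorted({y for verts in objects.values() for _, y in verts})
    status = {}
    for y in event_points:
        status[y] = sorted(
            name for name, verts in objects.items()
            if min(v[1] for v in verts) <= y <= max(v[1] for v in verts)
        )
    return status
```

Storing only the event points and their statuses gives the persistent logical feature; containment and overlap queries then reduce to lookups over these lists rather than pixel-level tests.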

Spatial Orientation Graph

A spatial orientation graph [Figure 1(c)] is a fully connected weighted graph. Each vertex in the graph corresponds to a domain object and each vertex is connected to every other vertex. Associated with each vertex are the (x, y)-coordinates of the corresponding domain object with reference to a Cartesian coordinate system. The weight of an edge connecting two vertices is the slope* of the line joining the corresponding domain objects. The spatial orientation graph has been used by Gudivada and Raghavan [5] for computing the spatial similarity between 2D images and is extended to 3D images by Gudivada and Jung [6].
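Building the graph is straightforward once the object centroids are known; a sketch follows. We return an infinite weight for vertical lines, a choice of our own since the footnoted slope formula is undefined when the x-coordinates coincide.

```python
from itertools import combinations

def spatial_orientation_graph(centroids):
    """centroids: dict mapping object name -> (x, y).  Returns the edges
    of the fully connected graph, keyed by object pair and weighted by
    the slope (y2 - y1) / (x2 - x1) of the joining line."""
    edges = {}
    for (a, (x1, y1)), (b, (x2, y2)) in combinations(sorted(centroids.items()), 2):
        edges[(a, b)] = float("inf") if x1 == x2 else (y2 - y1) / (x2 - x1)
    return edges
```

For n objects the graph has n(n - 1)/2 edges, which is the entire logical feature stored for spatial similarity computation.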

QR-string

The QR-string representation of an image is a variation of the sweep line representation. While the sweep line representation employs two lines (horizontal and vertical), the QR-string employs only one sweep line (i.e., radial sweep). As shown in Figure 1(d), the radial sweep line is pivoted at the image centroid (indicated by the large black dot). The QR-string is generated by concatenating the names of the image objects in the order intersected by the radial line as it sweeps one full revolution about the pivot point. Assuming a counter-clockwise direction for the radial sweep, the QR-string representation for the image is: brachiosaurus, tiger, elephant, antelope and camel. The QR-string has been used by Gudivada and Jung [7] to compute spatial similarity.
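The radial sweep can be simulated by sorting objects on the angle their centroid makes about the pivot, as sketched below. The starting direction (positive x-axis) and the handling of the pivot are assumptions; the paper does not fix them in this section.

```python
import math

def qr_string(centroids):
    """centroids: dict mapping object name -> (x, y).  Order object
    names by the counter-clockwise angle of their centroid about the
    image centroid (mean of the object centroids), simulating one full
    revolution of the radial sweep line."""
    cx = sum(x for x, _ in centroids.values()) / len(centroids)
    cy = sum(y for _, y in centroids.values()) / len(centroids)
    return sorted(
        centroids,
        key=lambda n: math.atan2(centroids[n][1] - cy,
                                 centroids[n][0] - cx) % (2 * math.pi))
```

The `% (2 * math.pi)` folds the signed angle returned by `atan2` into [0, 2π), so the sweep order is a single counter-clockwise pass.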

2D-string

A 2D-string can be viewed as the projection of the objects of an image along the x- and the y-axes. A 2D-string is denoted by (U, V), where U and V are the projections of the objects on the x- and y-axes, respectively. Let R be the set {=, <, :}, where the symbol "=" denotes the spatial relation "at the same location as," the symbol "<" denotes the spatial relation "left of/right of" or "below/above," and the symbol ":" denotes the spatial relation "in the same set as." Consider the image shown in Figure 1(c). The projection of the domain objects on the x-axis gives the following string: (elephant < antelope < brachiosaurus < tiger < camel). The projection of the objects on the y-axis gives the string: (camel = antelope < brachiosaurus < tiger < elephant). In Lee et al. [8], this logical feature has been used for computing the spatial similarity between the images.
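A sketch of the projection step follows, covering only the "<" and "=" operators. The centroid coordinates in the test are made up by us to reproduce the example strings for Figure 1(c); they are not from the paper.

```python
from itertools import groupby

def projection(centroids, axis):
    """Project object names onto one axis (0 for x, 1 for y); '=' joins
    objects sharing a coordinate, '<' orders distinct coordinates."""
    names = sorted(centroids, key=lambda n: centroids[n][axis])
    runs = [" = ".join(group) for _, group in
            groupby(names, key=lambda n: centroids[n][axis])]
    return " < ".join(runs)

def two_d_string(centroids):
    """(U, V): projections of the objects on the x- and y-axes."""
    return projection(centroids, 0), projection(centroids, 1)
```
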

Logical Features for Shape Similarity

Several logical features have been proposed for characterizing an object's shape, and associated algorithms exist for computing the similarity between shapes [9]. Shape features can be based on the interior of an object or its boundary. Karhunen-Loeve transform-based shape representation has also been used [10]. An important criterion for shape representation features is that they be invariant to affine transformations (e.g., translation, scale, and rotation) of images.

Logical Features for Color Similarity

Most of the approaches in the literature employ a color histogram as a logical feature for computing the color similarity. Colors in an image are mapped into a discrete color space containing n colors. A color histogram H(I) of an image I is a vector (c1, c2, c3, ..., cn) in an n-dimensional vector space, where each element, ci, represents the number of pixels of color i in the image. A metric on the color histogram space is used to quantify color similarity.
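The histogram and one possible metric can be sketched as follows. The L1 distance is our illustrative choice; the text only requires some metric on the histogram space, and pixels are assumed to be already mapped to discrete color indices.

```python
def color_histogram(pixels, n_colors):
    """H(I) = (c1, ..., cn): ci counts the pixels of color i, where the
    image colors have been mapped to the discrete space 0..n_colors-1."""
    h = [0] * n_colors
    for color in pixels:
        h[color] += 1
    return h

def histogram_distance(h1, h2):
    """One simple metric on the histogram space: the L1 distance."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Identical images have distance 0; the smaller the distance, the more similar the color content, so ranking retrieved images amounts to sorting by this value.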

Logical Features for Text

It is worth recalling that in some applications text associated with an image describes the contents of an image, among other things. Logical features based on natural language processing techniques (morphological, syntactic, and semantic analysis) are suitable for representing the content of text. As an alternative, keywords can be used as content descriptors. Optionally, a weight (usually in the range 0 to 1) can be associated with a keyword to indicate the degree of relevance of the keyword to describe the contents of an image. Keywords can be automatically extracted and weighted from the text using automatic indexing methods. This method is preferred when the image collection is very large and interactive query processing is essential. Queries are processed using information retrieval models ranging from simple set-theoretic Boolean to advanced algebraic models such as vector space.

* The weight of an edge connecting two objects o1 and o2 with centroid coordinates (x1, y1) and (x2, y2) is given by the expression (y2 - y1)/(x2 - x1).
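Under the vector space model, a weighted-keyword query might be matched against an image's keyword descriptor as sketched below; cosine similarity is one standard choice of ranking function, shown here as an illustration rather than the paper's specific method.

```python
import math

def cosine_similarity(query, doc):
    """Vector-space match between two weighted keyword vectors, given
    as dicts mapping keyword -> weight in [0, 1]."""
    dot = sum(w * doc.get(k, 0.0) for k, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0
```

Images sharing no query keywords score 0; an image whose descriptor matches the query exactly scores 1, giving a natural rank ordering of the retrieved set.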

Transform-Based Features for Similarity Computation

Various transforms can be applied to the image to derive content-specific compact logical features [11]. Similarity measures are defined on these compact representations. Each feature is tuned for a specific query class. For example, an approach based on the Karhunen-Loeve transform is used if the contents to be represented are the various objects in an image and their geometric relations. On the other hand, for representing texture, an approach based on Wold decomposition is used. Transform-based features can be viewed as lossy compression schemes. They can also be used to reconstruct the original image with some loss of quality.

Extracting the Logical Features

When a new image is inserted into the database, various domain objects present in the image are identified and labeled, and their logical features are extracted. Furthermore, various relationships among the domain objects (e.g., snow covered mountain) are also determined. We refer to all these activities associated with the insertion of a new image as semantic content capture (SCC); SCC is central to the proposed architecture for distributed content-based image retrieval.

Automatic approaches to SCC are highly desirable, since manual approaches are expensive and tedious. However, automated approaches are computationally expensive, difficult, and tend to be domain-specific. Moreover, the state-of-the-art in automatic image interpretation has not progressed beyond low-level feature extraction and object recognition for preconditioned images in narrow domains such as industrial recognition applications [12]. A simple task such as detecting and labeling a balloon by using shape and color analysis techniques is not robust in the general case. The problem becomes worse when we deal with continuous tone images of natural scenes which may contain weak contrast along the object boundary, spurious edges, and overlapping and occluding objects. To be useful across a range of domains, a CBIR system should have the capability to deal with images originating from diverse domains in which the types of objects that will be present in the images are not known a priori. Even otherwise, the models required to recognize the domain objects are numerous, and adding contextual information to these models results in extremely complex models that have little practical value.

As an alternative to automated SCC, we have designed and partially implemented a semi-automated tool for SCC, referred to as SemCap (Semantics Capture). SemCap is versatile in the sense that it can be used across several domains. SemCap is also intuitive and easy to use; it is designed for naive and casual users. SemCap takes a hybrid approach to feature extraction: features for which robust image interpretation techniques exist are extracted automatically, while others are derived semi-automatically or manually. However, manual approaches to SCC introduce inconsistency and subjectivity. Inconsistency refers to the problem of representing the contents of similar images differently by the same person (i.e., the indexer). Subjectivity arises due to the differing interpretations of a feature by the indexer and the retrieval user. Controlled vocabulary and semi-automated tools will help alleviate the inconsistency problem. Subjectivity can be resolved dynamically through relevance feedback at retrieval time [19]. To keep the (initial) design of SemCap simple, we have deliberately excluded the functionality required to extract features to support the retrieval by volume and sequence query classes.

SemCap Design

The SemCap design is centered around three principles: consistency, extensibility, and usability. Consistency is achieved by using class (or generalization/specialization) hierarchies and controlled vocabulary. Class hierarchies encode rich domain semantics and restrain the indexer's assignment of domain objects to the predefined classes. A class is an abstraction for grouping a collection of objects that have similar characteristics. Instances of the class are the actual objects that are members of the class. A class hierarchy is a tree or lattice structure in which the nodes correspond to classes. Given a class c and its parent class p (also referred to as superclass), class p is a generalization of class c. Likewise, if l is a descendant class of c (also referred to as subclass), then class l is a specialized class of c. Thus, the class hierarchy represents generalization/specialization relationships between the classes. The implication for CBIR is that the class hierarchy can be used to make the user query more specific (by traversing down the hierarchy) or more general (by traversing up the hierarchy). Furthermore, the class hierarchy can be used as an inferential knowledge structure to facilitate similarity-based query processing.
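Traversing the hierarchy to broaden or narrow a query might look as follows. The parent-pointer encoding and the sample terms are hypothetical; SemCap's actual data structures are described in [14].

```python
# Hypothetical parent-pointer encoding of a (tree-shaped) class hierarchy.
hierarchy = {"oak": "tree", "pine": "tree", "tree": "plant", "fern": "plant"}

def generalize(term):
    """Broaden a query term by moving up to its superclass."""
    return hierarchy.get(term)

def specialize(term):
    """Narrow a query term to all of its immediate subclasses."""
    return sorted(t for t, parent in hierarchy.items() if parent == term)
```

A query for "oak" that returns too few images can thus be relaxed to "tree", and a query for "tree" that returns too many can be refined to "oak" or "pine".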


When assigning objective and subjective features to new image instances, the indexer consults auxiliary aids in the form of terminology lists and "scope notes" [3]. These aids specify the allowable (i.e., controlled) vocabulary for objective and subjective features and provide instructions for use of the vocabulary.

The class hierarchies and controlled vocabulary must be carefully constructed by the domain experts, as the quality of semantic content capture critically depends on them. To semi-automate this process, we have developed a tool based on Personal Construct Theory (PCT) borrowed from the clinical psychology domain [13]. The PCT tool helps to discover class hierarchies, and objective and subjective features which are useful in making relevant distinctions among the images. Consider a mug-shot image database and two associated applications: dating service and criminal investigation. For the dating service, features such as cute and handsome are relevant, while for the criminal investigation, a different set of features such as jaw line indentation and nasal-lip distance are appropriate. Application of the PCT tool to the face image domain is shown in Figure 2. The tool randomly selects three images at a time from the collection and asks the domain expert to name a feature by which the first two images are similar and maximally different from the third. The same question is repeated for two other combinations of the same three images. The process is repeated by randomly selecting the next set of three images and terminates when the expert is unable to name any more new concepts.

In addition to the class hierarchy concept and controlled vocabulary, SemCap borrows two additional concepts from semantic data modeling: aggregation and association. SemCap considers an image as a complex object or entity. That is, an image is composed of one or more domain objects. The domain objects themselves can be complex objects. The aggregation or part-of concept is used to model complex objects. It denotes the collection of domain objects that constitute a complex object. For example, an image may consist of a lake, barn house, herd of cattle, and corn field. The association concept is used to denote relationships among the domain objects. For example, in the natural scene image of a snow-covered mountain, snow and mountain are domain objects and covered denotes the (topological) relationship between the objects. The detailed design of SemCap is described by Gudivada and Jung [14]. A screen snapshot of SemCap shown in Figure 3 depicts the content capture of a complex natural scene image.

The class hierarchies and controlled vocabulary are instantiated in SemCap to facilitate consistency in SCC. This instantiation feature essentially provides extensibility to SemCap. The highly intuitive and consistent graphical user interface of SemCap renders it easy to use by naive and casual users. The next section describes our proposed architecture for distributed content-based image retrieval.

Figure 2. Application of PCT for discovering features in face images.

An Architecture for Distributed Content-based Image Retrieval

We refer to our proposed architecture as DCBIR (distributed content-based image retrieval). DCBIR is designed to enable effective image retrieval in a distributed environment. DCBIR employs a client and server mode of operation based on a novel application layer communication protocol, dcbirp. The client provides a graphical user interface (GUI) to enable the user to interact easily with a DCBIR server. Since DCBIR servers in a distributed environment are dynamically evolving, we have designed dcbirp as a generic and extensible protocol. It provides a means by which the client can obtain from the server the necessary information for creating the query GUIs for different image domains. dcbirp works cooperatively with the http and ftp protocols, so that the client can easily access distributed image collections stored in http and ftp servers on the Internet.

To make an image collection available for querying and access in the DCBIR environment, SCC should first be performed on the physical images using SemCap. Then both the physical and logical images must be registered with the DCBIR server.

DCBIR Protocol

The Distributed Content-Based Image Retrieval Protocol, dcbirp, is based on tcp stream sockets [15] and is designed as a stateless protocol for reliable communication. A transaction between the client and server consists of a request, connection, response, and close. The transaction can be closed by either the client (by abort) or the server. The server provides concurrent services to multiple clients. dcbirp message types (or commands) are enumerated below.

(i) DOMAIN: to request domain information and the query reformulation and retrieval mechanisms available for the image collections hosted by the server. This request is important to initialize the internal structures of the client's query GUI.

(ii) PROFILE: to request profile information of a specific image collection, including the number of logical features, possible values for the features, and other characteristics. This information is required by the client to create an appropriate query GUI.

146 V. N. GUDIVADA AND G. S. JUNG

Figure 3. A screen snapshot of SemCap.

(iii) QUERY: to request the server to process the client's query. The client also informs the server of the target image collection, the maximum number of images to be retrieved, etc.

(iv) FEEDBACK: is used for sending the user's relevance feedback information and the current query to the server for query reformulation and execution.

(v) FEATURE: is used for requesting the logical feature values of a specific image from the server.

(vi) HELP: is used for retrieving help documents from the server.

(vii) RESPONSE: is used for sending the server's response to the client's request (explained below).

The nature of the server's RESPONSE depends on the request type. For example, the response message to the client's FEEDBACK request is the retrieval results obtained by the reformulated query. The results are structured in the form of ordered triplets (image name, similarity value, and the location of the image). The location of an image is specified using the Uniform Resource Locator (URL) syntax and semantics of the World Wide Web [16]. In the RESPONSE, a status message field is used to notify the client of the status of the transaction (e.g., success, failure, or exceptions). The status message field always precedes the response data.
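The transaction flow and RESPONSE structure described above can be sketched as a minimal dcbirp-style client. The concrete wire format here (the command line syntax, tab-separated result triplets, and the end-of-message sequence) is our own assumption for illustration; the paper does not specify the exact message syntax.

```python
import socket

# Hypothetical end-of-message sequence; the paper requires one but does
# not define it.
END_OF_MESSAGE = b"\r\n.\r\n"

def dcbirp_query(host, port, collection, query_terms, max_images=10):
    """Send a QUERY request to a dcbirp-style server and parse the RESPONSE.

    Returns (status, results) where results is a list of ordered triplets
    (image name, similarity value, image URL), as described in the text.
    """
    with socket.create_connection((host, port)) as sock:
        request = "QUERY {} MAX {}\n{}".format(
            collection, max_images, "\n".join(query_terms))
        # The client must delimit its message; otherwise the server would
        # keep waiting for more data.
        sock.sendall(request.encode("ascii") + END_OF_MESSAGE)

        # The server disconnects after the transaction completes, so the
        # client simply reads until end-of-stream.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    lines = b"".join(chunks).decode("ascii").splitlines()
    status, results = lines[0], []
    if status.startswith("OK"):
        # RESPONSE data: one (name, similarity, URL) triplet per line,
        # preceded by the status message field.
        for line in lines[1:]:
            name, similarity, url = line.split("\t")
            results.append((name, float(similarity), url))
    return status, results
```

A client would call `dcbirp_query("server.example", 4000, "face", ["beard: present"])` and present the returned, rank-ordered URLs to the user.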

DCBIR Architecture

The DCBIR environment consists of a client and a server. The primary responsibility of the client is to facilitate query specification, browsing of the retrieved images, and eliciting user relevance feedback on the retrieved images. The client provides a suitable query GUI to the user and sends the user's query in a proper format to the server. Upon receiving the results from the server, the client displays a listing of the image names and their URLs. The user can click on a URL and the client retrieves the associated physical image from an http or ftp server that hosts the physical image.

As shown in Figure 4, the major building blocks of the DCBIR client are the User Interface Manager (UIM), Data Parser, Protocol Interpreter, and Cache Manager. The UIM communicates with its contractors (i.e., the Query GUI Builder, Query Interpreter, and Display Manager) for delivering query specification, execution, and browsing services to the user. The Query GUI Builder is responsible for creating the query GUI appropriate for a specific image domain. The Query Interpreter is responsible for interpreting the user's query and converting it (using the Data Parser) to an internal format. The Display Manager communicates with the user to customize the display environment and keeps track of the user's feedback history. The Cache Manager works closely with the Display Manager to provide efficient storage management for image caching and to maintain the user's long-term and short-term profiles. The user profile consists of a user's accumulated feedback history and a listing (hot list) of interesting images. The size and the location of the cache directory are determined by the Cache Manager. Cache storage space is managed in a first-in-first-out manner. The Protocol Interpreter interprets the data (generated by the UIM) and creates a proper message for communicating with the dcbirp, http, and ftp servers.

Figure 4. Architecture of DCBIR.
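The Cache Manager's first-in-first-out storage policy can be sketched as follows. The class name, byte-based capacity accounting, and file naming are illustrative assumptions, not the actual implementation.

```python
import os
from collections import OrderedDict

class FifoImageCache:
    """Sketch of a FIFO cache for retrieved physical images: when the
    cache would exceed its capacity, the oldest entries are evicted first."""

    def __init__(self, cache_dir, capacity_bytes):
        self.cache_dir = cache_dir
        self.capacity_bytes = capacity_bytes
        self.used = 0
        self.counter = 0
        self.entries = OrderedDict()  # url -> (path, size), in arrival order
        os.makedirs(cache_dir, exist_ok=True)

    def put(self, url, image_bytes):
        """Store an image, evicting the oldest entries until it fits."""
        while self.entries and self.used + len(image_bytes) > self.capacity_bytes:
            _, (old_path, old_size) = self.entries.popitem(last=False)
            os.remove(old_path)
            self.used -= old_size
        path = os.path.join(self.cache_dir, "img%06d" % self.counter)
        self.counter += 1
        with open(path, "wb") as f:
            f.write(image_bytes)
        self.entries[url] = (path, len(image_bytes))
        self.used += len(image_bytes)

    def get(self, url):
        """Return cached image bytes, or None on a cache miss."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        with open(entry[0], "rb") as f:
            return f.read()
```

FIFO eviction is simpler than recency-based policies and matches the text; a production cache would more likely use LRU so that hot-list images a user revisits are retained.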

The client's message should be delimited by an end-of-message sequence; otherwise, the server may wait indefinitely for further messages from the client. The message box between the client and server is either in main memory (a communication end-point) or in a file, depending on the message type (e.g., DOMAIN type messages are directly transmitted from the client to the server via the communication end-point, while QUERY and FEEDBACK messages are first stored in a file and then read by the server).

The main building blocks of the DCBIR server are the Protocol Interpreter, Query Processor, and Database Manager. The Protocol Interpreter is responsible for understanding the request message of the client and performing the appropriate actions. The Query Processor consists of the retrieval and reformulation mechanisms. It is responsible for processing the client's QUERY and FEEDBACK requests. The Database Manager is responsible for maintaining the information for answering the client's DOMAIN, PROFILE, FEATURE, and HELP requests. It also maintains the logical image database, and possibly the physical image database as well. The DCBIR server invokes two processes: master and slave. The master process listens for requests delivered to its port, and the slave process performs the transaction. Concurrency control is not required since the client is not allowed to update the server's database(s). The server does not delimit the end of its message to the client, since it always disconnects after a transaction is completed.
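The master/slave organization, together with the asymmetric delimiting rule (the client delimits its message; the server signals completion by disconnecting), can be sketched with a forking TCP server. The command syntax and the "." delimiter line are assumptions; on platforms without fork, a threading server could be substituted.

```python
import socketserver

class DcbirpHandler(socketserver.StreamRequestHandler):
    """Slave-side transaction handling: read one delimited request,
    send a response, and let the closing connection mark its end."""

    def handle(self):
        request_lines = []
        # Read until the (assumed) end-of-message line ".".
        while True:
            line = self.rfile.readline().decode("ascii").rstrip("\r\n")
            if line == ".":
                break
            request_lines.append(line)
        command = request_lines[0].split()[0] if request_lines else ""
        if command == "DOMAIN":
            reply = "OK\nface hair-style\n"  # hypothetical domain listing
        else:
            reply = "FAIL unknown command\n"
        # No trailing delimiter: the server signals end-of-response by
        # closing the connection when handle() returns.
        self.wfile.write(reply.encode("ascii"))

def serve(host="127.0.0.1", port=0):
    """Master process: bind and listen; ForkingTCPServer forks one slave
    per connection while the master keeps listening."""
    return socketserver.ForkingTCPServer((host, port), DcbirpHandler)
```

Because each slave serves exactly one transaction and clients never update the server's databases, no locking is needed between slaves, which matches the text's observation that concurrency control is unnecessary.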

We are currently enhancing the dcbirp protocol to allow component databases to join or withdraw from the DCBIR environment. The current implementation of DCBIR is based on a centralized logical image database. That is, the logical features of all the image repositories participating in the DCBIR environment are centrally managed at the DCBIR server site. However, distributed logical databases can easily be incorporated by extending dcbirp (e.g., by adding a GETLOGICALDB message).

Prototype Implementation

Our prototype implementation of DCBIR is developed to run under the Motif GUI on a UNIX workstation. Currently, it manages two image collections: face and hair-style images. Our face image collection consists of 93 images, and we have identified 19 subjective features using the PCT tool and a forensic composite technician as the domain expert. The hair-style image collection features 60 images, and suitable logical features were identified with the help of a hair-stylist.

The query GUI sent by the DCBIR server to a client for the face image collection is shown in Figure 5. This GUI is used to specify retrieval by subjective feature queries. The user specifies a query by selecting a (small) subset of the 19 subjective features. This mode of query specification is suitable for exploratory query processing in criminal investigation and law enforcement applications. An eyewitness of a crime may not recall all the facial features of the perpetrator, and subjectivity and uncertainty are associated with the eyewitness's description. Thus, the initial user query is incomplete, uncertain, and subjective. The algorithm proposed in [20] is used to process the query. The query is processed and the URLs of images in decreasing order of relevance are sent to the client (Figure 6). The user can retrieve the physical image corresponding to a URL entry by clicking on it. The query is incrementally refined using user relevance feedback to make retrieval more effective. The user provides relevance feedback by simply labeling the highly ranked images as relevant or non-relevant. The algorithm makes use of this information to determine the importance of the various subjective attributes.
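The general idea of this retrieval-with-feedback loop can be illustrated with a simplified sketch; it is a stand-in for, not a reproduction of, the algorithm of [20]. Subjective-feature queries are scored by weighted matching, and feedback shifts weights toward features that discriminate relevant from non-relevant images. All names and the re-weighting rule are illustrative assumptions.

```python
def score(query, image_features, weights):
    """Similarity of an image to a subjective-feature query: the
    weighted fraction of query features the image satisfies."""
    matched = sum(weights.get(f, 1.0)
                  for f, v in query.items()
                  if image_features.get(f) == v)
    total = sum(weights.get(f, 1.0) for f in query)
    return matched / total if total else 0.0

def reweight(query, relevant, nonrelevant, weights, step=0.5):
    """Crude relevance feedback: raise the weight of query features that
    relevant images satisfy, and lower it for features satisfied by
    non-relevant images."""
    new = dict(weights)
    for f, v in query.items():
        w = new.get(f, 1.0)
        if any(img.get(f) == v for img in relevant):
            w += step
        if any(img.get(f) == v for img in nonrelevant):
            w = max(w - step, 0.1)
        new[f] = w
    return new
```

Each feedback round calls `reweight` with the images the user labeled, then re-ranks the collection with `score` under the new weights, so features the eyewitness recalled reliably come to dominate the ranking.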


Figure 5. Query specification GUI for face image collection.

Similar screen snapshots for the hair-style image collection are shown in Figures 7 and 8.

We have also developed query GUIs and retrieval algorithms for two other domains: a shape database and residential floor plans. Figure 9 shows rank ordered image file names relevant to a shape similarity query. The image in the top left quadrant is the query image, and the top right quadrant lists the file names of images in decreasing order of similarity. By clicking on a file name, the corresponding image is displayed in the lower left quadrant. The lower right quadrant lists miscellaneous information about the database image being displayed.

Figure 10 shows the query GUI for specifying retrieval by spatial similarity queries on a residential floor plan database. A query is specified by spatially configuring the icons corresponding to the domain objects. The spatial relationships among the icons implicitly specify the desired spatial relationships among the corresponding objects in the floor plans. We have two algorithms for processing this query [5, 7], and the user can select the algorithm to be used. Figure 11 shows the interface for browsing the retrieved floor plan images. Both the shape and residential floor plan databases are currently unavailable for distributed access.
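How an icon layout implicitly specifies spatial relationships can be illustrated by deriving pairwise directional relations from icon centres. The eight-sector compass model below is a simple stand-in for the spatial-similarity algorithms of [5, 7]; the function names and the object names in the usage example are our own.

```python
import math
from itertools import combinations

def directional_relation(a, b):
    """Name the direction of point b relative to point a, given (x, y)
    icon centres in screen coordinates (y increases downwards)."""
    angle = math.degrees(math.atan2(a[1] - b[1], b[0] - a[0])) % 360
    sectors = ["east", "northeast", "north", "northwest",
               "west", "southwest", "south", "southeast"]
    # Each sector spans 45 degrees, centred on its compass direction.
    return sectors[int((angle + 22.5) // 45) % 8]

def implicit_relations(icons):
    """Derive the pairwise spatial relations implied by an icon layout.

    `icons` maps domain object names (e.g. 'kitchen') to icon centres;
    the result maps each ordered name pair to a directional relation.
    """
    return {(p, q): directional_relation(icons[p], icons[q])
            for p, q in combinations(sorted(icons), 2)}
```

For example, placing a 'porch' icon to the right of a 'kitchen' icon yields the relation `("kitchen", "porch"): "east"`, which a spatial-similarity algorithm can then match against the relations extracted from each stored floor plan.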

Discussion and Conclusions

Current approaches to CBIR are based on modeling the image contents as logical features. The approaches differ in terms of what features are extracted, how and when they are extracted, the level of abstraction manifested in the features, and the degree of domain independence desired [2].


Figure 6. Rank ordered list of URLs for a query on face image collection.

Figure 7. Query specification GUI for hair-style image collection.

In developing CBIR systems, there is an inherent trade-off between the degree of automation desired for feature extraction and the level of domain independence realized in the system. CBIR systems can be developed with an emphasis on automatic and dynamic feature extraction. Although some features may be determined a priori, these systems emphasize the ability to compute the required features dynamically under the guidance of a domain expert [17, 18]. Though this approach is ambitious and aims at sophisticated CBIR, it is somewhat limited by the state of the art in image interpretation techniques. We refer to this as the dynamic feature extraction approach. CBIR systems can also be developed with an emphasis on achieving a reasonable degree of domain independence at the cost of not having a completely automated system for feature extraction. All the features are derived at the time of inserting an image into the database, and queries are processed using these features only. We refer to this as the a priori feature extraction approach.

Figure 8. Rank ordered list of URLs for a query on hair-style image collection.

Figure 9. Query specification GUI for shape database.

We believe that the a priori feature extraction approach has the potential for widespread impact. Though this approach does not entail the level of sophistication of dynamic feature extraction, a priori feature extraction appears to be the most promising and scalable. Since the features are extracted a priori, queries can be processed interactively over large distributed image collections. Therefore, our approach to distributed image retrieval is based on a priori feature extraction. We have identified various generic query classes and relevant logical features to process the queries efficiently. To make logical feature extraction consistent and less tedious, we have designed and partially implemented SemCap. We are currently working on a robust and fully fledged implementation of SemCap. We hope that the availability of SemCap will facilitate distributed content-based access to numerous image collections on the Internet.

A real challenge in distributed content-based image retrieval is to design a query language with well-defined semantics that can be used to query diverse image collections uniformly (i.e., under a universal query interface). The problem is that there are several generic query classes, and each class requires a specification scheme that is both natural and elegant for queries of that class. For example, a query involving spatial relationships is specified by selecting and configuring the domain objects in a sketch pad window, as shown in Figure 10. The spatial relationships among the domain objects are specified implicitly. On the other hand, specifying a query that involves subjective features requires a different type of interface, as shown in Figure 5. We believe that a natural language text query interface is the ideal one, since it provides an elegant and uniform means of querying diverse image collections. However, this requires analysing the natural language text query, identifying various subqueries, and mapping each subquery onto the relevant generic query classes. For instance, analysing locative expressions in the natural language text query facilitates the construction of the retrieval by spatial constraints query corresponding to those expressions. Similar mappings need to be investigated for the other generic query classes.

Our future research involves refining the dcbirp protocol and evolving the DCBIR architecture to accommodate the retrieval needs of various other image collections on the Internet*. We will also be incorporating the residential floor plan and shape databases into the DCBIR environment.

Acknowledgment

This work has been supported in part by ARPA Grant Number N00174-93-RC-00004 and in part by SEA/DOE Grant Number DE-FG05-94ER25229 (second author).


Figure 10. Query specification GUI for residential floor plan database.

Figure 11. GUI for browsing the images retrieved for a query in the residential floor plan database.

* Other collections include the On-Line Images at the National Library of Medicine, and the Smithsonian Image Database.

References

1. Larson, J. (1995) Database Directions: From Relational to Distributed, Multimedia and Object-Oriented Database Systems. Prentice Hall.

2. Gudivada, V. & Raghavan, V. (1995) Content-based image retrieval systems. IEEE Computer, 28(9): 18–22.

3. Salton, G. (1989) Automatic Text Processing. Reading, MA: Addison-Wesley.

4. Preparata, F. & Shamos, M. (1985) Computational Geometry: An Introduction. New York: Springer-Verlag.

5. Gudivada, V. & Raghavan, V. (1995) Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Trans. Inf. Sys., 13(1): 115–144.

6. Gudivada, V. & Jung, G. (1995) Spatial knowledge representation and retrieval in 3-D image databases. In IEEE International Conference on Multimedia Computing and Systems, pp. 90–97. IEEE Computer Society Press.

7. Gudivada, V. & Jung, G. (1995) A linear time algorithm for retrieval by spatial constraints in multimedia database applications. In ACM Computer Science Conference, pp. 195–202, Nashville, TN.

8. Lee, S.Y., Shan, M.K. & Yang, W.P. (1989) Similarity retrieval of iconic image database. Patt. Recog., 22(6): 675–682.

9. Mehtre, B. (1995) Shape measures for similarity retrieval of images. Technical Report TR95-179-0, Institute of Systems Science, National University of Singapore.

10. Faloutsos, C. et al. (1994) Efficient and effective querying by image content. J. Intell. Inf. Sys., 3(3): 231–262.

11. Pentland, A., Picard, R. & Sclaroff, S. (1994) Photobook: Tools for content-based manipulation of image databases. In Storage and Retrieval for Image and Video Databases II, pp. 34–46. SPIE, Vol. 2185.

12. Daneels, D. et al. (1993) Interactive outlining: An improved approach using active contours. In Storage and Retrieval for Image and Video Databases, pp. 226–233. SPIE, Vol. 1908.

13. Raghavan, V., Gudivada, V. & Katiyar, A. (1991) Discovery of conceptual categories in an image database. In International Conference on Intelligent Text and Image Handling, pp. 902–915. RIAO 91, Barcelona, Spain.

14. Gudivada, V. & Jung, G. (1995) Semantic content capture for distributed content-based image retrieval. Technical Report CS-95-04, Ohio University, School of Electrical Engineering and Computer Science, Athens, OH 45701.

15. Comer, D. & Stevens, D. (1994) Internetworking with TCP/IP, Vol. III: Client-Server Programming and Applications. Prentice Hall.

16. Vetter, R., Spell, C. & Ward, C. (1994) Mosaic and the World-Wide Web. IEEE Computer, 27(10): 49–57.

17. Griffioen, J., Mehrotra, R. & Yavatkar, R. (1993) A semantic data model for embedded image information. In Second International Conference on Information and Knowledge Management, pp. 393–402, Washington, D.C.

18. Gupta, A., Weymouth, T. & Jain, R. (1991) Semantic queries with pictures: The VIMSYS model. In 17th International Conference on Very Large Data Bases, pp. 69–79.

19. Jung, G. & Gudivada, V. (1994) Adaptive query reformulation in attribute based image retrieval. In Third Golden West International Conference on Intelligent Systems, pp. 763–774. Kluwer Academic Publishers.

20. Gudivada, V., Raghavan, V. & Seetharaman, G. (1994) An approach to interactive retrieval in face image databases based on semantic attributes. In Third Annual Symposium on Document Analysis and Information Retrieval, pp. 319–335. Information Science Research Institute, University of Nevada, Las Vegas.
