

Multimedia Tools and Applications, 24, 215–232, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Merging Results for Distributed Content Based Image Retrieval∗

STEFANO BERRETTI [email protected]
ALBERTO DEL BIMBO [email protected]
PIETRO PALA [email protected]
Dipartimento Sistemi e Informatica, Università di Firenze, via S. Marta 3, 50139 Firenze, Italy

Abstract. Searching information through the Internet often requires users to separately contact several digital libraries, use each library interface to author the query, analyze retrieval results and merge them with results returned by other libraries. Such a solution could be simplified by using a centralized server that acts as a gateway between the user and several distributed repositories: the centralized server receives the user query, forwards it to federated repositories—possibly translating the query into the specific format required by each repository—and fuses retrieved documents for presentation to the user. To accomplish these tasks efficiently, the centralized server should perform some major operations such as resource selection, query transformation and data fusion.

In this paper we report on some aspects of MIND, a system for managing distributed, heterogeneous multimedia libraries (MIND, 2001, http://www.mind-project.org). In particular, this paper focuses on the issue of fusing results returned by different image repositories. The proposed approach is based on normalization of the matching scores assigned to retrieved images by individual libraries. Experimental results on a prototype system show the potential of the proposed approach with respect to traditional solutions.

Keywords: digital libraries, collection fusion, distributed retrieval

1. Introduction

Nowadays, many different document repositories are accessible through the Internet. A generic user looking for a particular piece of information must contact each repository and verify the presence of the information of interest. This typically takes place by contacting the repository server through a client web browsing application. The user is then presented with a web page that guides her/him in the process of retrieving information. In a multimedia scenario this can be in the form of text documents, images, videos, audio and so on.

This framework has two major drawbacks. First of all, in order to find some information of interest, the user should know in advance which repository stores it. Internet search engines such as Altavista, Google and Lycos, to name a few, can support the user in this task. However, if the user doesn't know that a repository exists, s/he will never contact it, even if it is the only resource storing the desired information.

∗This work is supported by the EU Commission under grant IST-2000-26061. Detailed information about this project is available at http://www.mind-project.org.


A second drawback is related to the fact that, in order to find the information of interest, the user typically contacts several repositories. Each one is queried and, even for a limited number of repositories, the user is soon overwhelmed by a huge and unmanageable amount of—probably irrelevant—retrieved documents.

A much more efficient and convenient solution, especially for the user, is to federate all repositories within a network featuring one central server. A generic user looking for some information sends the query to the server, which is in charge of forwarding the query to all, or a subset of all, the networked resources. All retrieved documents, that is, the set of all documents returned by each resource, are gathered by the central server and conveniently arranged for presentation to the user.

The implementation of this solution develops upon the availability of three main functions: resource buffering, resource selection and data fusion.

Resource buffering. In general, each resource has its own ways of describing, accessing and presenting stored information. These correspond to different choices in terms of descriptors of content, query paradigms and visualization metaphors. In order to allow the user to seamlessly query several different resources and receive their results, a module is required that acts as a buffer translating information back and forth between the resource and the central server. In its simplest form, this module is just a wrapper that accomplishes simple format translation of data, commands and content descriptors. However, in a more realistic scenario, this module should also be responsible for the management of heterogeneous content representations. For instance, the user query may be represented through a 64-bin color histogram whilst the digital library processes queries only if they are in the form of a 32-bin color histogram.

Resource selection. In general, it is not always convenient to send the user query to all federated resources. Several considerations can be brought forward to support this decision: sending the query to all resources contributes to network congestion; some resources provide documents for free whereas others don't; some resources may have much more relevant documents than others. More precisely, resource selection is the process that, given a user query, estimates the optimum number of documents that should be retrieved from each resource (the query is not sent to resources that should provide no documents) [5, 10, 11]. An efficient resource selection relies on several pieces of information, including cost factors as well as descriptors of content for each resource (resource descriptions [2]).

Data fusion. Since each resource uses its own way to represent image content and compute the similarity between images, retrieval lists returned by different libraries may not be directly comparable. Furthermore, even if matching scores are associated with retrieved images, they are not normalized with each other. This prevents the possibility of inferring the relative relevance of documents returned by different resources from the analysis of their matching scores. However, comparing the relative relevance of all retrieved documents is necessary in order to present them in a suitable form to the user. Data fusion is the process by which retrieval lists returned by different resources are normalized with each other so as to enable comparison of the relative relevance of all retrieved documents [22].

In this paper, a novel approach is presented for the fusion of results returned by distributed digital image libraries. In particular, we focus on the case in which document scores are available in the result lists. In the case in which scores cannot be obtained from every library, an additional module is required to map document ranks into document scores, possibly relying on duplicates and cross matching between results returned by different libraries.

The proposed approach relies on learning normalization coefficients. These are used to normalize, with respect to a reference scale, the document scores assigned by different libraries. This is accomplished through two separate steps. In the first step, each library is processed separately through a set of sample queries. For each sample query, the value of the normalization coefficients is computed. In the second step, the known values of the normalization coefficients are used to estimate their values for new queries.

In addition, in order to make the discussion more concrete, we present data fusion as the final stage of the wider process carried out by a distributed retrieval system. In particular, we will refer to the architecture developed in the context of the MIND project—Resource Selection and Data Fusion for Multimedia INternational Digital Libraries [15]—focusing on the problem of resource description extraction. In fact, the way resource descriptors are extracted largely affects the applicability and quality of data fusion approaches.

This paper is organized as follows. In Section 2, the overall architecture of the MIND system is described, focusing on the resource description process. In Section 3, previous work related to data fusion for distributed libraries is reviewed. In Section 4, the proposed model for data fusion is described. Finally, experimental results are reported in Section 5, and conclusions are drawn in Section 6.

2. Architecture

Several mediator/federated systems have been developed since the early 1990s. A few of these are the TSIMMIS project [6], which is primarily focused on the semi-automatic generation of wrappers, translators and mediators, and the DISCO [20] and HERMES [19] projects, which also address related issues. Among others, the GARLIC project [8] focused on the specific case of federating multiple multimedia sources.

Figure 1 shows the MIND architecture for the overall distributed retrieval system. It can be noticed that each archive is linked to the network through a proxy module. Each proxy stores all relevant information about the resource it gives access to. This information includes summaries of resource content (resource descriptions), specification of supported media types, database schema and query predicates for each schema attribute, rules to format a query so that it can be processed by the resource, and rules to format retrieval results so that they conform to a common, predefined syntax. A new proxy is built each time a new resource wants to join the network.

At that time, the resource is sampled either following a cooperative protocol or an un-cooperative one (only for text libraries). In the former case, it is assumed that the digital library adheres to the project, joining the network of federated libraries and providing content descriptors for all its documents. The latter case occurs when the digital library does not explicitly join the federated network, and its content can only be estimated through some form of sampling. Sampling the resource aims at the extraction of the resource description, that is, a summary of resource content. Throughout the paper, it is assumed that image digital libraries adhere to a cooperative protocol.


Figure 1. Simplified overview of the MIND system architecture for distributed retrieval.

Resource descriptions play a central role in the operational cycle of the distributed retrieval system, and are used both for resource selection and for data fusion.

Resource selection is the process that, given a user query, selects which resources are the best candidates to contain relevant items. Of course, selection is not performed by actually issuing the query to the resource. Rather, estimation of the relevant items contained in a generic resource is accomplished by comparing the query against the resource description.

Resource descriptions are also used for data fusion, as described in the next sections. In general, since resource descriptions are a summary of resource content, they are used to help define a model representing how items are distributed in the collection (resource).

2.1. Extraction of resource descriptions

Regardless of the media characterization of database items (text, audio, images, videos, etc.), previous approaches to the computation of resource descriptors rely on some form of clustering [5]: database items are grouped (clustered) according to some homogeneity criteria, which correspond to the proximity of items in some suitable representation space. Then, each cluster is represented through its distinguishing features: typically, the value of the feature for which cluster items are homogeneous.

In the following, we assume that (i) the content of each image is represented through a multidimensional feature vector and (ii) similarity of content between two images is measured through a metric distance between their feature vectors.

Extraction of resource descriptors relies on the organization of feature vectors into a metric access structure [7]. Computation of this structure induces a hierarchical clustering of feature vectors: at each level of the hierarchy, similar vectors are grouped together, while dissimilar ones are located in separate clusters. The resulting multi-level structure represents the database content at different levels of resolution, through a tree of nodes. Specifically, each node is associated with a fixed maximum number m of entries. Each entry is the root of a sub-tree and is representative of all the nV feature vectors included in its sub-tree subV. According to this, each node entry V (routing vector) can be seen as the center of the cluster with radius µV, which comprises all the feature vectors in the sub-tree of V (i.e. µV is the maximum distance between the routing vector and a generic vector belonging to its sub-tree).

Figure 2. (a) Multiresolution clustering of database items. The hierarchy of cluster centers is also shown. (b) Resource descriptor extraction for different values of granularity. Composition and cardinality of resource descriptors change with different granularities.

According to this organization, each layer of the tree (composed of nodes at the same height) can be considered as an abstraction of database content at a certain level of resolution. In fact, routing vectors representative of a generic resolution level (level-i) are derived by promoting routing vectors from the immediately lower level of the tree. In this way, proceeding towards the upper levels, representative routing vectors abstract the content of the database using a reduced number of vectors. An example is shown in figure 2(a), where a sample set of items is clustered at three different resolution levels.

During access to the index, the triangular inequality can be used to support efficient processing of range queries, i.e. queries seeking all the images in the archive which are within a given range of distance from a query image Q. To this end, the distance between Q and any image in the covering region of a routing vector V can be lower-bounded using the radius µV and the distance between V and Q. Specifically, if µmax is the range of the query (that is, the query returns all images which are similar to a given image within a threshold of µmax distance), the following condition can be employed to check whether all the feature vectors in the covering region of V can be discarded, based on the sole evaluation of the distance µ(Q, V):

µ(Q, V) ≥ µmax + µV → no vector acceptable in subV    (1)

In a similar manner, an opposite condition checks whether all the feature vectors in the covering region of V fall within the range of the query (in this case, all the feature vectors in the region can be accepted). In the case that neither of the two conditions holds, the covering region of V may contain both acceptable and non-acceptable feature vectors, and the search must be repeated on the sub-index subV.

K-nearest neighbor queries [16], seeking the K feature vectors that are most similar to the query, can also be managed in a similar manner. This is obtained by regarding the query as a particular case of range query in which the range µmax is determined during the search.
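As an illustration of the pruning conditions above, the following sketch runs a range query over a small metric tree. The entry layout, tree contents and function names are our own assumptions for the example; any metric distance satisfying the triangle inequality could replace the Euclidean one used here.

```python
import math

def dist(a, b):
    """Euclidean distance between two points (a metric, so the
    triangle inequality holds and the pruning below is sound)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaf(p):
    """A data item: an entry with radius 0 and no children."""
    return (p, 0.0, None)

def collect(entry, out):
    """Gather every data item stored in the sub-tree of an entry."""
    v, _, children = entry
    if children is None:
        out.append(v)
    else:
        for child in children:
            collect(child, out)

def range_search(entries, q, mu_max, out):
    """Range query over a list of entries (routing vector, radius, children).

    Condition (1) of the paper discards a whole sub-tree; its dual
    accepts a whole covering region without descending into it.
    """
    for v, mu_v, children in entries:
        d = dist(q, v)
        if d >= mu_max + mu_v:
            continue                           # condition (1): prune sub-tree
        if d + mu_v <= mu_max:
            collect((v, mu_v, children), out)  # whole region within range
        elif children is not None:
            range_search(children, q, mu_max, out)

# Two clusters: one near the origin, one far away (pruned by condition (1)).
tree = [
    ((0.0, 0.0), 1.0, [leaf((0.0, 0.0)), leaf((1.0, 0.0)), leaf((0.0, 1.0))]),
    ((10.0, 10.0), 1.0, [leaf((10.0, 10.0)), leaf((11.0, 10.0))]),
]
hits = []
range_search(tree, (0.5, 0.0), 1.2, hits)
print(sorted(hits))  # the three items of the near cluster
```

Note that the far cluster is rejected with a single distance computation at the root level, which is the point of organizing descriptors in a metric tree.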

2.2. Resource descriptions at different granularities

The hierarchical organization of database items induced by multi-level clustering is exploited to derive resource descriptors at different levels of granularity. In this way, estimation of the database population relevant to a query relies on resource descriptors selected from a specific resolution according to some trade-off between the accuracy of the estimate and the efficiency of the estimation process.

In principle, the multi-level clustering approach of Section 2.1 is applicable to both cooperative and un-cooperative image libraries. For cooperative libraries, a separate index is constructed for every individual resource, by using the entire set of content descriptors provided by the library for all its images. In the case of un-cooperative libraries, image descriptors are not directly accessible to the federated network; in this case the library content can only be estimated through some form of sampling. A possible approach is that of randomly querying the library and considering the returned result sets as representative of the library content. During this sampling process, images are gathered into a local database; after that, content descriptors are extracted from all these images. Locally extracted descriptors can be used to construct the index of resource descriptors in the same way used for cooperative libraries. It is worth noting that the effective use of such descriptors in the resource selection phase largely depends on how well local descriptors are capable of replicating the actual behaviour of the unknown descriptor used by the particular library. In the following, a cooperative protocol is assumed.

As evidenced in Section 2.1, database items are organized in clusters, each identified by a routing object and a radius. According to this, cluster items are subsequently decomposed into more coherent clusters (i.e. clusters with a smaller radius) from the top to the bottom of the hierarchy. In so doing, the cluster radius does not increase descending the hierarchy towards the leaf level. The resource descriptor for a particular granularity is thus determined by imposing a maximum value for the cluster radius and using it to traverse the hierarchy. Along each path, starting from the root node, the search is stopped when a cluster with radius less than the given maximum is encountered, and the corresponding routing object is retained as one of the components of the resource descriptor at that granularity (i.e. the set of routing objects collected in this process constitutes the resource descriptor). The value of the maximum is used to identify a specific granularity; that is, in the hypothesis of a distance normalized in the [0, 1] interval, the resource descriptor for granularity 0.1 comprises the routing objects with radius less than 0.1, selected according to the above process. A schematic representation of resource descriptor extraction is reported in figure 2(b).


By repeating the resource descriptor selection process for different values of granularity, independent resource descriptors are extracted. In this way, for each granularity, the entire set of database items collected into the hierarchy is reduced to a resource descriptor constituted by a set of components, each associated with the feature vector of the cluster it represents, the cluster radius, and the population of the cluster.
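The granularity-driven traversal described above can be sketched as follows. The entry layout (routing vector, radius, population, children) is a hypothetical encoding of the hierarchy for this example, not the structure used in the actual system.

```python
def resource_descriptor(entries, r_max):
    """Extract the resource descriptor for granularity r_max.

    Descends each path from the root and stops at the first entry whose
    cluster radius is below r_max (or at a leaf), keeping for each
    retained cluster its routing vector, radius and population.
    Entries are tuples (routing_vector, radius, population, children).
    """
    desc = []
    for v, radius, population, children in entries:
        if radius < r_max or children is None:
            desc.append((v, radius, population))
        else:
            desc.extend(resource_descriptor(children, r_max))
    return desc

# A two-level hierarchy: one coarse cluster split into two finer ones.
fine = [((0.0, 0.0), 0.3, 3, None), ((1.0, 1.0), 0.4, 2, None)]
root = [((0.5, 0.5), 0.8, 5, fine)]
print(resource_descriptor(root, 0.9))   # coarse granularity: one component
print(resource_descriptor(root, 0.35))  # finer granularity: descends further
```

As in the text, lowering the radius threshold yields a larger descriptor with smaller, more coherent clusters.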

3. Related work on data fusion

In general, the main problem with merging multiple ranked result lists is the use of many different similarity metrics and their incompatibility for comparison. Outside the specific context of data fusion, strictly related works are those in [1, 12] and [9]. Adali et al. [1] proposed a similarity algebra that brings together relational operators and results of multiple similarity implementations in a uniform language. This algebra can be used to specify complex queries that combine different interpretations of similarity values and multiple algorithms for computing these values. Ilyas et al. [12] introduced a pipelined query operator that produces a global rank from input ranked streams based on a score function. Considering the problem from a different perspective, Fagin et al. [9] introduced various distance measures between the top k documents in different retrieval lists, and also provided approximation algorithms for the rank aggregation problem with respect to a large class of distance measures.

Most of the approaches proposed for merging results apply to text libraries and, for some of them, the extension to deal with multimedia libraries is not straightforward. A major difficulty related to the extension of approaches developed for text to deal with images is the difference, in terms of computational complexity, between computing the similarity of two text documents and of two images. Comparison of text documents can be accomplished very quickly. In contrast, comparison of two images typically takes much more time, due to the processing required to extract the content descriptors to be used during distance computation. This prevents the possibility of re-ranking retrieval results if these are in the form of images.

One of the best known approaches is called Round-Robin [21]. In this approach, it is assumed that each collection contains approximately the same number of relevant items and that these are equally distributed at the top of the result lists provided by each collection. Therefore, results merging can be accomplished by picking up items from the top of the result lists in a round-robin fashion—the first item from the first list, the first from the second list, . . . , the second from the first list and so on. Unfortunately, the assumption of uniform distribution of relevant retrieved items is rarely observed in practice, especially if libraries are generic collections available on the Web. As a result, the effectiveness of the Round-Robin approach is very limited.

A different solution develops on the hypothesis that the same search model is used to retrieve items from different collections. In this case, item matching scores are comparable across different collections and they can be used to drive the fusion strategy. This approach is known as Raw Score Merging [14], but it is rarely used due to the large number of different search models that are used to index documents even in text libraries.
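Both baseline strategies are simple to state in code. The sketch below is a minimal illustration with made-up result lists: `round_robin` interleaves ranked lists as described above, while `raw_score_merge` is only meaningful under the shared-search-model hypothesis.

```python
from itertools import chain, zip_longest

def round_robin(lists):
    """Interleave ranked lists: first item of each list, then second, ..."""
    skip = object()  # placeholder so shorter lists do not inject entries
    return [x for x in chain.from_iterable(zip_longest(*lists, fillvalue=skip))
            if x is not skip]

def raw_score_merge(lists):
    """Merge lists of (item, score) pairs by raw score, descending.

    Only meaningful when all collections use the same search model,
    so that scores are directly comparable."""
    return sorted(chain.from_iterable(lists), key=lambda p: p[1], reverse=True)

print(round_robin([["a1", "a2", "a3"], ["b1", "b2"]]))
# round-robin interleaving: a1, b1, a2, b2, a3
print(raw_score_merge([[("a1", 0.9), ("a2", 0.4)], [("b1", 0.7)]]))
```

Neither sketch consults the actual relevance of the items, which is exactly the weakness the score-normalization techniques discussed next try to address.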


To overcome the limitations of approaches based on raw score merging, more effective solutions have been proposed, developing on the idea of score normalization. These solutions assume that each library returns a list of retrieved items with matching scores. Then, some technique is used to normalize the matching scores provided by different libraries. Score normalization can also be accomplished by looking for duplicate documents in different lists. The presence of one document in two different lists is exploited to normalize the scores in the two lists. Unfortunately, this approach cannot be used if one or more lists are composed only of unique documents, that is, documents that are not included in other lists.

A more sophisticated approach, based on cross similarity of documents, is presented in [13]. For each list, each document is used as a new query. Retrieval results for these new queries, as well as matching scores, are used to evaluate cross similarities between documents retrieved by different libraries. Then, cross similarities are used to normalize matching scores. The main limitations of this approach are related to its computational complexity—each document in each retrieved list should be used as a new query for all the libraries—and to the fact that it requires the extraction of content descriptors from each retrieved document. This latter requirement, which can be easily accomplished for text documents, can be quite challenging for images.

In order to lessen these requirements, in [17] a new approach is presented that is based on the use of a local archive of documents to accomplish score normalization. Periodically, libraries are sampled in order to extract a set of representative documents. Representative documents extracted from all the libraries are gathered into a central archive. When a new query is issued to the libraries, it is also issued to the central search engine, which evaluates the similarity of the query with all documents in the central archive. Then, under the assumption that each retrieved list contains at least two documents that are also included in the central archive, linear regression is exploited to compute normalization coefficients for each retrieved list and normalize scores. This approach has been successfully tested for text libraries that were described through the Query-based Sampling approach. Briefly, Query-based Sampling [3] is a method to extract a description of the content of a given resource by iteratively running queries to the resource and analyzing retrieved results. At the end of the iterative process, all retrieved documents are used to compute a description of the resource content. Unfortunately, when the approach presented in [17] is not combined with a description of each resource extracted using Query-based Sampling, the assumption that each retrieved list contains two elements of the central archive is rarely verified. In these cases, linear regression cannot be applied and matching scores are left un-normalized.
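The regression step of this scheme can be sketched as follows, assuming each result list overlaps the central archive in at least two documents. The data layout and function names are illustrative assumptions, not the implementation of [17].

```python
def fit_linear(pairs):
    """Least-squares fit of sigma ~ a * s + b over (s, sigma) pairs.
    Requires at least two pairs with distinct s values."""
    n = len(pairs)
    sx = sum(s for s, _ in pairs)
    sy = sum(t for _, t in pairs)
    sxx = sum(s * s for s, _ in pairs)
    sxy = sum(s * t for s, t in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def normalize_list(results, central_scores):
    """results: [(doc_id, raw score)] returned by one library.
    central_scores: {doc_id: score} assigned by the central search engine
    to the documents that are also in the central archive."""
    overlap = [(s, central_scores[d]) for d, s in results if d in central_scores]
    if len(overlap) < 2:
        raise ValueError("need at least two overlapping documents")
    a, b = fit_linear(overlap)
    return [(d, a * s + b) for d, s in results]

# Toy example: 'x' and 'z' are duplicates of central-archive documents,
# so their scores anchor the regression that rescales the whole list.
raw = [("x", 0.2), ("y", 0.4), ("z", 0.6)]
central = {"x": 0.3, "z": 0.7}
print(normalize_list(raw, central))  # scores mapped onto the central scale
```

When the overlap condition fails, as quantified in figure 3, this procedure simply cannot run, which motivates the learning-based model of Section 4.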

As an example, figure 3 shows the ratio of queries that are successfully processed using the method described above for a database of images. In this experiment, the central archive is constructed using the approach described in Section 2.1. In the figure, the horizontal axis reports the number of retrieved images in the result set. The vertical axis reports the ratio of queries (averaged over 1000 random queries) that return at least two elements also included in the central archive. Several curves are reported, corresponding to different cardinalities of the central archive, measured as a percentage of the library size. For instance, using a central archive that contains only 10% of the images in the library, the ratio of queries for which the first 40 retrieved images include at least 2 images of the central archive is only 80%.


Figure 3. Applicability to a library of images of the data fusion approach using sampling and regression. Different curves represent the ratio of manageable queries at different granularities. For example, the curve at 16% indicates the ratio of manageable queries using a central archive which comprises about 16% of the images in the source.

4. Data fusion model

The proposed solution develops on the method presented in [17, 18]. Improvements are introduced to address two major limitations of the original method: applicability of the solution to the general case, regardless of the particular solution adopted for resource description extraction, and reduction of computational costs at retrieval time. These goals are achieved by decomposing the fusion process into two separate steps: model learning and normalization.

Model learning is carried out once, separately for each library: it is based on running a set of sampling queries to the library and processing the retrieval results. In particular, for each sampling query, the library returns a list of retrieved images with associated matching scores. These images are reordered using a fusion search engine that associates a normalized score to each image. Then, the function mapping original scores onto normalized scores is learned.

In contrast, normalization takes place only at retrieval time. Given a query and the results provided by a library for that specific query, it normalizes the matching scores associated with the retrieved results. This is achieved by using the normalization model learned for that library during the model learning phase. In doing so, data fusion is achieved through a model learning approach: during the first phase the parameters of the model are learned; during the second one the learned model is used to accomplish score normalization and enable fusion of results based on normalized score merging.

Selection of the fusion search engine plays a central role in the data fusion process. The type of search engine, its way of describing image content and computing the similarity between images, affects the accuracy of score normalization. In general, the higher the similarity between the fusion engine and a specific digital library engine—in terms of content representation and similarity measure—the higher the accuracy of score normalization.

4.1. Score modelling

Let L(i)(q) = {(Ik, sk)}, k = 1, . . . , n, be the set of image/score pairs retrieved by the i-th digital library as a result of query q. We assume that the relationship between the un-normalized scores sk and their normalized counterparts σk can be approximated by a linear relationship. In particular, we assume σk = a · sk + b + εk, where εk is an approximation error (residue), and a and b are two parameters that depend on the digital library (i.e. they may not be the same for two different digital libraries, even if the query is the same) and on the query (i.e. for one digital library they may not be the same for two different queries). The approximation error εk accounts for the use of heterogeneous content descriptors (the digital library and the data fusion server may represent image content in different ways) as well as for the use of different similarity measures (even if the digital library and the data fusion server use the same image content descriptors—e.g. a 64-bin color histogram—they may use two different similarity measures—e.g. one may compare histograms through a quadratic form distance function, the other may use histogram intersection).

Values of parameters a and b are unknown at the beginning of the learning process. However, during learning, values of these parameters are computed, separately for each library. For each sample query, the values of parameters a and b are estimated. Computing the values of parameters a and b for different queries is equivalent to sampling the value distribution of these parameters on a grid of points in the query space. Knowledge of parameter values at the grid points is exploited to approximate the value of the parameters for new queries. This is accomplished by considering the position of the new query with respect to the grid points.

Approximation of parameters a and b for a new query q is carried out as follows. If the new query matches exactly one of the sample queries (say q_k) used during the learning process, then a and b are set exactly to the values a_k and b_k that were computed for the sample query q_k. Otherwise, the three grid points q_j, q_k, q_l that are closest to the new query q are considered. The values of a and b for the new query are approximated by combining the values of a and b as they were computed for queries q_j, q_k, q_l. In particular, if these values were a_j, b_j, a_k, b_k, a_l and b_l, then the values of a and b for the query q are estimated as:

a = (d_j/D) a_j + (d_k/D) a_k + (d_l/D) a_l,    b = (d_j/D) b_j + (d_k/D) b_k + (d_l/D) b_l    (2)

where D = d_j + d_k + d_l, and d_j, d_k and d_l are the Euclidean distances between the new query and the grid points q_j, q_k, q_l, respectively (see figure 4(a)).

Using Eq. (2) is equivalent to estimating the values of parameters a and b as if the new query lay on the hyperplane passing through q_j, q_k and q_l. For this assumption to be valid, the new query should be close to the hyperplane: the closer it is, the more precise the approximation. As a consequence of this approximation, the normalized score is estimated as σ̂_k = a s_k + b, so that the error ε_k = σ_k − σ̂_k is a function of the query q.
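As a concrete illustration, the interpolation of Eq. (2) can be sketched in a few lines of Python. This is not the MIND implementation: the function name, the representation of `grid` as (sample query, a_k, b_k) triples, and the use of tuples for feature vectors are all assumptions; the d/D weights follow Eq. (2) as written.

```python
import math

def interpolate_coefficients(query, grid):
    """Approximate (a, b) for a new query from the learned grid.

    query: feature vector (tuple of floats)
    grid: list of (sample_query, a_k, b_k) triples learned for the sample queries
    """
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    # A query matching a sample query reuses its learned coefficients.
    for sample, a_k, b_k in grid:
        if dist(query, sample) == 0.0:
            return a_k, b_k

    # Otherwise take the three sample queries closest to the new query ...
    closest = sorted(grid, key=lambda t: dist(query, t[0]))[:3]
    ds = [dist(query, s) for s, _, _ in closest]
    D = sum(ds)

    # ... and combine their coefficients with the d/D weights of Eq. (2).
    a = sum((d / D) * a_k for d, (_, a_k, _) in zip(ds, closest))
    b = sum((d / D) * b_k for d, (_, _, b_k) in zip(ds, closest))
    return a, b
```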

Figure 4. (a) Position of a new query q with respect to the approximating points q_j, q_k, q_l on the grid. (b) Normalized scores σ_k with respect to un-normalized ones s_k for a sample query. The continuous straight line approximates the data model using linear regression.

In the proposed approach, linear regression is the mathematical tool used for the estimation of the normalization coefficients, given a sample query and the corresponding retrieved images. In our case, data points are pairs (σ_i, s_i), where s_i is the original matching score of the i-th image and σ_i its normalized score. The application of this method to a collection of pairs (σ_i, s_i) results in the identification of the two parameters a and b such that σ_i = a s_i + b + ε_i. Values of parameters a and b are determined so as to minimize the sum of squared distances of the data points to the model line:

min_{a,b} Σ_k (a s_k − σ_k + b)^2 / (a^2 + 1)    (3)

Figure 4(b) plots, for a representative sample query, the values of the normalized scores σ_k with respect to the un-normalized ones s_k. The straight line is that derived during the learning phase by computing parameters a and b of the linear regression between scores σ_k and s_k. This shows that the linear model assumption is quite well fulfilled, at least for the first few retrieved images.

In this experiment, we used the fusion engine with the features described in Section 4 and a digital library which supports color search based on a 64-bin histogram and the Euclidean distance.
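Note that Eq. (3) penalizes the perpendicular distance of each point (s_k, σ_k) to the fitted line, so it is a total-least-squares fit rather than ordinary regression, and it admits a closed form. The sketch below (function name illustrative, assuming the scores are not all identical and are positively correlated) computes that closed form:

```python
def fit_line_eq3(s, sigma):
    """Fit sigma ≈ a*s + b by minimizing the summed squared perpendicular
    distances of the points (s_k, sigma_k) to the line, as in Eq. (3)."""
    n = len(s)
    s_mean = sum(s) / n
    g_mean = sum(sigma) / n
    # Centered second moments of the two score sets.
    suu = sum((x - s_mean) ** 2 for x in s)
    svv = sum((y - g_mean) ** 2 for y in sigma)
    suv = sum((x - s_mean) * (y - g_mean) for x, y in zip(s, sigma))
    # Setting the derivative of Eq. (3) to zero gives
    # suv*a^2 + (suu - svv)*a - suv = 0; this root is the minimizing
    # slope when the scores are correlated (suv != 0).
    a = ((svv - suu) + ((svv - suu) ** 2 + 4 * suv ** 2) ** 0.5) / (2 * suv)
    b = g_mean - a * s_mean
    return a, b
```

For exactly collinear data the fit recovers the line: scores sigma = 2*s + 1 yield a = 2, b = 1.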

4.2. Selection of sample queries

Through the learning phase, the distribution of parameters a and b is sampled for a set of queries. The selection of these sample queries affects the quality of the approximation of a and b for a generic query in the query space during normalization.

Basically, two main approaches can be distinguished to select sample queries. The first approach completely disregards any information available about the distribution of images in the library. In this case, the only viable solution is to use sample queries uniformly distributed in the query space. A different approach relies on the availability of information about the distribution of images in the library. This should not be considered a severe limitation, since this kind of information is certainly available to accomplish the resource selection task. In particular, this information is available in the form of a resource descriptor capturing the content of the entire library. Typically, resource descriptors are obtained by clustering library images and retaining information about cluster centers, cluster population and cluster radius. Clustering is performed at several resolution levels so as to obtain multiple cluster sets, each one representing the content of the library at a specific granularity (see Section 2.2).

Resource descriptors can be used to guide the selection of sample queries by using the cluster center itself as the sample query. In this way, the distribution of sample queries conforms to the distribution of images in the library.

Once the resource description process is completed, each cluster center c_k is used as a query to feed the library search engine. This returns a set of retrieved images and associated matching scores s_i. Then, c_k is used as a query to feed the reference search engine (i.e. the search engine used by the data fuser), which associates with each retrieved image a normalized (reference) matching score σ_i. Linear regression is used to compute regression coefficients (a_k, b_k) for the set of pairs (σ_i, s_i). In doing so, each cluster center c_k is associated with a pair (a_k, b_k), approximating the values of the regression coefficients for query points close to c_k. The set of points (c_k, a_k, b_k) can be used to approximate the value of the regression coefficients (a, b) for a generic query q by interpolating the values of (a_k, b_k) according to Eq. (2).

It is worth noting that, in general, feature vectors capturing image content are defined in a high dimensional space, thus making it difficult to cover the whole feature space uniformly with a limited set of sample queries. However, the use of resource descriptors as sampling points reduces this complexity by adaptively locating sampling queries in those parts of the space where the feature vectors are denser. In fact, the points of a resource descriptor correspond to cluster centers in the feature space. This guarantees an accurate coverage of the feature space with a limited number of sample queries. In addition, the use of resource descriptors at different granularities as sample queries yields an adaptive coverage of the feature space at different levels of resolution.

Finally, we note that the computational complexity of the learning process is determined by the combination of several factors. In particular, the type of content descriptor used by the data fusion engine and its similarity measure are the two factors that most affect the computational load. Furthermore, since the query process takes place in a distributed environment, the connection speed over the network becomes the main bottleneck of the learning phase. In fact, the query time is the sum of the time to deliver the query to the digital library, the search time on the source side, and the time to transfer all the result images to the data fusion engine; only then can the re-processing and learning phases be accomplished by the data fusion module.

5. Experimental results

The proposed solution to data fusion for distributed digital libraries has been implemented and tested on a benchmark of three different libraries, each one including about 1000 images. Two libraries represent image content through 64-bin color histograms. However, they use two different similarity measures, namely the L1 (city-block) and L2 (Euclidean) norms, to compare image histograms. The third library computes local histograms on the components of a 3 × 3 grid which partitions the image. CIE Lab is used as the reference color space for the computation of this histogram, while the Euclidean distance is used to compare image histograms.


For the experiments described in this section, the fusion engine represents image content through a 32-bin color histogram and uses histogram intersection in the RGB space to compute image similarity.
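For reference, histogram intersection over normalized histograms can be written as below. The per-channel binning in `rgb_histogram` is only one plausible reading of a "32-bin color histogram" (the paper does not specify the binning scheme), so both function names and the binning are illustrative assumptions.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection similarity: for histograms normalized to unit
    mass it lies in [0, 1], reaching 1 for identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rgb_histogram(pixels, bins_per_channel=32):
    """Toy color histogram: 32 bins per RGB channel, concatenated and
    normalized to unit mass. Illustrative only; a real engine may bin the
    joint color distribution instead of per-channel marginals."""
    hist = [0.0] * (3 * bins_per_channel)
    for r, g, b in pixels:
        for channel, v in enumerate((r, g, b)):
            bin_idx = min(v * bins_per_channel // 256, bins_per_channel - 1)
            hist[channel * bins_per_channel + bin_idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist]
```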

Experimental results are presented to report both on the quality of the approximation of the normalization coefficients and on the overall quality of the data fusion process.

Each library was processed separately. Libraries were first subject to the extraction of resource descriptors according to the procedure presented in Section 2.1. Cluster centers extracted through the resource description process were used as sample queries. For each sample query, values of the normalization coefficients were computed, as explained in Section 4.1.

Evaluation of the proposed solution is carried out addressing two different accuracy measures. The first one describes the accuracy associated with the approximation of the normalization coefficients. The second one describes the accuracy associated with the ranking of merged results.

Values of the normalization coefficients for sample queries were used to approximate the value of the normalization coefficients for new queries. In particular, the accuracy of the approximation was measured with reference to a random set of test queries, not used during model learning. For each test query, values of the normalization coefficients were approximated according to Eq. (2). Actual values of the normalization coefficients were also computed by running the query and following the same procedure used for model learning (Section 4.1). Comparison of actual and approximated values of the normalization coefficients gives a measure of the approximation accuracy.

Let â and b̂ be the estimated values of the normalization parameters and a and b their actual values. The quality of the approximation is measured by considering the expected value of the normalized errors:

ε_a = E[ |â − a| / (1 + |a|) ],    ε_b = E[ |b̂ − b| / (1 + |b|) ]    (4)

Plots in figure 5 report values of the expected normalized errors for different values of the granularity of the resource descriptions. Granularity is measured as the percentage of the resource description cardinality with respect to the database size. According to this, a smaller granularity corresponds to a lower number of sample images used at learning time. It can be noticed that, even with coarse descriptors corresponding to low granularity levels, the value of the approximation error is always quite small. This experimentally confirms that it is possible to use estimated parameters for the data fusion process instead of computing their actual values on the fly at retrieval time.
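The error measure of Eq. (4) is straightforward to compute once actual and estimated coefficients are available for a set of test queries; a minimal sketch (function and argument names are illustrative):

```python
def expected_normalized_error(estimated, actual):
    """Eq. (4): average over the test queries of |p_hat - p| / (1 + |p|),
    where p_hat is the interpolated value of a normalization parameter
    (a or b) and p the actual value computed by running the query."""
    errors = [abs(e - p) / (1.0 + abs(p)) for e, p in zip(estimated, actual)]
    return sum(errors) / len(errors)
```

The same function is applied once to the a-values and once to the b-values, yielding ε_a and ε_b.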

The second measure of accuracy aims to estimate the quality of the data fusion and of the ordering of retrieved images that it yields. This is obtained by considering how images are ordered using the data fusion engine to process all retrieved images. This results in the comparison of two ordered lists of images.

The first list (data fusion list) is the output of the proposed data fusion module (called Learning and Regression, LR in the following). The fusion module receives the N best ranked retrieved images from each library and transforms their original scores into normalized scores. This yields a list of N × L images (where L is the number of available digital libraries) ordered by normalized scores.

Figure 5. Normalization error for approximated parameters of the three digital libraries: (a) error on parameter a and (b) error on parameter b.

A second ranked list (reference list) is obtained by ranking the set of all N × L retrieved images using the data fusion search engine. To build this second list, each one of the N × L images is processed in order to extract its content descriptor. Then, these content descriptors are compared against the user query and all images are sorted in decreasing order of similarity. For this experimentation, the content of each image is described locally in terms of a color histogram (according to the representation used by the data fusion engine, see Section 4), while the similarity between two image histograms is evaluated through the histogram intersection distance. The three digital libraries used in the experiment are those described at the beginning of this section.

Let L^(r)(q) = {(d_k^(r), σ_k^(r))}_{k=1}^{M} and L^(df)(q) = {(d_k^(df), σ_k^(df))}_{k=1}^{M} be the sets of image/normalized-score pairs for the reference and the data fusion list, respectively, obtained by combining the results returned for a query q. The size of both lists is M = N × L. Let f : {1, ..., M} → {1, ..., M} be the function that maps the index i of a generic image in the reference list to the index f(i) of the same image in the data fusion list. We assume that the best result for the data fusion process corresponds to the case in which every image in the data fusion list has the same rank it has in the reference list, that is: rank^(r)(i) = rank^(df)(f(i)). Thus, a measure of the overall quality of the data fusion process can be expressed through the following function:

F = (1/N) Σ_{i=1}^{N} | rank^(r)(i) − rank^(df)(f(i)) |    (5)

that is, the average value of the rank disparity computed on the first N images of the two lists. This corresponds to the case in which a user asks for N images and the resource selection module queries more than one library, retrieving N images from each of them. Then, only the N most relevant images in the fused list are shown to the user.
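Given the two ranked lists of image identifiers (best match first), the disparity F of Eq. (5) can be computed as below; the sketch (names illustrative) assumes every image in the reference list also appears in the fused list.

```python
def rank_disparity(reference_list, fused_list, n):
    """Eq. (5): average absolute difference, over the first n images of the
    reference list, between an image's rank in the reference list and its
    rank in the fused list. Both lists hold image ids, best match first,
    and every reference image is assumed to appear in the fused list."""
    fused_rank = {img: r for r, img in enumerate(fused_list, start=1)}
    total = sum(abs(i - fused_rank[img])
                for i, img in enumerate(reference_list[:n], start=1))
    return total / n
```

Identical lists give F = 0, the ideal-fuser case of figure 6(a).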


Figure 6. (a) Quality of the data fusion for a sample query. The straight diagonal line is the ideal mapping which preserves the image rankings between the reference and the data fuser list. (b) Quality of the data fusion for the Learning and Regression method (LR curve) and for the Round-Robin approach (RR curve). Values of F are reported for different granularities.

The lower the value of F, the higher the quality of the data fusion (i.e. 0 corresponds to an optimal data fusion).

As an example, figure 6(a) plots values of F(i) = |rank^(r)(i) − rank^(df)(f(i))| for a sample query and a resource description granularity equal to 4. In this experiment the three libraries are considered (L = 3), each returning sixteen images (N = 16), so that the maximum size of the fused and reference lists is M = 16 × 3 = 48. Resulting values are reported only for the first 32 images ranked according to the reference list. The horizontal axis represents the ranking produced according to the reference list, while the vertical axis is the ranking for the list originated by the data fuser. In this graph, the straight diagonal line represents the behavior of an ideal fuser, which yields the same ranking given by the reference list. Each displacement with respect to this line indicates a difference between the two lists in ranking the corresponding image.

It can be observed that the error is quite small for all ranks. In particular, the error is small for the first and most relevant images of the ranking. In addition, only a few of the images comprised in the first 32 of the reference list are ranked outside this limit in the LR list. A similar behavior was observed in the average case.

Finally, the proposed LR solution for data fusion has been compared to the general Round-Robin (RR) approach [21].

Figure 6(b) compares values of F for LR and RR, averaged over 100 random queries and for different granularities. The average ranking error for LR is quite small and decreases with the granularity. It can also be observed that the Learning and Regression solution yields better results than the RR approach, with the gain of LR increasing with the granularity.
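For comparison, the Round-Robin baseline of [21] ignores scores entirely and simply interleaves the libraries' ranked lists; a minimal sketch (names illustrative):

```python
def round_robin_merge(result_lists, n):
    """Round-Robin baseline: take one image per library per turn, in rank
    order, skipping duplicates, until n images are collected."""
    merged, seen = [], set()
    for rank in range(max(len(lst) for lst in result_lists)):
        for lst in result_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
                if len(merged) == n:
                    return merged
    return merged
```

Because each library contributes in strict alternation regardless of score quality, RR serves as the natural baseline against which LR is measured.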

6. Conclusion and future work

In this paper, an approach is presented for merging results returned by distributed image libraries. The proposed solution is based on learning and approximating normalization coefficients for each library. Since the major computational effort is moved to the learning phase, the approach has a low processing time at retrieval time, making it applicable to archives of images. In addition, differently from other fusion methods, which rely on the presence of common documents in different result lists, the proposed fusion strategy is completely general, imposing no particular constraint on the results. Experimental results are reported to demonstrate the effectiveness of the proposed solution. A comparative evaluation has also shown that the proposed Learning and Regression solution outperforms the general Round-Robin approach to the data fusion problem.

Future work will address experimentation and comparison of the effect of the learning technique on fusion effectiveness. In particular, we will investigate the possibility of adapting the learning model to the specific digital library, in order to use more appropriate models than the linear one. Issues related to the relationship between the granularity of sample queries and the accuracy of fusion will also be investigated, together with a detailed analysis of the relationship between the choice of the fusion search engine and the accuracy of model learning.

References

1. S. Adali, P.A. Bonatti, M.L. Sapino, and V.S. Subrahmanian, “A multi-similarity algebra,” in Proc. of ACM SIGMOD Conf. on Management of Data, Seattle, WA, June 1998, pp. 402–413.

2. S. Berretti, A. Del Bimbo, and P. Pala, “Using indexing structures for resource descriptors extraction from distributed image repositories,” in Proc. IEEE Int. Conf. on Multimedia and Expo, Lausanne, Switzerland, Aug. 2002, Vol. 2, pp. 197–200.

3. J. Callan and M. Connell, “Query-based sampling of text databases,” ACM Transactions on Information Systems, Vol. 19, No. 2, pp. 97–130, 2001.

4. W. Chang, D. Murthy, A. Zhang, and T. Syeda-Mahmood, “Metadatabase and search agent for multimedia database access over internet,” in Int. Conf. on Multimedia Computing and Systems (ICMCS’97), Ottawa, Canada, June 1997.

5. W. Chang, G. Sheikholeslami, A. Zhang, and T. Syeda-Mahmood, “Efficient resource selection in distributed visual information retrieval,” in Proc. of ACM Multimedia’97, Seattle, 1997.

6. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, “The TSIMMIS project: Integration of heterogeneous information sources,” in Proceedings of IPSJ Conference, Tokyo, Japan, Oct. 1994, pp. 7–18.

7. P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An efficient access method for similarity search in metric spaces,” in Proc. of the Int. Conf. on Very Large Databases, Athens, Greece, 1997.

8. W.F. Cody, L.M. Haas, W. Niblack, M. Arya, M.J. Carey, R. Fagin, M. Flickner, D. Lee, D. Petkovic, P.M. Schwarz, J. Thomas, M. Tork Roth, J.H. Williams, and E.L. Wimmers, “Querying multimedia data from multiple repositories by content: The Garlic project,” in IFIP 2.6 Third Working Conference on Visual Database Systems (VDB-3), Lausanne, Switzerland, March 1995, pp. 17–35.

9. R. Fagin, R. Kumar, and D. Sivakumar, “Comparing top k lists,” in Proc. of the ACM-SIAM Symposium on Discrete Algorithms (SODA’03), Baltimore, MD, Jan. 2003, pp. 28–36.

10. N. Fuhr, “Optimum database selection in networked IR,” in Proc. of the SIGIR’96 Workshop on Networked Information Retrieval, Zurich, Switzerland, Aug. 1996.

11. L. Gravano and H. Garcia-Molina, “Generalizing GlOSS to vector-space databases and broker hierarchies,” in Proc. of the 21st Int. Conf. on Very Large Data Bases, 1995, pp. 78–89.

12. I.F. Ilyas, W.G. Aref, and A.K. Elmagarmid, “Joining ranked inputs in practice,” in Proc. of the VLDB Conference, Hong Kong, China, 2002, pp. 950–961.

13. S.T. Kirsch, Document Retrieval Over Networks wherein Ranking and Relevance Scores are Computed at the Client for Multiple Database Documents. US patent 5659732, 1999.


14. K.L. Kwok, L. Grunfeld, and D.D. Lewis, “TREC-3 Ad-hoc, routing retrieval and thresholding experiment using PIRCS,” in Proc. of TREC-3, 1995, pp. 247–255.

15. MIND: Resource Selection and Data Fusion for Multimedia International Digital Libraries. EU project IST-2000-26061. http://www.mind-project.org.

16. N. Roussopoulos, S. Kelley, and F. Vincent, “Nearest neighbor queries,” in Proc. of the 1995 ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, May 1995, pp. 71–79.

17. L. Si and J. Callan, “Using sampled data and regression to merge search engine results,” in Proc. of Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Tampere, Finland, 2002, pp. 19–26.

18. L. Si and J. Callan, “A semisupervised learning method to merge search engine results,” ACM Transactions on Information Systems, Vol. 21, No. 4, pp. 457–491, 2003.

19. V.S. Subrahmanian, S. Adali, A. Brink, R. Emery, J. Lu, A. Rajput, T.J. Rogers, R. Ross, and C. Ward,HERMES: Heterogeneous Reasoning and Mediator System. http://www.cs.umd.edu/projects/hermes/.

20. A. Tomasic, L. Raschid, and P. Valduriez, “Scaling heterogeneous databases and the design of DISCO,” in Proc. of the Int. Conf. on Distributed Computer Systems, Hong Kong, 1996.

21. E.M. Voorhees, N.K. Gupta, and B. Johnson-Laird, “Learning collection fusion strategies,” in Proc. of ACM SIGIR’95, 1995, pp. 172–179.

22. E.M. Voorhees, N.K. Gupta, and B. Johnson-Laird, “The collection fusion problem,” in The Third Text REtrieval Conference (TREC-3).

Stefano Berretti received the doctoral degree in Electronics Engineering and the Ph.D. in Information Technology and Communications from the Università di Firenze, in 1997 and 2001, respectively. Since 2002, he has been Assistant Professor of Computer Engineering at the University of Firenze. His current research interest is mainly focused on content modeling, retrieval, and indexing for image and video databases.

Alberto Del Bimbo is Full Professor of Computer Engineering at the Università di Firenze, Italy. Since 1998 he has been the Director of the Master in Multimedia of the Università di Firenze. At present, he is Deputy Rector of the Università di Firenze, in charge of Research and Innovation Transfer.

His scientific interests are Pattern Recognition, Image Databases, Multimedia and Human Computer Interaction. Prof. Del Bimbo is the author of over 170 publications in the most distinguished international journals and conference proceedings. He is the author of the monograph “Visual Information Retrieval”, on content-based retrieval from image and video databases, published by Morgan Kaufmann.


He is a Member of the IEEE (Institute of Electrical and Electronics Engineers) and a Fellow of the IAPR (International Association for Pattern Recognition). He is presently Associate Editor of Pattern Recognition, Journal of Visual Languages and Computing, Multimedia Tools and Applications, Pattern Analysis and Applications, IEEE Transactions on Multimedia, and IEEE Transactions on Pattern Analysis and Machine Intelligence. He was Guest Editor of several special issues on image databases in highly respected journals.

Pietro Pala graduated in Electronics Engineering at the University of Firenze in 1994. He received the Ph.D. in Information and Telecommunications Engineering in 1997. Presently, he is Assistant Professor of Computer Engineering at the University of Firenze, Italy, where he teaches Database Management Systems. His main scientific interests include Pattern Recognition, Image and Video Databases and Multimedia Information Retrieval. He is the author of over 40 publications in the most distinguished international journals and conference proceedings and serves as a reviewer for a number of international journals, including IEEE Transactions on Multimedia, IEEE Transactions on Pattern Analysis and Machine Intelligence and IEEE Transactions on Image Processing.