A content-based image retrieval system




Image and Vision Computing 16 (1998) 149-163

A content-based image retrieval system

Chung-Lin Huang *, Dai-Hwa Huang

Electrical Engineering Department, National Tsing Hua University, Hsinchu, Taiwan, Republic of China

Received 12 August 1996; revised 26 August 1997; accepted 11 September 1997

Abstract

This paper proposes a Content-Based Image Retrieval (CBIR) system which uses a modified geometric hashing technique to retrieve similar-shape images from the image database. The CBIR system is a two-stage image retrieval system: the outline-based image retrieval and the hash-table-based image retrieval. For each object, we extract the feature points to generate the individual hash table, which is constructed by using the geometric properties of every three feature points. In the first retrieval stage, we use the shape parameters of the input sketched query image to select the possible candidate models in the database. The individual hash tables of these candidate models are combined into the global hash table for the second retrieval stage, which is a voting process using the invariant indices from the sketched query image and the global hash table. The number of votes indicates the score of matching between the query image and the candidate models. In the experiments, we have illustrated that the CBIR system can accurately retrieve the similar images from the database by using scaled, rotated, or mirrored sketched query images. © 1998 Elsevier Science B.V.

Keywords: Geometric hashing; Fourier descriptor; Invariant moment; Feature point selection; Similarity measure

1. Introduction

Image database systems have been generated at an ever-increasing rate. Conventional image databases use an alphanumeric index for image retrieval. However, human beings are not used to retrieving pictures based on their alphanumeric indices. Currently, integrated feature extraction and object recognition techniques have been developed to allow queries on large image databases using example images, user-constructed sketches and drawings, selected color and texture patterns, and other graphical information. Among them, shape information is the simplest and most important one; for instance, in the fields of engineering, meteorology, manufacturing, and entertainment, people use free-hand line drawings to present their concepts. Ferguson [1] pointed out that mechanical engineers have relied on visual shapes for centuries, but nowhere is the use of visual shapes more pronounced and acknowledged than in the education of engineers. The similar-shape-based query system can be treated as the retrieval or selection of all contours or images in the database that are similar to the shape of the query image.

Texture pattern is another form of information for image query. The traditional KLT is used [2-4] to classify the different textures in a large texture database; it permits separate treatment of image components which are closer to the human perception of similarity. A more complex system, the Photobook system [5], is a set of interactive tools for browsing and searching images and image sequences. IBM's QBIC (Query By Image Content) system [8-11] is one of the most powerful image database query systems; it was developed to query a large on-line image database using the image content as the basis of the queries. Examples of the content they used include the color, texture, and shape of the image objects and regions.

* Corresponding author. E-mail: [email protected]

0262-8856/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0262-8856(97)00062-0

Most of the query systems use shape information for image query, e.g., the data-driven indexed hypotheses (DDIH) method [6] and the Feature Index-Based Similar-Shape Retrieval system [7]. However, these query systems handle simple shapes only, and their polygonal-shape-based object and feature selection algorithm requires that the query image be very similar to the target image stored in the database. The shape's area, circularity, eccentricity, major axis orientation, and a set of algebraic moment invariants are used as the shape features [10]. These shape features can distinguish the outlines of different shapes, but cannot further differentiate the order of similarity with the query image. A simple global feature-based shape representation and a multi-dimensional point-access method are used for similar shape retrieval [12]. The global


Fig. 1. The flow diagram of the CBIR system. (The generation phase computes (1) invariant moments, (2) Fourier descriptors, and (3) the individual hash table for each stored image; for the query image, (1) invariant moments, (2) Fourier descriptors, and (3) invariant indices are computed; the outline-based retrieval phase selects candidate models, whose individual hash tables are combined into the global hash table.)

Another approach is the geometric indexing technique [13,14], in which the feature correspondence and the database searching are replaced by a table look-up mechanism. They compute invariants from an image that are then used as indexes to look up tables containing references to the object model. The advantage of geometric indexing lies in applications involving large model databases. However, Grimson and coworkers [15,16] pointed out the noise sensitivity of the geometric hashing method. Geometric hashing performs well on a single scene with little sensor noise, but its performance degrades significantly even with limited amounts of clutter or occlusion. Therefore, Califano and Mohan [17,18] used high-dimensional indices which drastically increase the signal-to-noise ratio. They proposed seven-dimensional global invariants computed by correlating triplets of local curve descriptors at longer range.

feature-based shape representation schemes cannot handle query images with partially visible objects or subparts of objects.

Geometric hashing pre-compiles the object models into a hash table by computing and combining the correlated local features to produce invariant high-dimensional global shape descriptors (indices). Then, it uses the invariants of the query image for fast indexing into the hash table to find the possible matches. If enough sensor invariants score hits when the system indexes certain entries in the hash table, then the models similar to the input image are identified. Each image stored in the database has been pre-processed to obtain the shape boundary, and some interest points on the boundary are automatically labeled. They may be the maximum local curvature boundary points or the vertices of the boundary polygonal approximation. These interest points with larger curvatures or sharp angles are neither likely to be part of image distortion nor sensitive to the noise effect.

The CBIR system, consisting of four phases, retrieves the images from the database based on the geometric properties of the input query image's boundary. The input query image's shape need not be very similar to the to-be-retrieved ones, and the output will be a list of images with different similarity measure scores. The first phase is the image database generation phase, which includes the input of images from a scanner, thinning, editing, feature extraction, and individual hash table generation. The second phase is the outline-based image retrieval phase, which identifies the global shape similarity. The third phase is the global

This paper proposes a Content-Based Image Retrieval (CBIR) system combining the conventional shape-parameter-based method and the geometric-hashing-based retrieval technique. CBIR is very similar to a model-based object recognition system; a model representing each known shape has been developed and stored in the image database. Given an input query image, the system will compare the input with the object models in the database for the best match. However, the conventional matching process, which is one-by-one sequential matching, is slow and not suitable for image query applications. Here, we apply the geometric hashing technique to reduce the retrieval time (Fig. 1).


hash-table generation phase, which combines the individual hash tables of these candidates (previously stored in the database) to generate the global hash table. The fourth phase is the index-based image retrieval phase, which uses the invariant features of the input query image to compute the indices. The indices are used to retrieve entries in the global hash table. The entries which indicate partial global matches are used to vote for all possible candidates whose shapes are similar to the sketched query image.

The major limitation of the geometric hashing technique is the finite size of the hash table. As the number of models stored in the hash table increases, the list of linked entries corresponding to the same index will also grow dramatically and the table may saturate. If the size of the hash table is extremely large, then the speed of geometric indexing will be very slow. Therefore, in our CBIR system, we use the outline-based retrieval to pre-select some candidate models for the global hash-table generation. However, there is a large variety of boundary-similar images which may have different interior shapes, so we need to apply the geometric-hashing-based matching to do the final similarity rating. The advantages of the CBIR system are fast image query and a small system memory requirement for geometric indexing.

2. Image database generation phase

In the image database, each image frame consists of the original picture, its shape parameters, and the individual hash table. The shape parameters are the moments [19] and Fourier Descriptors (FDs) [19] of the image outline.

The individual hash table is composed of the invariant indices and entries. Invariant indices are generated from the tangent angles or the junction angles of three feature points. Entries are depicted in terms of the angles and distances. The image database generation phase consists of six steps: (1) scan and then store the original images in the database for query; (2) pre-process the input images through thinning, contour editing, and branch removal; (3) select the feature points; (4) compute the shape parameters as indices; (5) generate the individual hash table for each stored image; and (6) calculate the shape descriptors for each stored image. To retrieve the original images in the image database by using a simple sketched contour as an input query image, we pre-edit each original complex image into a self-sufficient simple one. Then, the geometric properties of the simplified contour are extracted and stored in the database, and then used to generate the individual hash table. Here, we can use off-line processes to modify these images, extract their shape parameters, and generate the individual hash table.

2.1. Thinning and contour editing

The images, captured by an image scanner and processed by binary thresholding, are separated into the background and the object.

Fig. 2. Original image.

Pixels with gray levels larger than the threshold are assigned gray level 255 (i.e., bright), while the others are assigned gray level 0 (i.e., dark). Then the binary images
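The thresholding step above can be sketched as follows. The threshold value of 128 is an assumed default; the paper does not state how its threshold is chosen.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Assign 255 (bright) to pixels above the threshold and 0 (dark)
    to the rest, separating the object from the background."""
    gray = np.asarray(gray)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```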

are processed through a thinning process followed by a branch removal process. In this section, we first use the thinning process to find the skeletons of the images stored in the database. However, the skeleton images have many small branches which are not required for the image retrieval processes, because they generate too many feature points. For the individual and global hash-table generation phases, more feature points mean more space and computing time for indexing. Moreover, the performance of the system will rapidly decrease during the query process. To generate a further simplified version of each stored image, we remove the unnecessary branches by using the following branch removal algorithm.

2.1.1. Branch removal algorithm

The region boundary points are assumed to have value 255 and background points are assumed to have value 0. A 3 X 3 mask labels the candidate point p1 and its eight neighbors p2, p3, ..., p9 in circular order, and the algorithm is described as follows:

step 1) Find any boundary point p1.
step 2) Circumvent the point p1, visiting its neighbors in the order p2, p3, ..., p9.
step 3) Find the number of transitions from background to boundary (0-255 transitions).
step 4) If the number is less than or equal to one, then change p1 from a boundary point to a background point.
step 5) Repeat steps 1-4 until no change occurs.
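A minimal sketch of the branch removal algorithm is given below. The neighbor layout p2-p9 is assumed to be a clockwise circuit around p1 (the original 3 X 3 mask figure is unreadable), and the image border is assumed to be background. A point with at most one 0-to-255 transition is a branch tip, so pruning to a fixpoint erodes open branches while closed loops survive.

```python
import numpy as np

# Assumed clockwise neighbor order p2..p9 around the center p1.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]

def transitions(img, r, c):
    """Count 0->255 transitions while circling p1 through p2..p9 (wrapping)."""
    vals = [img[r + dr, c + dc] for dr, dc in NEIGHBOURS]
    return sum(1 for a, b in zip(vals, vals[1:] + vals[:1])
               if a == 0 and b == 255)

def remove_branches(img):
    """Steps 1-5: delete boundary points with <= 1 transition until stable."""
    img = np.asarray(img, dtype=np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for r in range(1, img.shape[0] - 1):
            for c in range(1, img.shape[1] - 1):
                if img[r, c] == 255 and transitions(img, r, c) <= 1:
                    img[r, c] = 0      # step 4: demote to background
                    changed = True
    return img
```

As expected from the algorithm, an open line is eaten from its tips, while a closed loop (two transitions at every point) is left intact.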

Fig. 3. After thinning and branch removal.


Fig. 4. Junction points (dashed marker: junction point; solid: curve line).

After branch removal, there are some small redundant closed loops which are useless for the indexing and query phases. To eliminate the redundant loops, we manually edit the small closed loops into branches, and then apply the branch removal process again. The results of the two procedures on the original image in Fig. 2 are shown in Fig. 3.

2.2. Feature point selection

The feature points must be similar whether they are selected from the query image or from the to-be-retrieved ones in the database. Moreover, these feature points must be translation, rotation, and scale invariant. So we choose junction points and curvature points as the feature points in our CBIR system. Junction points are joints of three connecting lines, and curvature points are selected from boundary points which have local maximum curvature. Feature points play a very important role in the CBIR system in hash-table generation and indexing. Chang et al. [20] listed the following requirements for feature point extraction: (1) the feature points must be invariant to translation, rotation, and scaling of objects, and must be invariant to a cyclic shift of the starting point; (2) the distinction between the feature points of two different objects must be as large as possible; (3) the number of feature points of an object must be as small as possible; (4) the computational time of feature points must be short, and the storage capacity of feature points must be as small as possible.

There are two kinds of feature points: junction points and curvature points. The junction points are located at the joints of three connecting lines, as shown in Figs. 4 and 5.

The curvature points are found in the following steps:

step 1) Take a series of points r(k) on the connection lines between two junction points or on a closed loop. The branch points r(k) can be expressed as

r(k) = (rx(k), ry(k))   (1)

where rx(k) is the x-axis coordinate and ry(k) the y-axis coordinate of r(k), and k = 0, 1, ..., N.

step 2) To ensure that noise will not affect the result of the query, we smooth r(k) by the Gaussian function g(k, σ) defined as

g(k, σ) = (1/(σ sqrt(2π))) exp(−k²/(2σ²))   (2)

Let x(k) and y(k) be

x(k) = rx(k) ⊗ g(k, σ)   (3)

y(k) = ry(k) ⊗ g(k, σ)   (4)

where ⊗ indicates the convolution.

step 3) The curvature is defined as

K(k) = (x′(k)y″(k) − y′(k)x″(k)) / (x′(k)² + y′(k)²)^(3/2),  k = 0, 1, ..., N   (5)

step 4) Sort K(k) to generate a list of the local maximum curvature points. Select the local maximum curvature points from the sorted list. If a selected point is very close to a previously selected point, then we disregard the point and find the next one. The number of curvature points, denoted by nk, is determined by the number of junction points and the proportional ratio of the length of r(k) to the length of the image contour, i.e.

nk =
  15, Nj = 0
   5, 0 < Nj < 3
   4, 3 ≤ Nj < 5
   3, 5 ≤ Nj < 7
   2, 7 ≤ Nj < 9
   1, 9 ≤ Nj ≤ 18
   0, Nj > 18   (6)

Fig. 5. Junction points.

Fig. 6. Curvature points.


Fig. 7. Feature points.

where Nj represents the number of junction points.

If L > 0.15 then increase nk by nk = nk + 1
If L > 0.25 then increase nk by nk = nk + 2
If L > 0.35 then increase nk by nk = nk + 3
If L > 0.45 then increase nk by nk = nk + 4
If L > 0.55 then increase nk by nk = nk + 5   (7)

where L represents the proportion of the pixel number of branch points to the pixel number of image contour, i.e., L = (# of branch points)/(# of contour points).
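Eqs. (6)-(7) can be sketched as a single function. The thresholds below are reconstructed from the garbled source, so treat them as indicative rather than exact.

```python
def curvature_point_count(n_junctions, branch_ratio):
    """Number of curvature points nk from Eq. (6), adjusted by Eq. (7).

    n_junctions: Nj, the number of junction points on the branch.
    branch_ratio: L = (# of branch points) / (# of contour points).
    """
    nj = n_junctions
    if nj == 0:
        nk = 15
    elif nj < 3:
        nk = 5
    elif nj < 5:
        nk = 4
    elif nj < 7:
        nk = 3
    elif nj < 9:
        nk = 2
    elif nj <= 18:
        nk = 1
    else:
        nk = 0
    # Eq. (7): each threshold the branch ratio exceeds adds one point,
    # so L > 0.55 adds 5 in total.
    for threshold in (0.15, 0.25, 0.35, 0.45, 0.55):
        if branch_ratio > threshold:
            nk += 1
    return nk
```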

The selected curvature points and the feature points of an image are shown in Fig. 6 and Fig. 7, respectively. To make an unbiased voting process for the geometric hashing, we select a fixed number of feature points for each image in the database. However, for each image, the number of junction points is different. Therefore, for a fixed number of feature points, we determine the number of to-be-selected curvature points based on the number of existing junction points and the length of the contour where these curvature points are located.

The identity and feature-angle terms used in the next section are defined as

IA = 0 if A is a junction point, 1 if A is a curvature point;
IB = 0 if B is a junction point, 1 if B is a curvature point;
IC = 0 if C is a junction point, 1 if C is a curvature point;
θ1 = θ2 = θ if IA = 0; θ3 = θ4 = θ if IB = 0; θ5 = θ6 = θ if IC = 0, where θ is shown in Fig. 9.

2.3. Invariant indices and entries selection

This section discusses how to generate the invariant indices and entries by using the geometric properties of any three feature points and the relationship between the three feature points and the reference point (the centroid of the designated image). We select any three feature points which produce the invariant indices as shown in Fig. 8. Let A, B, and C be three feature points selected from a contour image. These three points form a triangle whose properties can be described by eight parameters (terms), which are classified into the following three different index types:

(1) The first type is the identity index. These terms form an entry in the first invariant indices. The first three terms are represented as IA, IB, and IC, which indicate whether the selected feature points are junction points or curvature points.

(2) The second type is the feature angle index, which consists of the terms θi, θj, and θk (see Fig. 8). The indexes are the feature angles between the tangent line at each point and the connecting lines from this point to the other two points. If the feature point is a junction point, then we generate three tangential half-lines starting from this feature point and measure the three angles φ1, φ2, and φ3 (see Fig. 9).

Fig. 8. The index selection from the feature points.


Fig. 9. The feature angle determined from a junction point: θ = min(φ1, φ2, φ3).

The feature angle of this junction point is selected as the minimum of the three, i.e., θ = min(φ1, φ2, φ3).

(3) The last type is the triangle index. The three inner angles of the triangle connecting these three feature points A, B, and C are denoted as α, β, and γ. We only use two of the three angles as the triangle index, because the third one is determined by the other two (the three angles sum to 180°).

The entries of the individual hash table are shown in Fig. 10. First, let A, B, and C be the three feature points in an image and D denote the reference point. We can find the relative distances and relative angles between the feature points and the reference point. These entries can be characterized by five parameters (terms) which can be categorized into the following three different entry types:

(1) The first type is the destination angle entry. The first two terms represent the angles between the three feature points and the reference point. The direction of the reference point can be retrieved by the angle entry from the feature points in the query phase, i.e., ψi, i = 1, 2, ..., 6. The destination angle entry consists of any combination of these six angles; there are a total of six combinations, illustrated in Fig. 10.

(2) The second type is the destination length entry. The middle two terms represent the relative distance from any two of the three feature points to the reference point. The function of the destination length entry is the same as that of the destination angle entry: the relative distance of the reference point can be retrieved by the length entry from the feature points in the query phase. The destination length entry is defined as |d1| = |Dv1|/|m| and |d2| = |Dv2|/|m|, where v1 and v2 are any two of the three selected feature points, D is the reference point, and |m| is the normalization factor.

(3) The last type is the model ID entry. The last entry represents the label identification of the candidate model image in the database. In the individual hash table, there is only one model ID entry linked to the
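The destination length entry can be sketched as follows. The normalization factor |m| is only named, not defined, in the text, so it is passed in by the caller here (e.g., the contour length would be one plausible choice).

```python
import math

def length_entries(v1, v2, d, norm):
    """Destination length entries |d1| = |Dv1|/|m| and |d2| = |Dv2|/|m|.

    v1, v2: two of the three feature points (x, y);
    d: the reference point D (the centroid);
    norm: the normalization factor |m| (assumed caller-supplied).
    """
    d1 = math.hypot(v1[0] - d[0], v1[1] - d[1]) / norm
    d2 = math.hypot(v2[0] - d[0], v2[1] - d[1]) / norm
    return d1, d2
```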

Fig. 10. The entries of the hash table.

specific angle and length entries. However, in the global hash table, there may be more than one candidate model that has the same index, so many different model ID entries may be linked to the same angle and length entries. During the second retrieval process (the voting process), the model ID entry indicates which accumulator to vote for, whereas the angle entry and the length entry indicate the destination ballot box of that designated accumulator.

2.4. Individual hash table generation

The construction of the individual hash table is shown in Fig. 11. Any three feature points are used to generate six indices and six entries, as shown in Fig. 8 and Fig. 10, respectively. Each image in the database is used to construct an individual hash table in the following steps:

step 1) Find the reference point D (i.e., the centroid) of the image.
step 2) Choose three feature points A, B, and C which are either junction points or curvature points. The different combinations of the identities of the selected three feature points are used as the first three terms (IA, IB, IC) of the invariant indices.
step 3) If the feature point i is a junction point, then find


Fig. 11. The construction of an individual hash table (three feature points generate the invariant indices of Fig. 8 and the entries of Fig. 10).

the three tangential angles φ1, φ2, and φ3, and select the minimum one as the index θ for the corresponding feature point.
step 4) If the feature point i is a curvature point, then generate a tangential line and a connecting line from this point to its next feature point. Measure the angle between these two lines as the index θi, i = 1, 2, ..., 6, for the corresponding feature point (see Fig. 8).
step 5) Find the triangle angle indices α, β, and γ, i.e., the three interior angles of triangle ABC.
step 6) Analyze these three angles to see whether they are insensitive to measurement error. If they are not, then these indices are not generated; otherwise, the eight-term indices (five of them need to be quantized) are created and stored in the hash table to link the following entries.
step 7) Find the destination angle entry (ψi, i = 1, 2, ..., 6), the destination length entry, and the model ID entry. These three entries have been defined in the previous section.
step 8) Form six different combinations of indices of the three feature points and six different combinations of entries of the corresponding three feature points, as shown in Fig. 8 and Fig. 10, respectively. These entries are linked with the corresponding indices.
step 9) Repeat steps 2-8 for every three feature points of the image.
step 10) Repeat steps 1-9 for each image in the database.

The model ID entry is essential because the individual hash tables are combined to generate a global hash table in which the same index may point to a list of different model ID entries.
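The role of the model ID entry can be illustrated with a toy sketch. The dictionary-based table, the index tuples, and the entry strings below are hypothetical simplifications of the eight-term indices and five-term entries described above; only the merge-and-vote mechanism is the point.

```python
from collections import defaultdict

def build_global_table(individual_tables):
    """Merge individual hash tables into the global table.

    Identical invariant indices from different models end up sharing one
    bucket; the model ID entry records which model each hit belongs to.
    """
    table = defaultdict(list)
    for model_id, pairs in individual_tables.items():
        for index, entry in pairs:
            table[index].append((model_id, entry))
    return table

def vote(table, query_indices):
    """Second-stage retrieval: every query index that hits a bucket casts
    one vote into the accumulator of each model listed there."""
    scores = defaultdict(int)
    for index in query_indices:
        for model_id, _entry in table.get(index, ()):
            scores[model_id] += 1
    return dict(scores)
```

A model that shares more invariant indices with the query collects more votes, which is the matching score used for ranking.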

3. The outline-based retrieval phase

Since the image database is large, if we used all the images stored in the database for hash-table generation, then the hash table would be enormously large. In the second phase, we analyze the outline of the query image to retrieve all similar candidates. In the CBIR system, we have two image retrieval stages: the outline-based image retrieval and the hash-table-based image retrieval. CBIR uses the shape parameters as the features for the outline-based retrieval process. There are many shape descriptors for boundary representation [19]. To make sure that the first-stage process can roughly scan the image database without losing the correct one, our CBIR system adopts the invariant moments and Fourier Descriptors (FDs) to describe the input object, and applies these parameters to identify most of the similar shapes from the image database.

During the image database generation phase, the CBIR system uses the invariant moments and FDs as shape descriptors to describe the shapes of all the images stored in the database. Then, in the next phase, given an input query image, we use its shape descriptors and compare them with all shape descriptors stored in the image database to retrieve the most similar-shaped images. The selection criterion is based on the minimum distance between the input query image and all images in the database. Fewer than twenty of the most similar-shaped images are chosen to be the candidate models for the next retrieval phase.

The object's region of each image is described by calculating the invariant moments using equations (A1)-(A11) (see Appendix A). The object's contour of each image frame in the image database is smoothed by the Gaussian filter defined as

g(k, σ) = (1/(σ sqrt(2π))) exp(−k²/(2σ²))   (8)

After smoothing, the contour of the image is sampled and quantized to a fixed number of points (i.e., 520 points in our experiment). This process removes the noise and improves the accuracy of the second retrieval process. Then we calculate the FDs using Eq. (A12) and Eq. (A13).
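The sampling step can be sketched as arc-length resampling of the closed contour. Linear interpolation is an assumption here; the paper only states that the contour is sampled and quantized to a fixed number of points.

```python
import numpy as np

def resample_contour(x, y, n_points=520):
    """Resample a closed contour to a fixed number of points (520 in the
    paper's experiments) at equal arc-length spacing."""
    x = np.append(np.asarray(x, float), x[0])   # close the contour
    y = np.append(np.asarray(y, float), y[0])
    seg = np.hypot(np.diff(x), np.diff(y))      # segment lengths
    s = np.concatenate(([0.0], np.cumsum(seg))) # cumulative arc length
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    return np.interp(t, s, x), np.interp(t, s, y)
```

Resampling a unit square to eight points, for example, places one point at the middle of each side in addition to the corners.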

In the first retrieval process, the shape parameters of the input query image are generated by the same method. Let its shape-invariant moments be SI(k), for k = 1, 2, ..., 7, and its FDs be SF(i), for i = 1, 2, ..., 9. Moreover, we assume that the shape-invariant moments of each image in the image database are II(k), for k = 1, 2, ..., 7, and the FDs are IF(i), for i = 1, 2, ..., 9. Since we need a fast pre-scan for the possible candidates stored in the database, we do not apply a complex similarity measurement, but use a simple outline-based similarity measure called the city-block


distance, defined as

D = Σ (k=1 to 7) |SI(k) − II(k)| + Σ (i=1 to 9) |SF(i) − IF(i)|   (9)

Based on the distance, we select some (say twenty) images as the candidate models for the next retrieval phase. The city-block distance measurement can ensure the boundary similarity but not the interior similarity between two images. The outline of the sketched query image may not be internally similar to the retrieved images. Therefore, to avoid a miss in the first retrieval phase, we would rather select as many candidate images as possible for the next retrieval phase. The number of selected candidates is determined

by the size of the database and the computation capacity of the system. If we put more images stored in the database,

then we need a larger main memory and a higher computa- tion power for selecting more candidates and generating a more powerful global hash table for the second retrieval

phase.
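The pre-scan of Eq. (9) amounts to a linear scan with the city-block distance followed by keeping the closest matches. A minimal sketch (the dictionary layout of the descriptors is an assumed representation, not taken from the paper):

```python
def city_block_distance(query, model):
    """Eq. (9): sum of absolute differences over the 7 invariant
    moments SI(k)/II(k) and the 9 Fourier descriptors SF(i)/IF(i)."""
    d = sum(abs(a - b) for a, b in zip(query['moments'], model['moments']))
    d += sum(abs(a - b) for a, b in zip(query['fds'], model['fds']))
    return d

def select_candidates(query, database, n_candidates=20):
    # Rank every stored image by Eq. (9) and keep the closest ones.
    ranked = sorted(database, key=lambda m: city_block_distance(query, m))
    return ranked[:n_candidates]
```

The scan is O(N) in the database size, which is why it serves as a cheap filter before the more expensive hash-table matching.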

4. Measurement error analysis

To increase the query speed and improve the accuracy, we analyze the effect of measurement errors on the indexing. There are two different measurement errors: (1) the locations of the feature points, and (2) the interior angles of the triangle. Because the sketched query image cannot be completely matched with the images in the database, we have to consider the unavoidable location errors of the feature points, which may influence the three interior angles of the triangle connecting three feature points. These angles are used as the indices to the hash table. If the difference between the maximum and minimum values of one of the three inner angles caused by the measurement errors exceeds a certain threshold, then the hit ratio between the two sets of indices from the query image and the hash table will be extremely low. Therefore, such a combination of three feature points will not be selected for either the hash table generation or the geometric indexing in the second retrieval phase.

We need to analyze the measurement errors so that the indexing computed from the sensor data has the highest hit rate on the corresponding hash table entry. Here, we analyze the measurement-error effect on the angular index rather than on the parameters of the affine transformation [18,19]. First, we discard the feature angles generated from three feature points for which small misplacements of their locations have a major impact on the preciseness of the feature angles. Second, we quantize the measured feature angles as the indices for the hash table indexing.

4.1. Three feature points selection

Here we show how to analyze the angular index generated from three feature points, which is sensitive to the

Fig. 12. Arbitrary three points with noise model.

measurement errors. First, given any three points p_0, p_1 and p_2 with small misplacements, as shown in Fig. 12, we may analyze all the possible angles, from the largest to the smallest expanding angle (i.e., the angle at point p_1 subtended by points p_0 and p_2). Let these points be located at p_0 = (x_0, y_0), p_1 = (x_1, y_1), and p_2 = (x_2, y_2). We assume that they are perturbed with noise confined to a circular area, and can be expressed as

x = x_0 + \varepsilon\cos(t_0), \quad y = y_0 + \varepsilon\sin(t_0)
x = x_1 + \varepsilon\cos(t_1), \quad y = y_1 + \varepsilon\sin(t_1)
x = x_2 + \varepsilon\cos(t_2), \quad y = y_2 + \varepsilon\sin(t_2) \quad (10)

where \varepsilon is the possible location measurement error of the corresponding feature point. Then, the vectors \vec{a} = p_0 - p_1 and \vec{b} = p_2 - p_1 can be expressed by using the above equation as

\vec{a} = \left(c_0 + \varepsilon[\cos(t_0) - \cos(t_1)],\; c_1 + \varepsilon[\sin(t_0) - \sin(t_1)]\right)
\vec{b} = \left(c_2 + \varepsilon[\cos(t_2) - \cos(t_1)],\; c_3 + \varepsilon[\sin(t_2) - \sin(t_1)]\right) \quad (11)

where c_0 = x_0 - x_1, c_1 = y_0 - y_1, c_2 = x_2 - x_1, and c_3 = y_2 - y_1. Here, we assume that \varepsilon \ll |\vec{a}| and \varepsilon \ll |\vec{b}|, so the cosine of the expanding angle \omega between the two vectors \vec{a} and \vec{b} can be approximated by the inner product as

\cos(\omega) = \frac{\vec{a}\cdot\vec{b}}{|\vec{a}|\,|\vec{b}|} \approx k(\vec{a}\cdot\vec{b}) \quad (12)

where k = 1/(|\vec{a}||\vec{b}|) can be treated as a constant, because we can assume that the variation of the three variables (t_0, t_1 and t_2) has a negligible influence on the lengths |\vec{a}| and |\vec{b}|. To find the maximal or minimal values of the expanding angle, we solve the following three partial differential equations

\frac{\partial\cos(\omega)}{\partial t_0} = 0, \quad \frac{\partial\cos(\omega)}{\partial t_1} = 0, \quad \frac{\partial\cos(\omega)}{\partial t_2} = 0 \quad (13)

Here, we can find two solutions, which are the maximal and minimal angles, from two of the eight combinations of the


following extremities:

t_0 = \tan^{-1}(c_3/c_2), \quad \tan^{-1}(c_3/c_2) + \pi
t_1 = \tan^{-1}\!\left(\frac{c_1 + c_3}{c_0 + c_2}\right), \quad \tan^{-1}\!\left(\frac{c_1 + c_3}{c_0 + c_2}\right) + \pi \quad (14)
t_2 = \tan^{-1}(c_1/c_0), \quad \tan^{-1}(c_1/c_0) + \pi

To increase the speed of the three-feature-point selection for the hash table generation, we pre-construct a look-up table (assuming \varepsilon = 2) which contains the maximum and minimum angles of an arbitrary combination of three designated feature points. These maximum and minimum angles \omega are obtained by substituting Eq. (14) into Eq. (11) and Eq. (12). By using \omega, |p_0p_1| and |p_2p_1| as indices for the table look-up, we may easily find these extreme angles. Since nonregular distortion of the simple sketched query image is unavoidable, for any three feature points, if the difference between the maximum and minimum values of one of the three angles exceeds a certain threshold, these three feature points will not be selected for either the hash table generation or the second retrieval phase. The feature-point combination screening not only reduces the missing rate during the hash table indexing, but also decreases the retrieval time.
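The screening test can be sketched directly from Eqs. (10)-(14): evaluate the expanding angle at the eight combinations of candidate extremities and reject a triple whose angle spread is too large. This is a first-order sketch under the paper's noise model; the 10° rejection threshold here is an assumed value, since the paper leaves the threshold unspecified.

```python
import math

def extreme_angles(p0, p1, p2, eps=2.0):
    """Max/min expanding angle at p1 when each point is perturbed within
    a disc of radius eps (Eqs. (10)-(14))."""
    c0, c1 = p0[0] - p1[0], p0[1] - p1[1]
    c2, c3 = p2[0] - p1[0], p2[1] - p1[1]

    # Candidate extremal perturbation directions from Eq. (14).
    t0s = [math.atan2(c3, c2), math.atan2(c3, c2) + math.pi]
    t1s = [math.atan2(c1 + c3, c0 + c2), math.atan2(c1 + c3, c0 + c2) + math.pi]
    t2s = [math.atan2(c1, c0), math.atan2(c1, c0) + math.pi]

    angles = []
    for t0 in t0s:                      # eight combinations in total
        for t1 in t1s:
            for t2 in t2s:
                ax = c0 + eps * (math.cos(t0) - math.cos(t1))   # Eq. (11)
                ay = c1 + eps * (math.sin(t0) - math.sin(t1))
                bx = c2 + eps * (math.cos(t2) - math.cos(t1))
                by = c3 + eps * (math.sin(t2) - math.sin(t1))
                cosw = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
                angles.append(math.acos(max(-1.0, min(1.0, cosw))))
    return max(angles), min(angles)

def is_stable(p0, p1, p2, eps=2.0, threshold=math.radians(10)):
    # Reject a triple whose angle can vary too much under the noise model.
    wmax, wmin = extreme_angles(p0, p1, p2, eps)
    return wmax - wmin <= threshold
```

As expected, triples with long sides relative to ε survive the screening, while tightly clustered feature points are rejected.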

4.2. Quantization

The measured expanding angles are continuous values (0°-180°) which need to be quantized to L levels with minimal quantization error for the hash table indexing. So, we need to analyze the histogram of the expanding angles. Since each expanding angle has a range of possible values, the histogram generation has to count every possible value of each angle. Then, we may use the least-mean-square-error quantizer to quantize the continuous angle value to L levels. During the angle indexing, since each angle is measured with maximal and minimal values, we apply the mean of the angle values to the quantizer to generate the actual index for each angle. However, in the experiments, we simplify our system by using only the uniform quantizer.
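A uniform quantizer of the kind used in the experiments might look like this. The value L = 64 is an assumption for illustration; the paper does not state the number of levels.

```python
def quantize_angle(w_max_deg, w_min_deg, levels=64):
    """Uniform quantizer: the mean of the maximal and minimal measured
    angle values is mapped to one of `levels` bins over 0-180 degrees."""
    mean = 0.5 * (w_max_deg + w_min_deg)
    idx = int(mean * levels / 180.0)
    return min(idx, levels - 1)   # keep exactly 180 degrees in the last bin
```

A least-mean-square-error (Lloyd-Max) quantizer trained on the angle histogram would replace the fixed bin width with data-dependent decision boundaries.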

4.3. Global hash table generation phase

The global hash table stores high-dimensional global shape descriptors of all the candidate models. Eight indices are generated for each combination of three feature points. The global hash table generation phase mixes the individual hash tables of all candidate models in different entries and generates a linked list of entries. The list of entries indicates that different candidate models may have the same local shape information. In our CBIR system, the global hash table generation combines about twenty individual hash tables. The global hash table is generated by using the invariant indices of the images; these indices are translation, scale and orientation invariant. The global hash table generation phase proceeds as follows:

Step 1) Find the skeleton of the query image.
Step 2) Find the invariant moments and FDs of the query image.
Step 3) Find the candidate models from the outline-based retrieval phase.
Step 4) Combine the individual hash tables of the candidate models to generate a global hash table.

The combination simply links all the entries with the same index, which may come from different individual hash tables. If the linked list is too long, it may make the hash table inefficient; therefore, we cannot combine too many individual tables into one global hash table. Another concern for the global hash table generation is the system memory size. The hash table is resident in the system memory for fast indexing, so that the selected candidate models can be effectively queried in the second retrieval phase. There are two alternatives that allow the CBIR system to select more candidate models: (1) more system memory, or (2) fewer selected features for each candidate model.
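The chaining in step 4 — entries from different individual tables linked under their shared invariant index — can be sketched as follows. The nested-dictionary layout of an individual table is an assumed representation, not taken from the paper.

```python
from collections import defaultdict

def build_global_table(individual_tables):
    """Merge the individual hash tables of the candidate models into one
    global table.  Entries from different models that share the same
    invariant index are chained in a list (the paper's linked list).

    individual_tables: {model_id: {index_tuple: [entry, ...]}}
    """
    global_table = defaultdict(list)
    for model_id, table in individual_tables.items():
        for index, entries in table.items():
            for entry in entries:
                # Each global entry remembers which model it came from,
                # so votes can later be attributed per model ID.
                global_table[index].append((model_id, entry))
    return global_table
```

The length of each chain is exactly the number of (model, entry) pairs sharing that invariant index, which is why merging too many individual tables degrades lookup efficiency.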

4.4. The second retrieval phase

In the second retrieval phase, CBIR fetches the internal features of the query image for the indices and the geometric hashing, and then rates the similarity between the query image and the selected candidates. Here, the invariant indices of the query image are identified (see Section 2.3). Then, we use these indices to retrieve the corresponding destination angle entry, destination length entry and destination model entry in the global hash table. Using these entries, the CBIR system generates a destination ballot box to cast a vote for each candidate model indexed by the model ID entry. The voting results are stored in a corresponding accumulator for each candidate model. After all the indexing and voting operations have finished, for each candidate model we can find the ballot box which has the maximum number of votes of all ballot boxes in the accumulator. The order of similarity between the query image and the candidate model is determined by the maximum votes of the designated ballot box in the corresponding accumulator.

The concept of voting originated from the Generalized Hough Transform [21], which exhibits great merit for shape identification. Most important of all are its immunity to noise (or measurement error) and its capacity to handle multiple-object identification in parallel. Here, the Hough Transform concept has been modified for geometric hashing, in which the voting processes for all candidate models are performed simultaneously. Instead of using all points for the voting process, our geometric hashing algorithm selects only the feature points for indexing, which is followed by voting in the accumulators. Our CBIR system has made the following improvements: (1) hash


Fig. 13. The second retrieval phase.

table indexing for the voting process, (2) size-invariant indexing, (3) orientation-invariant indexing, (4) measurement-error-insensitive indexing, and (5) mirror-image-invariant indexing.

The CBIR can accurately retrieve the images from the pictorial database because: (1) the feature points selected from the input images and the designated candidate images are similar; and (2) the generated indices are size- and orientation-invariant. Using these entries of invariant indices to vote, CBIR may easily retrieve the similar-shape images. With the global hash table, the second retrieval phase (shown in Fig. 13) proceeds as follows:

Step 1) Find all the curvature and junction points of the query image.
Step 2) From these feature points, select any three of them and do the measurement error analysis to see whether their α, β or γ indices are measurement-error-insensitive.
Step 3) If the α, β or γ indices of the three junction or curvature points are measurement-error-insensitive, then they are used to generate the eight indices (i.e., the identity indices I_A, I_B, I_C; the feature angle indices θ_i, i = 1, 2, ..., 6; and the triangle indices α, β and γ) as the index to retrieve an entry from the global hash table. The index generation operations are similar to steps 2-5 of the individual hash table generation.
Step 4) From the global hash table, using the above indices, we obtain the destination angle entry, i.e., φ_i, i = 1, 2, ..., 6, the destination length entry, i.e., l_d1, l_d2, and the model ID entry, which are used to indicate a destination point of a certain model ID. For each indexed candidate model, we have a corresponding accumulator whose destination ballot is increased by one whenever it is referred to.
Step 5) After all the indexing operations are finished, the accumulated votes of each indexed model are counted. The number of votes indicates the score of matching between the input query image and the candidate models stored in the image database.
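The indexing-and-voting loop of steps 4 and 5 reduces to accumulating one vote per matching hash entry. A minimal sketch, in which the multi-component index is collapsed into a single tuple key for brevity:

```python
from collections import Counter

def vote(query_indices, global_table):
    """Second retrieval phase: every invariant index extracted from the
    query casts one vote for each (model, entry) chained at that hash
    entry; the accumulated votes rank the candidate models."""
    accumulators = Counter()
    for index in query_indices:
        for model_id, _entry in global_table.get(index, []):
            accumulators[model_id] += 1
    # Highest vote count first, i.e., the similarity ranking.
    return accumulators.most_common()
```

Because all candidate models share the one global table, a single pass over the query indices votes for all of them simultaneously, which is the parallelism the modified Hough-style scheme provides.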

4.5. Experimental results and discussion

The CBIR system is implemented under the Windows 95 operating system using Visual C++ 4.0 tool kits and a personal computer with a Pentium-166 CPU. The prototype system has 205 images stored in the database. There are two main operations in CBIR: one is image database generation, the other is image query. The system interface includes (1) adding images to the database and removing images from the database; (2) indexing, thinning and branch removal for individual images; and (3) querying from the image database.

4.5.1. The experiments and results

Fig. 14 and Fig. 15 show some images stored in an image database with 205 images. The results of the image database query for similar images are shown in Figs. 16-25. In these

Fig. 14. Some images stored in the database (frames 1- 15).


Fig. 15. Some images stored in the database (frames 191-205).
Fig. 17. The retrieved images.

figures, the first row displays the input query image; the second and third rows are the retrieved image frames. The similarity order is from top to bottom and from left to right. In each retrieved image frame, the first number at the top left corner of the frame represents the order of similarity. The second number, right below the first, denotes the ratio of the maximum number of votes in a single ballot box to the total votes for the corresponding accumulator.

From Figs. 16-25, we illustrate that the geometric-hashing-based similarity measure in the second retrieval phase is different from the outline-based similarity measure in the first retrieval phase. Since the sketched query image may have quite a different shape from the to-be-retrieved ones, and the 205 images stored in the database have a large variety, we cannot expect that all the retrieved images will be very similar to the sketched query image; we can only say that they will look similar. The problems of the shape-based image query system (which differ from conventional object recognition problems) are nontrivial due to the facts that: (1) the query image is globally but not locally similar to the candidate images; (2) the feature points selected from the candidate images and the query image are not very similar either; and (3) the placement error of the selected feature points not only decreases the votes of correct indexing but also increases the votes of incorrect indexing. To increase the accuracy of the similarity measure, we apply the measurement error analysis to screen out error-sensitive indices generated from the feature points.

In the experiments, we do the image query by using scaled, rotated and mirrored sketched query images. In Figs. 16-25, we use a manually sketched line-drawing to query the image database and find that the results are acceptable. Comparing Fig. 17 and Fig. 18, we use two different but similar sketched query images and find that the most similar image is rated number one in the list for both cases, and the five best-matched figures in Fig. 17 are also found in Fig. 18. In Fig. 21, we use a smaller query image

for similar-image retrieval to demonstrate the scale-invariant property of our retrieval system. Then, in Fig. 22 and Fig. 24, we use a rotated sketched query image as an

Fig. 16. The retrieved images.
Fig. 18. The retrieved images.


Fig. 19. The retrieved images.

input, and find that the results are a little worse than those of Fig. 17. Some of the best matches in Fig. 17 are not found in Fig. 22, and the correct image is rated number two. The second retrieval is not completely rotation-insensitive because the angular indices are sensitive to the measurement errors; however, there is only a small difference between the similarity measures of the first two retrieved images. In Fig. 23, we use a mirrored sketched query image and find that the results are as good as those of Fig. 17 and Fig. 18. The sketched query image may not be similar to any image in the database (e.g., Fig. 25); however, the CBIR system may still find the ones that are most similar to the query. The similarity is based on the subjective judgement of the end-user. If the sketched query is completely dissimilar to every image in the database, the system will fail.

4.5.2. Discussion

In the first retrieval phase, the reason why we select fewer than twenty images as our candidate models is the limitation of memory space and query speed. The

Fig. 21. The retrieved images.

more candidate models selected, the larger the number of individual hash tables generated. On the other hand, if we select fewer candidate models, there is a possibility that we may lose the real best-matched image in the first retrieval phase. The outline-based similarity does not necessarily indicate interior similarity between two figures. Therefore, to avoid a probable miss, we choose as many candidate models as possible for the next retrieval phase. The number of selected models depends on the computation power of the system. The shape information (including the FDs and the individual hash table) of each image is generated off-line and stored in secondary memory, whereas the global hash table is generated on-line and stored in the main memory.

In the experiments, we discovered that the more feature points selected, the more similar images are rated in the first five positions. However, a large hash table increases the query time: there is a trade-off between accuracy and speed. If we select more candidate models, it creates the problems of memory size limitation and query speed. The CBIR system has been developed on a PC with 16 MB DRAM; the system performance is good when CBIR

Fig. 20. The retrieved images.
Fig. 22. The retrieved images.


Fig. 23. The retrieved images.

selects fewer than twenty candidate models in the first outline-based retrieval phase and the number of selected feature points for each image is about 30, as described in Section 2.2.

The retrieval time is proportional to the number of feature points of the query image and the size of the global hash table of all the candidate models retrieved in the first retrieval phase. In our experiments, it takes about 3 seconds to select about twenty candidate models from the 205-frame image database in the first retrieval phase, and then less than one minute to display the order of similarity for these candidate models in the second retrieval phase. The retrieved images are more or less similar to the query image. In general, the performance of our system is good in both accuracy and query speed. In the future, the image database in our CBIR system can easily be extended without introducing any noticeable delay.

We combine the global descriptors and geometric hashing, which produces significantly better results. The global descriptor can only be applied for the outline-based similarity measure; it does not necessarily find the interior similarity between two figures. Since a large variety of boundary-similar images may have different interior shapes, we need to apply the geometric-hashing-based matching (a complete similarity measure) to do the similarity rating for the selected candidates. We tried using only the conventional global descriptor for the similarity measure, but found that the correct retrieval rate was too low. We might instead use only the geometric indexing; however, the size of the global hash table would then be so large that the speed of indexing would be unacceptably slow.

However, to avoid a probable miss in the first retrieval phase, we need to extract as many candidate models as possible for the next retrieval phase. The number of selected candidates depends on the size of the database (the number of pre-stored image frames) and the capacity of the computing environment. If there are thousands of images stored in the image database, then we need to select more candidates,

Fig. 24. The retrieved images.

a larger main memory for the global hash table, and a higher computation power for the geometric hashing operation.

4.6. Conclusions

We have developed a content-based image retrieval (CBIR) system which differs from the conventional sequential image query system. The CBIR system can be implemented in parallel to retrieve images from the database efficiently. The retrieval time is proportional to the number of selected feature points of the query image and the size of the global hash table. The size of the global hash table is determined by the number of selected candidate models retrieved from the database in the first retrieval phase. It takes about one minute to retrieve a query image from the database, which contains 205 shape images. Moreover, we have tested our system 100 times and found that about 92% of the queries correctly retrieved the designated image, rated within the first five places of all the candidate images.

Fig. 25. The retrieved images.


Appendix A. Invariant moments

The object shape can be described by moments. Here, we use the invariant moments [19] to describe an object's shape efficiently and effectively. For a two-dimensional continuous function f(x, y), the moment of order (p + q) is defined as

m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x, y)\,dx\,dy \quad (A1)

then the central moments can be expressed as

\mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^p (y - \bar{y})^q f(x, y)\,dx\,dy \quad (A2)

where

\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}

For a digital image, we modify the above equation as

\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y) \quad (A3)

Using the above equation, we have [19]

\mu_{00} = m_{00}
\mu_{10} = 0
\mu_{01} = 0
\mu_{11} = m_{11} - \bar{y} m_{10}
\mu_{20} = m_{20} - \bar{x} m_{10}
\mu_{02} = m_{02} - \bar{y} m_{01}
\mu_{30} = m_{30} - 3\bar{x} m_{20} + 2\bar{x}^2 m_{10}
\mu_{12} = m_{12} - 2\bar{y} m_{11} - \bar{x} m_{02} + 2\bar{y}^2 m_{10}
\mu_{21} = m_{21} - 2\bar{x} m_{11} - \bar{y} m_{20} + 2\bar{x}^2 m_{01}
\mu_{03} = m_{03} - 3\bar{y} m_{02} + 2\bar{y}^2 m_{01}

Then the normalized central moments are defined as

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}} \quad (A4)

where \gamma = (p + q)/2 + 1 for p + q = 2, 3, .... A set of seven invariant moments can be derived from the second- and third-order moments as [19]

\phi_1 = \eta_{20} + \eta_{02} \quad (A5)

\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \quad (A6)

\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \quad (A7)

\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \quad (A8)

\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] \quad (A9)

\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \quad (A10)

\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] \quad (A11)

This set of moments is invariant to translation, rotation, and scale. In the CBIR system, we only use the first four terms to represent an object's shape.
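Eqs. (A1)-(A11) transcribe directly into code. A plain-Python sketch for the first four invariant moments over a small intensity array (not an optimized implementation; for real images a vectorized library routine would be used):

```python
def hu_moments(image):
    """First four Hu invariant moments (Eqs. (A1)-(A11)) of a 2-D
    intensity array given as a list of rows, indexed image[y][x]."""
    h, w = len(image), len(image[0])

    # Raw moments m_pq, Eq. (A1) in discrete form.
    m = {}
    for p in range(4):
        for q in range(4):
            m[p, q] = sum(x**p * y**q * image[y][x]
                          for y in range(h) for x in range(w))
    xb, yb = m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]   # centroid

    # Central moments mu_pq, Eq. (A3).
    mu = {}
    for p in range(4):
        for q in range(4):
            mu[p, q] = sum((x - xb)**p * (y - yb)**q * image[y][x]
                           for y in range(h) for x in range(w))

    def eta(p, q):          # normalized central moments, Eq. (A4)
        return mu[p, q] / m[0, 0] ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)                                    # (A5)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2            # (A6)
    phi3 = (eta(3, 0) - 3 * eta(1, 2))**2 + (3 * eta(2, 1) - eta(0, 3))**2  # (A7)
    phi4 = (eta(3, 0) + eta(1, 2))**2 + (eta(2, 1) + eta(0, 3))**2  # (A8)
    return phi1, phi2, phi3, phi4
```

Because the central moments are taken about the centroid and normalized by a power of μ00, translating or rescaling the object leaves these values unchanged.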

Appendix B. Fourier descriptors

The Fourier descriptors (FDs) have been used to describe an object's boundary efficiently and effectively. First, in the digital image, an N-point digital boundary can be found in the x-y plane. Then, we arbitrarily select a starting point (x_0, y_0), and scan the boundary points in clockwise order to get an N-point boundary sequence (x_0, y_0), ..., (x_{N-1}, y_{N-1}). The boundary sequence can be expressed as a series of complex numbers. To eliminate the effect of bias, we subtract the mean of the boundary points as

S(k) = [x_k - \bar{x}] + j[y_k - \bar{y}] \quad \text{for } k = 0, 1, 2, \ldots, N - 1 \quad (A12)

where \bar{x} = \frac{1}{N}\sum_{k=0}^{N-1} x_k and \bar{y} = \frac{1}{N}\sum_{k=0}^{N-1} y_k.

The Discrete Fourier Transform (DFT) of S(k) is defined as

a(u) = \frac{1}{N} \sum_{k=0}^{N-1} S(k)\exp(-j2\pi uk/N) \quad \text{for } u = 0, 1, 2, \ldots, N - 1 \quad (A13)

Then we normalize a(u) to obtain magnitudes of the complex coefficients that are translation, scale and rotation invariant. In the CBIR system, we only use the first nine terms to represent an object's contour.

References

[1] E. Ferguson, Engineering and the Mind's Eye, MIT Press, Cambridge, MA, 1992.
[2] R.W. Picard, T. Kabir, Finding similar patterns in large image databases, IEEE ICASSP'93, pp. V-161-V-164.
[3] R.W. Picard, F. Liu, A new Wold ordering for image similarity, in: Proceedings of IEEE Conf. ASSP, Adelaide, Australia, 1994, pp. 1-4.
[4] S.F. Chang, J.R. Smith, Extracting multi-dimensional signal features for content-based visual query, SPIE 2501 (1995) 995-1006.
[5] A. Pentland, R.W. Picard, S. Sclaroff, Photobook: tools for content-based manipulation of image databases, SPIE Conf. Storage and Retrieval of Image and Video Databases II, No. 2185, Feb 6-10, San Jose, CA, 1994, pp. 1-14.
[6] W.I. Grosky, R. Mehrotra, Index-based object recognition in pictorial data management, Comput. Vision, Graphics, and Image Process. 52 (1990) 416-436.
[7] R. Mehrotra, J.E. Gary, Similar-shape retrieval in shape data management, IEEE Computer 28 (9) (1995) 57-62.
[8] M. Flickner et al., Query by image and video content: the QBIC system, IEEE Computer 28 (9) (1995) 23-32.


[9] R. Barber, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, P. Yanker, Efficient query by image content for very large image databases, COMPCON, San Francisco, 1993, pp. 17-19.
[10] W. Niblack et al., The QBIC project: querying images by content using color, texture and shape, in: Proc. of SPIE Storage and Retrieval for Image and Video Databases, 1993, pp. 173-187.
[11] D. Lee, R. Barber, W. Niblack, M. Flickner, J. Hafner, D. Petkovic, Query by image content using multiple objects and multiple features: user interface issues, IEEE ICIP'94, 1994, pp. 76-80.
[12] C. Faloutsos et al., Efficient and effective querying by image content, Research Report RJ 9453, IBM Research Division, San Jose, CA, 1993.
[13] Y. Lamdan, H.J. Wolfson, Geometric hashing: a general and efficient model-based recognition scheme, in: Proc. 2nd Int. Conf. Computer Vision, 1988, pp. 238-249.
[14] D.T. Clemens, D.W. Jacobs, Model group indexing for recognition, in: Proc. IEEE CVPR, 1991, pp. 449.
[15] W.E.L. Grimson, D.P. Huttenlocher, On the sensitivity of geometric hashing, in: Proc. 3rd Int. Conf. Computer Vision, Osaka, Japan, 1990, pp. 334-338.
[16] W.E.L. Grimson, D.P. Huttenlocher, D. Jacobs, Affine matching with bounded sensor error: a study of geometric hashing and alignment, Int. J. of Computer Vision 13 (1) (1994) 7-32.
[17] A. Califano, R. Mohan, Multidimensional indexing for recognizing visual shapes, in: Proc. IEEE CVPR, 1991, pp. 28-34.
[18] A. Califano, R. Mohan, Multidimensional indexing for recognizing visual shapes, IEEE Trans. PAMI 16 (4) (1994) 373-392.
[19] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, New York, 1992.
[20] C.C. Chang, S.M. Hwang, D.J. Buehrer, A shape recognition scheme based on relative distances of feature points from the centroid, Pattern Recognition 24 (11) (1991) 1053-1063.
[21] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13 (2) (1981) 111-122.