IEEE 2010 International Conference on Management and Service Science (MASS 2010), Wuhan, China


A New Approach of Image Similarity Calculation

Chin-Jung Huang Department of Mechanical and Computer Aided Engineering

St. John's University Taipei, Taiwan

[email protected]

Abstract—With the rapid development of information technology, image similarity is increasingly used in clinical diagnosis, quality control of product manufacturing processes, and image identification, so a fast, effective and accurate method for computing the relative similarity of two images is essential. This study proposes a new image similarity computing method: an image is divided into a limited number of grids, the Hough Transform is used to extract the shape features within each grid and count them, and the result is expressed as an image numeric vector. By pairwise comparison of the vectors of two images, the distance similarity and the angle similarity are calculated, and their average is taken as the relative similarity of the two images. The results show that the accuracy of the new image similarity algorithm is 80% when an image is divided into 100 grids; dividing the image into more grids yields higher accuracy.

Keywords—grid; feature extraction; Hough Transform; similarity

I. INTRODUCTION

With the growth of image information and the development of image processing technology, images have been widely applied in medical imaging, robot vision, industrial and commercial applications, and multimedia domains, all of which require comparing the relative similarity of two images. To date, image recognition has been performed by analyzing extracted image features. A fast, effective and accurate computing method for the relative similarity of two images is therefore in demand.

The new image similarity computing method proposed in this study separates the preprocessed digital image into a limited number of grids and then carries out shape feature extraction and feature quantity calculation within each grid using the Hough Transform. The results are expressed as image numerical vectors and matrices in a common format, the relative similarity between images is calculated, and finally the image relative similarity matrix is exported.

II. RELATED WORKS

The three common approaches to image feature extraction capture features such as color, shape and texture. Feature extraction techniques can be divided into two classes according to how image attributes are captured: global and local. Common feature extraction methods include the Scale-Invariant Feature Transform [3] [4], the Power Method and the Hough Transform [5].

The Scale-Invariant Feature Transform (SIFT) was first proposed by David Lowe [3], and SIFT feature points were later applied to automatic panoramic image stitching [4]. The algorithm exploits the invariance of SIFT feature points to scale, rotation and image scaling, and their robustness, to a great extent, to brightness changes, affine transforms and 3D projection, and uses local feature matching to search for and identify image correspondences.

The Power Method computes eigenvalues, but it yields only the eigenvalue with the maximum absolute value and its corresponding eigenvector. This eigenvalue is called the primary (dominant) eigenvalue, and the eigenvector corresponding to it is called the primary eigenvector.
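As an illustration (a minimal sketch, not from the paper), power iteration repeatedly multiplies a start vector by the matrix and normalizes; the matrix, start vector and iteration count below are arbitrary choices:

```python
def power_method(A, iters=200):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix A
    (given as a list of rows) by repeated multiplication and normalization.
    Assumes the dominant eigenvalue is positive and the start vector has a
    component along its eigenvector."""
    n = len(A)
    v = [1.0] * n                      # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]  # w = A v
        norm = max(abs(x) for x in w)  # infinity-norm normalization
        v = [x / norm for x in w]
        lam = norm                     # converges to |dominant eigenvalue|
    return lam, v

# The dominant eigenvalue of [[2, 1], [1, 2]] is 3, eigenvector [1, 1].
lam, v = power_method([[2.0, 1.0], [1.0, 2.0]])
```

The infinity-norm normalization keeps the iterate bounded; any vector norm would do.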

Curve detection identifies curves conforming to a curvilinear equation within otherwise disordered geometric data; it can be used to search for lines, circles and ellipses. The Hough Transform is the most commonly used curve detection method; its main idea is to map image data points into a parameter space and then take a majority vote over the accumulated parameters.
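To illustrate the voting idea (an illustrative sketch, not the paper's implementation), a minimal line Hough Transform over three collinear points; the 1-degree theta step and 0.1 rho resolution are assumed values:

```python
import math
from collections import Counter

def hough_lines(points, theta_step_deg=1):
    """Accumulate votes in (theta, rho) parameter space.  Every point (x, y)
    lies on each line rho = x*cos(theta) + y*sin(theta); collinear points
    all vote for the bin of their common line."""
    acc = Counter()
    for x, y in points:
        for t in range(0, 180, theta_step_deg):
            theta = math.radians(t)
            rho = round(x * math.cos(theta) + y * math.sin(theta), 1)
            acc[(t, rho)] += 1
    return acc

# Three points on the line y = x.  The line's normal direction is
# theta = 135 degrees with rho = 0, so all three points vote for that bin.
acc = hough_lines([(0, 0), (1, 1), (2, 2)])
```

The winning bin's (theta, rho) parameters describe the detected line; circle detection works the same way in a (center, radius) parameter space [5].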

III. IMAGE SIMILARITY COMPUTING METHOD

This study proposes a computing method that calculates the relative similarity between images in a simple, fast, effective and accurate manner for images with obvious shape features. The flow chart of the proposed method is shown in Figure 1; its steps are described as follows:

(1) Image pre-processing

Image pre-processing includes gray scaling, binarization and edge detection. The pre-processed image and the grid-wise Hough Transform together form the shape feature extraction and feature quantity computing system. An automotive rim, which has obvious shape features, is taken as the running example.
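A minimal sketch of the first two pre-processing steps (gray scaling and binarization) on a raw RGB pixel array; the luminosity weights and the threshold of 128 are illustrative assumptions, not the paper's settings:

```python
def to_gray(rgb_image):
    """Convert an RGB image (rows of (r, g, b) tuples) to gray scale
    using the common luminosity weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Threshold a gray-scale image into a binary (0/1) image."""
    return [[1 if px >= threshold else 0 for px in row] for px_row in [] or gray_image for row in [px_row]]

rgb = [[(255, 255, 255), (0, 0, 0)],
       [(10, 10, 10), (200, 200, 200)]]
binary = binarize(to_gray(rgb))   # → [[1, 0], [0, 1]]
```

Edge detection (e.g. a gradient or Canny operator) would then run on the binary image before the grid-wise Hough Transform.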

(2) Grid segmentation of the image; the Hough Transform is used for per-grid feature extraction and feature quantity calculation, and the result is expressed as an image vector and matrix.

The main shape features of images in this study are the line, the curve and the round [6]; the curve includes sectors, and the round refers to a complete circle. The types and quantities of the primary features of all grids of an image are captured and ordered first by grid position, then by feature type in the order line, curve, round, and lastly by feature quantity, expressed in the form [grid position - feature type - feature quantity], e.g. [grid 3 - line - 2] and [grid 3 - round - 1].

978-1-4244-5326-9/10/$26.00 ©2010 IEEE

[Figure: flow chart — Input: images → Process: pre-processing → segmenting grid → Hough Transform → expressed as image vector and matrix → normalization → normalized image numerical vectors → calculation of similarity → Output: relative similarity matrix of images]

Figure 1. The flow chart of the image similarity computing method

An automotive rim image is taken as an example and segmented into 16 grids to illustrate the simulated operating procedure and its results, as shown in Figure 2. The results of feature extraction and feature quantity calculation for the 16 grids of the full automotive rim image are listed in Table 1, in the order line (L), curve (C), round (R).

Figure 2. The automotive rim image is segmented into 16 grids

As each grid contains three feature types (line, curve and round), each grid contributes three component elements; the 16 grids of the full image thus yield 48 component elements. The unnormalized image numerical vector I of the automotive rim image is given in (1). Six automotive rim images are used for the example simulation and result verification, as shown in Figure 3.

I =[0 2 0 1 2 0 1 2 0 0 2 0 2 3 0 5 1 1 5 1 1 2 3 0 0 2 0 4 0 1 4 0 1 0 2 0 2 1 0 2 1 0 2 2 0 2 1 0] (1)
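Illustratively (a sketch, not the paper's code), the 48-element vector in (1) is simply the per-grid (L, C, R) feature counts of Table 1 flattened in grid order:

```python
# (line, curve, round) feature counts for grids 1..16, taken from Table 1.
grid_counts = [
    (0, 2, 0), (1, 2, 0), (1, 2, 0), (0, 2, 0),
    (2, 3, 0), (5, 1, 1), (5, 1, 1), (2, 3, 0),
    (0, 2, 0), (4, 0, 1), (4, 0, 1), (0, 2, 0),
    (2, 1, 0), (2, 1, 0), (2, 2, 0), (2, 1, 0),
]

# Flatten in grid order, feature order L, C, R -> the vector I of (1).
I = [count for grid in grid_counts for count in grid]
assert len(I) == 48 and I[:6] == [0, 2, 0, 1, 2, 0]
```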

(3) Calculation of image similarity

(a) Express each transformed image as an image numeric vector

When m images are expressed as m image numeric vectors, each with n dimensions, the image numeric matrix is as in (2); the i-th and j-th rows are the i-th and j-th image numeric vectors.

\[
\mathbf{I} =
\begin{bmatrix}
I_1 \\ I_2 \\ \vdots \\ I_i \\ \vdots \\ I_j \\ \vdots \\ I_m
\end{bmatrix}
=
\begin{bmatrix}
I_{11} & I_{12} & \cdots & I_{1k} & \cdots & I_{1n} \\
I_{21} & I_{22} & \cdots & I_{2k} & \cdots & I_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
I_{i1} & I_{i2} & \cdots & I_{ik} & \cdots & I_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
I_{j1} & I_{j2} & \cdots & I_{jk} & \cdots & I_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
I_{m1} & I_{m2} & \cdots & I_{mk} & \cdots & I_{mn}
\end{bmatrix}
\tag{2}
\]

where i = 1, 2, 3, ..., m; j = 1, 2, 3, ..., m; k = 1, 2, 3, ..., n.

(b) Normalize all dimension elements of the image numeric vectors

For the k-th dimension element of the i-th image numeric vector, (3) normalizes the k-th dimension elements across the m image numerical vectors, so that the k-th dimension elements of all m image numeric vectors lie between 0 and 1.

\[
v_{ik} = \frac{I_{ik} - (I_{ik})_{\min}}{(I_{ik})_{\max} - (I_{ik})_{\min}} \tag{3}
\]

v_ik : value of the k-th dimension element of the i-th image numeric vector after normalization; the value lies between 0 and 1.

I_ik : value of the k-th dimension element of the i-th image numeric vector before normalization.

(I_ik)_max : the maximum value of the k-th dimension element over the m image numeric vectors.

(I_ik)_min : the minimum value of the k-th dimension element over the m image numeric vectors.
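A sketch of this per-dimension min-max normalization (illustrative; the guard mapping a constant dimension to 0 is our assumption, not specified in the paper):

```python
def normalize_vectors(vectors):
    """Min-max normalize each dimension k across the m vectors, as in (3)."""
    n = len(vectors[0])
    lo = [min(v[k] for v in vectors) for k in range(n)]
    hi = [max(v[k] for v in vectors) for k in range(n)]
    return [
        [(v[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in range(n)]            # constant dimension -> 0 (assumed)
        for v in vectors
    ]

vecs = [[0, 2, 5], [4, 2, 1], [2, 2, 3]]
norm = normalize_vectors(vecs)
# → [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.5]]
```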

(c) Image similarity calculation and image similarity matrix

When the m images are represented after normalization as n-dimensional image numeric vectors, such as NI_i = [v_i1, v_i2, v_i3, ..., v_in] and NI_j = [v_j1, v_j2, v_j3, ..., v_jn], their Euclidean distance, length and inner product are defined in (4) to (6), respectively.

Euclidean distance:
\[
D(NI_i, NI_j) = \sqrt{\sum_{k=1}^{n} (v_{ik} - v_{jk})^2} \tag{4}
\]

Length:
\[
\lVert NI_i \rVert = \sqrt{\sum_{k=1}^{n} v_{ik}^2}, \qquad
\lVert NI_j \rVert = \sqrt{\sum_{k=1}^{n} v_{jk}^2} \tag{5}
\]

Inner product:
\[
\langle NI_i, NI_j \rangle = \sum_{k=1}^{n} v_{ik} \cdot v_{jk} \tag{6}
\]

In this paper, the relative similarity (S) between two images equals the mean of the distance similarity (DS) and the angle similarity (AS). After the m(m-1)/2 pairwise comparisons of the m images, the relative similarity between any two images is calculated as in (7) to (9) [1] [2].

\[
DS(NI_i, NI_j) = 1 - \frac{D(NI_i, NI_j)}{\displaystyle\max_{i,j} D(NI_i, NI_j)} \tag{7}
\]

\[
AS(NI_i, NI_j) = 1 - \frac{1}{\pi}\cos^{-1}\!\left(\frac{\langle NI_i, NI_j \rangle}{\lVert NI_i \rVert \, \lVert NI_j \rVert}\right) \tag{8}
\]

\[
S(NI_i, NI_j) = \frac{DS(NI_i, NI_j) + AS(NI_i, NI_j)}{2} \tag{9}
\]

Both the distance similarity and the angle similarity lie between 0 and 1, so the relative similarity also lies between 0 and 1: the larger the similarity of the two image vectors, the more similar the two images. If S = 1, the two images are identical; if S = 0, the two images are completely different.

Following (4)~(9), each automotive rim image is segmented into 16 grids and expressed as a normalized image numeric vector; the calculation of the similarity of image I1 to images I2 to I6 is shown in Table 2.

From the above similarity computing results, the relative similarity matrix of the 6 automotive rim images segmented into 16 grids is summarized; the relative similarity matrix is symmetric. The 6 automotive rim images were likewise segmented into 25, 36, 49, 64, 81 and 100 grids, expressed as numeric vectors, and the corresponding relative similarity matrices were derived by similarity calculation after feature extraction. Due to limited paper length, only the relative similarity matrix for 16 grids is presented, as shown in Table 3.

TABLE 3. RELATIVE SIMILARITY MATRIX OF 6 IMAGES SEGMENTED INTO 16 GRIDS

Image No.   I1      I2      I3      I4      I5      I6
I1          1       1       0.696   0.483   0.517   0.551
I2          1       1       0.696   0.483   0.517   0.551
I3          0.696   0.696   1       0.467   0.521   0.537
I4          0.483   0.483   0.467   1       0.423   0.399
I5          0.517   0.517   0.521   0.423   1       0.685
I6          0.551   0.551   0.537   0.399   0.685   1

To examine the relationship between the number of grids and the accuracy rate of the similarity rating, and thereby the accuracy of the similarity computing method, each automotive rim image is segmented into 16, 25, 36, 49, 64, 81 and 100 grids, the similarity values are converted into the corresponding similarity ratings, and the accuracy rates of the ratings deduced from the different numbers of grids are then verified by comparison.

For example, the similarity matrix for 16 grids in Table 3 is converted into corresponding similarity ratings. The interval between the maximum and minimum similarity values is divided into five equal bands (very high, high, medium, low and very low similarity ratings), and each similarity value is mapped to its band. Taking the first row of Table 3 as an example, the rating spacing is d = (1 - 0.483)/5 = 0.103, so the image similarity values are separated into 0.483~0.586, 0.586~0.690, 0.690~0.793, 0.793~0.897 and 0.897~1, corresponding to very low, low, medium, high and very high similarity ratings.
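The band mapping above can be sketched as follows (illustrative; the band boundaries follow the d = (max - min)/5 spacing, and the clamp keeps the maximum value inside the top band):

```python
def to_rating(value, lo, hi):
    """Map a similarity value to one of five equal-width rating bands
    spanning [lo, hi], as in the Table 3 example (lo = 0.483, hi = 1)."""
    labels = ["very low", "low", "medium", "high", "very high"]
    d = (hi - lo) / 5
    band = min(int((value - lo) / d), 4)   # clamp the maximum into the top band
    return labels[band]

ratings = [to_rating(s, 0.483, 1.0) for s in (0.483, 0.551, 0.696, 1.0)]
# → ['very low', 'very low', 'medium', 'very high']
```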

(4) Verify the accuracy rate of similarity computing method

To gain a preliminary understanding of the accuracy of the proposed similarity computing method, actual human similarity determinations were used for comparison. A questionnaire survey on the relative similarity of the 6 experimental automotive rim images was administered to 20 professionals in this domain, and their subjective similarity determinations served as the comparison basis for verifying the accuracy of the similarity inferred in the example simulation.

The relations between the grid quantity of the 6 rim images and the accuracy rate of the inferred similarity rating are listed in Table 4 and shown in Figure 4.

[Figure: accuracy rate (50%~100%) plotted against the number of grids (16, 25, 36, 49, 64, 81, 100)]

Figure 4. Relation between the quantity of segmented grids and the accuracy rate of the inferred similarity rating

IV. CONCLUSION

Six rim images were adopted as examples to verify the effectiveness of the new similarity computing method by simulation. The conclusions are as follows:

(1) As the image is segmented into more grid regions, the accuracy rate of the inferred similarity computation increases accordingly.

(2) When the image is segmented into 100 grid regions, the accuracy rate of the new similarity computing method is 80%; segmenting the image into still more regions is expected to raise the accuracy rate above 80%.

(3) The proposed image similarity computing method considers the similitude effects of both distance and included angle, so its ability to resolve similarity is much better than that of methods considering only the distance or only the included-angle similitude effect.

REFERENCES

[1] C. J. Huang and M. Y. Cheng, "Rule-based Knowledge Similarity Measurement Using Conditional Probability," Journal of Information Science and Engineering, Vol. 24, No. 3, 2008, pp. 769-784.

[2] C. J. Huang and M. Y. Cheng, "Conflicting Treatment Model for Certainty Rule-based Knowledge," Expert Systems with Applications, Vol. 35, Issues 1-2, 2008, pp. 161-176.

[3] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110.

[4] M. Brown and D. G. Lowe, "Automatic Panoramic Image Stitching Using Invariant Features," International Journal of Computer Vision, Vol. 74, No. 1, 2007, pp. 59-73.

[5] D. Ioannou, W. Huda and A. F. Laine, "Circle Recognition through a 2D Hough Transform and Radius Histogramming," Image and Vision Computing, Vol. 17, 1999, pp. 15-26.

[6] Z. Wang, Z. Chi and D. Feng, "Shape Based Leaf Image Retrieval," IEE Proceedings - Vision, Image and Signal Processing, Vol. 150, No. 1, 2003.

[Figure: six automotive rim images, numbered I1 to I6]

Figure 3. Six automotive rim images

TABLE 1. FEATURE QUANTITY OF THE AUTOMOTIVE RIM IMAGE I1 SEGMENTED INTO 16 GRIDS

Grid No.          Grid 1  Grid 2  Grid 3  Grid 4  Grid 5  Grid 6  Grid 7  Grid 8
Features          L C R   L C R   L C R   L C R   L C R   L C R   L C R   L C R
Feature quantity  0 2 0   1 2 0   1 2 0   0 2 0   2 3 0   5 1 1   5 1 1   2 3 0

Grid No.          Grid 9  Grid 10 Grid 11 Grid 12 Grid 13 Grid 14 Grid 15 Grid 16
Features          L C R   L C R   L C R   L C R   L C R   L C R   L C R   L C R
Feature quantity  0 2 0   4 0 1   4 0 1   0 2 0   2 1 0   2 1 0   2 2 0   2 1 0

TABLE 2. CALCULATION OF SIMILARITY OF IMAGE I1 TO IMAGES I2 TO I6

Image  Distance  Length   Inner product  Distance similarity  Angle similarity  Similarity
I2     0         4.0172   16.1380        1                    1                 1
I3     2.4576    4.2140   13.9280        0.5846               0.8076            0.6961
I4     4.3365    5.1962   12.1663        0.2670               0.6981            0.4825
I5     3.7349    3.3311   6.6423         0.3687               0.6653            0.5170
I6     3.4236    2.6300   5.6667         0.4213               0.6802            0.5507

TABLE 4. GRID QUANTITY OF IMAGE SEGMENTED AND ACCURACY RATE OF INFERENCE SIMILARITY RATING

Quantity of        Total quantity of           Inference correct  Inference wrong  Accuracy
segmented grids    similarity ratings tested   quantity           quantity         rate
16                 15                          9                  6                60.0%
25                 15                          10                 5                66.6%
36                 15                          10                 5                66.6%
49                 15                          11                 4                73.3%
64                 15                          11                 4                73.3%
81                 15                          11                 4                73.3%
100                15                          12                 3                80.0%