Neighbor-finding based on space-filling curves

ARTICLE IN PRESS

Information Systems 30 (2005) 205–226

Hue-Ling Chen, Ye-In Chang*

Department of Computer Science and Engineering, National Sun Yat-Sen University, Lien Hai Road, 70, Kaohsiung 804, Taiwan, ROC

Received 24 July 2002; received in revised form 14 November 2003; accepted 22 December 2003

Recommended by Nick Koudas.
*Corresponding author. Tel.: +886-7-525-2000x4334; fax: +886-7-525-4301. E-mail address: [email protected] (Y.-I. Chang).
0306-4379/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.is.2003.12.002

Abstract

Nearest-neighbor-finding is one of the most important spatial operations in the field of spatial data structures concerned with proximity. Because the goal of the space-filling curves is to preserve the spatial proximity, the nearest neighbor queries can be handled by these space-filling curves. When data are ordered by the Peano curve, we can directly compute the sequence numbers of the neighboring blocks next to the query block in eight directions in the 2D-space based on its bit shuffling property. But when data are ordered by the RBG curve or the Hilbert curve, neighbor-finding is complex. However, we observe that there is some relationship between the RBG curve and the Peano curve, as with the Hilbert curve. Therefore, in this paper, we first show the strategy based on the Peano curve for the nearest-neighbor query. Next, we present the rules for transformation between the Peano curve and the other two curves, including the RBG curve and the Hilbert curve, such that we can also efficiently find the nearest neighbor by the strategies based on these two curves. From our simulation, we show that the strategy based on the Hilbert curve requires the least total time (the CPU-time and the I/O time) to process the nearest-neighbor query among our three strategies, since it provides a good clustering property.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Hilbert curve; Neighbor-finding; Peano curve; RBG curve; Space-filling curves; Spatial proximity

1. Introduction

With the proliferation of wireless communications and the rapid advances in geographic information systems (GIS), a very common spatial query is the "Nearest Neighbor Query", which seeks the objects residing closest to a given object. For example, when driving a car on a highway, the driver may ask where the nearest gas station is. In other words, nearest-neighbor-finding is one of the most important spatial operations in the field of spatial data structures regarding proximity.

Basically, spatial data consist of objects in space made up of points, lines, regions, rectangles, and data of higher dimensions. Access methods are required to support efficient manipulation of the multi-dimensional spatial objects in the secondary storage. Several schemes [1–7] for the linear mapping of a multi-dimensional space have been proposed for various applications, such as access methods for traditional databases, GIS and spatio-temporal databases. However, in spatial applications, the structure of databases becomes larger and more complex, and means for extracting valuable




information become more sophisticated [4]. The design of multi-dimensional access methods is difficult compared to one-dimensional cases because there is no total ordering that preserves spatial locality [5]. Once a total ordering is found for a given spatial database, one can use any one-dimensional access method, which may yield good performance for multi-dimensional queries. Thus, what is needed is a mapping from a higher dimensional space to a one-dimensional (1D) space. In this mapping, the elements which are close in space are mapped into nearby points on the line to preserve the spatial locality. A space-filling curve passes through every point in the space once. It is one kind of mapping that gives a one-to-one correspondence between the coordinates and the sequence numbers of the points on the curve [4].

Because the goal of the space-filling curves is to preserve spatial proximity, they can handle nearest-neighbor queries. Some examples of space-filling curves are the Peano curve, the RBG curve and the Hilbert curve. The Peano curve [6,7] (or the Z-order curve) has the bit shuffling property, which means interleaving the bits from the two coordinates of the point in base 2 in the two-dimensional (2D)-space. An improvement of the Peano curve is the Reflected Binary Gray-code (RBG) curve [1,2], which uses Gray coding on the interleaved bits to reduce random accesses on the disk for range queries. Based on the Peano curve, the Hilbert curve was proposed with superior data clustering properties, which means that the locality between objects in the multi-dimensional space is preserved in the linear space [3–5,8,9].

When the data are linearly ordered by the Peano curve, we can directly compute the sequence numbers of the neighboring blocks next to the query block in eight directions based on its bit shuffling property, even when the order is greater than 1. (Note that the order of a curve deals with how many sequence numbers are necessary to uniquely define that curve.) But with the data ordered by the RBG curve or the Hilbert curve, neighbor-finding would be complex. However, we also observe that there is some relationship between the RBG curve and the Peano curve, as with the Hilbert curve. Therefore, in this paper, we first show how these space-filling curves are derived and discuss the relationships between them. Next, we describe the effects on nearest-neighbor queries by using different curves. Then, we show the way to find the nearest neighbor by our strategy based on the Peano curve, and present the rules for transformation between the Peano curve and others. Finally, we compare the performance of our strategy for the nearest-neighbor-finding based on different space-filling curves.

The rest of the paper is organized as follows. In Section 2, we briefly describe three space-filling curves, the Peano curve, the RBG curve, the Hilbert curve, and their spatial properties. In Section 3, we present our strategies for the nearest-neighbor-finding, based on the Peano curve and the transformation rules between the Peano curve and the others. In Section 4, we describe the simulation experiment performed to compare different strategies, and results obtained therefrom. Finally, we provide conclusions for this paper.

2. Space-filling curves

The design of multi-dimensional access methods is difficult compared to one-dimensional cases because there is no total ordering that preserves spatial locality [5]. One way is to look for a mapping which preserves spatial proximity at least to some extent. In this section, we discuss the properties of the Peano curve, the RBG curve, and the Hilbert curve.

2.1. Properties

A space-filling curve orders points linearly to preserve the distance between two points in the 2D-space. This means that points which are close in space and represent similar data should be stored together in the linear sequence. Some examples of space-filling curves are the Peano curve, the RBG curve and the Hilbert curve [1–3,5–9]. In general, a space-filling curve starts with a basic path on a k-dimensional square grid of side 2. The path visits every point in the grid exactly once without crossing itself. It has two free ends which may be joined with other paths. The basic curve is said to be of order 1. To derive a curve of order $i$, each vertex of the basic curve is replaced by the curve of order $i-1$, which may be appropriately rotated and/or reflected to fit the new curve [3].

For the Peano curve, the 1D-number (one-dimensional sequence number) of a point is obtained by bit shuffling on the X and Y coordinates of the point in the 2D-space. The basic Peano curve of order 1 for a $2 \times 2$ grid, denoted as $P_1$, is shown in Fig. 1(a). To derive higher orders of the Peano curve, we replace each vertex of the basic curve with the previous order curve. Figs. 1(b) and (c) show the Peano curves of order 2 and 3, respectively [3,8].

Fig. 1. Peano curves of order: (a) 1; (b) 2; (c) 3.

For the RBG curve, numbers are coded into binary gray-codes such that successive numbers differ in exactly one bit position. Faloutsos [1–3] observed that in the RBG curve, the difference in only one bit position has a relationship with locality. He proposed a formula to transform the Peano curve into the RBG curve such that the property of locality can be preserved. The basic RBG curve of order 1 for a $2 \times 2$ grid, denoted as $R_1$, is shown in Fig. 2(a). Higher orders of this curve are derived by reflecting the previous order curve over the x-axis and then over the y-axis. Figs. 2(b) and (c) show the RBG curve of orders 2 and 3, respectively [1–3,8].

Fig. 2. RBG curves of order: (a) 1; (b) 2; (c) 3.

The Hilbert curve is a space-filling curve used to preserve spatial locality as much as possible. Thus, the Hilbert curve has the important property that consecutively ordered points are adjacent in space [4], as shown in Fig. 3(a). As in the case of the previous two curves, it replicates in four quadrants. While replicating, the lower left quadrant is rotated clockwise 90°, the lower right quadrant is rotated anti-clockwise 90°, and the sense (the direction of traversal) of both lower quadrants is reversed. The two upper quadrants neither rotate nor change their sense. Thus we obtain Fig. 3(b). Because all rotation and sense computations are relative to previously obtained ones in a particular quadrant, a repetition of this step gives rise to Fig. 3(c) [1–3,8,9].

Fig. 3. Hilbert curves of order: (a) 1; (b) 2; (c) 3.

Space-filling curves are based on the assumption that any attribute value can be represented with some fixed number of bits. In the 2D-space, there are two attributes: the horizontal and vertical directions, which are called the X- and Y-axis, respectively. The whole space is split into equal-sized sub-regions, and the direction of splitting alternates between the X- and Y-axis. A vertical split through the middle of the space amounts to discrimination in the value of $x_0$. If the resulting sub-regions are split horizontally, this corresponds to discrimination on the value of $y_0$. At this point, there are four equal-sized sub-regions, corresponding to the possible values of $x_0$ and $y_0$.

In general, each split on the X- or Y-axis is characterized by one bit. After splitting $k$ times in each direction, a grid can be specified by $2k$ bits of two coordinates, $(x_{k-1}x_{k-2}\ldots x_1x_0,\; y_{k-1}y_{k-2}\ldots y_1y_0)_2$. The bit string obtained by bit shuffling on these $2k$ bits uniquely identifies one sub-region. The z-value is represented as the decimal value of the bit string. This means that there is a concise description of the shape, size and position of the sub-region. All points inside the sub-region have the same coordinates as the sub-region [7]. Consider the point or the region with $X = (1)_{10} = (\underline{0}\,\underline{1})_2 = (x_1x_0)_2$ and $Y = (3)_{10} = (1\,1)_2 = (y_1y_0)_2$ in Fig. 4(a), where $(\,)_2$ represents the binary form of the number, and $(\,)_{10}$ represents the decimal form of the number. The point or the region has the bit string $(\underline{0}\,1\,\underline{1}\,1)_2 = (x_1y_1x_0y_0)_2$, obtained by bit shuffling, which represents the splitting process on the X- and Y-axis. In the decimal representation, the z-value of the bit string $(\underline{0}\,1\,\underline{1}\,1)_2$ is equal to $(7)_{10}$. Orenstein [6,7] used the z-values and z-ordering to refer to the ordering of the Peano curve (Fig. 4(a)), which is a total ordering of elements that combines with the highly constrained splitting policy and preserves proximity. Similarly, r-ordering and r-values are used for the RBG curve, and h-ordering and h-values are used for the Hilbert curve, as shown in Figs. 4(b) and (c), respectively.

Fig. 4. Space-filling curves of order 2: (a) z-ordering: Peano curve; (b) r-ordering: RBG curve; (c) h-ordering: Hilbert curve.

Panel (a) labels each block with its z-value and bit string, for X = 0 (00), 1 (01), 2 (10), 3 (11):
Y = 0 (00): 0 (0000), 2 (0010), 8 (1000), 10 (1010)
Y = 1 (01): 1 (0001), 3 (0011), 9 (1001), 11 (1011)
Y = 2 (10): 4 (0100), 6 (0110), 12 (1100), 14 (1110)
Y = 3 (11): 5 (0101), 7 (0111), 13 (1101), 15 (1111)
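The bit shuffling described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper; the function name `interleave` and its parameter `k` (the number of bits per coordinate) are assumptions introduced here. It follows the bit order $(x_{k-1}y_{k-1}\ldots x_0y_0)_2$ used in the worked example.

```python
def interleave(x, y, k):
    """Bit-shuffle two k-bit coordinates into the z-value (x_{k-1} y_{k-1} ... x_0 y_0)_2.

    Hypothetical helper for illustration; the name and signature are not from the paper.
    """
    z = 0
    for i in range(k):
        z |= ((x >> i) & 1) << (2 * i + 1)  # x bit takes the higher position of pair i
        z |= ((y >> i) & 1) << (2 * i)      # y bit takes the lower position of pair i
    return z


# Worked example from the text: X = 1, Y = 3 on the order-2 curve.
print(interleave(1, 3, 2))  # 7

# z-values of the order-2 grid, one row per Y value (X runs from 0 to 3).
for y in range(4):
    print(y, [interleave(x, y, 2) for x in range(4)])
```

Because the z-value is a pure function of the coordinate bits, no recursion over lower-order curves is needed, which is the property that the neighbor-finding strategy in Section 3 relies on.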

3. Nearest-neighbor-finding

Nearest-neighbor-finding is one of the most important spatial operations in the field of spatial data structures which is concerned with proximity [10,11]. Because the goal of the space-filling curves is to preserve spatial proximity, nearest neighbor queries can be handled by the space-filling curves. An example of the nearest-neighbor query shown in Fig. 5 is to find the closest hydrant to the point of a fire by using three different curves. The entire grid is a map of a neighborhood, which is stored in a database. There exists F, which represents the point of the fire, and H, which represents a hydrant in the area. The lines represent street blocks and the numbers represent the orders that are stored in the database. In order to find the nearest data point H to the query point F, we briefly describe the conventional approach as follows.

In the conventional approach [3], we first compute the sequence number of F, and find H which has the closest sequence number (either in the preceding or succeeding order) to that of F along the curve path. Then, we calculate the actual distance $d$ in the 2D-space between the query point and the retrieved data point, and issue a range query centered at the query point with radius $d$. In this way, we need to check all points and return the closest point to the query point. In Fig. 5(a) (Peano curve of order 2), we calculate the sequence number of the query point, i.e., F9. We find the preceding and succeeding sequence numbers on the curve path until one of the points corresponds to H. In this case, H10 is the first one found to be the nearest neighbor to F9. The distance from H10 to F9 is 2 ($= d$) blocks in the 2D-space shown in Fig. 5(a). Then, we check all points which are within two blocks of F9 to see if any closer hydrant exists by performing a range query. From Fig. 5(a), we see that a hydrant H3 exists which is only one block away from F9 in the 2D-space. But in the 1D-diagram shown in Fig. 6(a), which represents the ordering of the curve, we need to check 6 points (from 9 to 3) for the range query to obtain the data point H3, which is the nearest neighbor (one block away from F9) in the 2D-space. Similarly, in Fig. 5(b) (the RBG curve of order 2), H2 is the nearest neighbor, which is only one block away from F13 in the 2D-space. But in the 1D-diagram shown in Fig. 6(b), we need to check 11 points (from 13 to 2) for the range query to obtain the data point H2. In Fig. 5(c) (the Hilbert curve of order 2), H2 is the nearest neighbor, which is only one block away from F13 in the 2D-space. But in the 1D-diagram shown in Fig. 6(c), we need to check 11 points (from 13 to 2) for the range query to obtain the data point H2. Thus, when the data are linearly ordered by the space-filling curves, it is time-consuming in the nearest-neighbor-finding to check all the data in the database by using the conventional approach.

Fig. 5. An example of the nearest-neighbor query which is represented by three kinds of ordering: (a) z-ordering; (b) r-ordering; (c) h-ordering. (H: a hydrant; F: the fire. Labeled blocks: (a) F9, H3, H5, H10; (b) F13, H2, H7, H15; (c) F13, H2, H5, H15.)

Fig. 6. The shaded regions of 1D diagrams illustrate the hydrants H which are within two blocks of the fire F in the 2D-space, as shown in Fig. 5: (a) z-ordering; (b) r-ordering; (c) h-ordering. (Legend: H: a hydrant; F: the fire; the data point which is one or two blocks from F in the 2D-space; the data point which is more than two blocks from F in the 2D-space.)

In this section, we first show how to find the nearest neighbor based on the property of the Peano curve. Then, we present the neighbor-finding algorithm based on the RBG curve and the Hilbert curve. We also present the transformation rules between the Peano curve and the other two curves in the neighbor-finding process.
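The conventional approach described above can be illustrated with the Peano-ordered data of Fig. 5(a). The following is a minimal sketch under stated assumptions, not the implementation of [3]: the helper names are hypothetical, the data set is held in a plain list rather than a disk-based index, and "blocks" is assumed to mean city-block (Manhattan) distance, which is consistent with the distances quoted for H10 and H3.

```python
def z_to_xy(z, n):
    """Split a 2n-bit z-value (x_{n-1} y_{n-1} ... x_0 y_0)_2 into (x, y). Hypothetical helper."""
    x = y = 0
    for i in range(n):
        y |= ((z >> (2 * i)) & 1) << i       # lower bit of pair i belongs to Y
        x |= ((z >> (2 * i + 1)) & 1) << i   # higher bit of pair i belongs to X
    return x, y


def blocks_apart(a, b):
    # Assumption: distance in "blocks" is the city-block (Manhattan) distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def conventional_nn(query_z, data_zs, n):
    """Return (first candidate, radius d, true nearest neighbor) as z-values."""
    q = z_to_xy(query_z, n)
    # First, take the data point whose sequence number is closest along the curve.
    candidate = min(data_zs, key=lambda z: abs(z - query_z))
    d = blocks_apart(q, z_to_xy(candidate, n))
    # Then issue a range query of radius d around the query point and keep the closest hit.
    in_range = [z for z in data_zs if blocks_apart(q, z_to_xy(z, n)) <= d]
    best = min(in_range, key=lambda z: blocks_apart(q, z_to_xy(z, n)))
    return candidate, d, best


# Fig. 5(a): fire at F9, hydrants at H3, H5 and H10 on the order-2 Peano curve.
print(conventional_nn(9, [3, 5, 10], 2))  # (10, 2, 3)
```

The first candidate (H10) is adjacent to F9 on the curve but two blocks away in space, while the true nearest neighbor (H3) is six positions away on the curve, which is why the range query has to examine so many sequence numbers.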

3.1. Based on the Peano curve

In each of the space-filling curves, the block is characterized by the unique value, i.e., the sequence number. The Peano curve uses the z-value to linearly order the blocks. As mentioned in Section 2, the z-value in base 2 of the Peano curve can be easily obtained by interleaving the bits from the $x_i$ bits and then the $y_i$ bits of the two coordinates in base 2. (Note that the value in base 2 is represented with the base 2 bits, i.e., 0 or 1. A base 2 bit represents that bit times a power of 2.) The relationship between the z-value and the coordinates is shown in Fig. 4(a). The value in parentheses denotes the value in base 2, and the marked bits are those obtained from the X coordinate in base 2. However, it is not so straightforward to obtain the two coordinates in base 2 from the r-value of the RBG curve or from the h-value of the Hilbert curve in the same way. To obtain the two coordinates in base 2 from the r-value requires the reflection recursively on the basic RBG curve of order 1, as shown in Fig. 2. To obtain the two coordinates in base 2 from the h-value requires the reflection and rotation recursively on the basic Hilbert curve of order 1, as shown in Fig. 3. As compared to the r-value or the h-value, the z-value has a direct relationship with the coordinates through the bit shuffling, even when the order is greater than 1. Therefore, in this section, we first describe the process of neighbor-finding based on the z-value of the Peano curve.

To simplify our presentation, we let $z2$ be the z-value in base 2, i.e., the binary form, and $z10$ be the z-value in base 10, i.e., the decimal form. (Note that the value in base 10 is represented with base 10 digits from 0 to 9. A base 10 digit represents that digit times a power of 10.) The query block is numbered $(Z)_{z10}$, which is the z-value of the block in the $2^n \times 2^n$ image space ordered by the Peano curve. The value $(Z)_{z2}$ is represented in base 2 with $2n$ ($= n + n$) bits. So, the sequence numbers for the image space range from 0 to $2^{2n} - 1$. The nearest-neighbor algorithm based on the Peano curve is shown in Fig. 7.

In Step 1, we separate $(Z)_{z2}$, e.g., $(x_2y_2x_1y_1x_0y_0)_{z2}$, into two parts. One part concatenates the odd bits to the X-coordinate, i.e., $X_{c2} = (x_2x_1x_0)_{c2}$, where $c2$ denotes that the value of the coordinate is represented in base 2. The other part concatenates the even bits to the Y-coordinate, i.e., $Y_{c2} = (y_2y_1y_0)_{c2}$. Then, we convert the coordinates $(X, Y)_{c2}$ into $(X, Y)_{c10}$, where $c10$ denotes that the value of the coordinate is represented in base 10. In Step 2, we use $(X, Y)_{c10}$ to find the neighbor in a certain direction. The variables HD and VD mean the horizontal and vertical directions, respectively. For example, if we want to find the west neighbor, we let HD = 'west' and VD = 'NULL'. Then, we obtain the resulting coordinates $(NX, NY)_{c10}$ of the neighbor, e.g., $NX_{c10} = X_{c10} - 1$ and $NY_{c10} = Y_{c10}$ for the case of the west neighbor. It should be noticed that neither $NX_{c10}$ nor $NY_{c10}$ of the neighbor may lie beyond the range of the image space in Step 2. If either one of them is beyond the range, there is no neighbor in that direction. In Step 3, we convert $(NX, NY)_{c10}$ into $(NX, NY)_{c2}$, e.g., $(x'_2x'_1x'_0,\; y'_2y'_1y'_0)_{c2}$. Then, we interleave the bits from $NX_{c2}$ and $NY_{c2}$ to form the corresponding $(NZ)_{z2}$ for the west neighbor, i.e., $(x'_2y'_2x'_1y'_1x'_0y'_0)_{z2}$.

Taking Fig. 8(a) as an example, the width $d$ of each block is 1. For the query block $(45)_{z10} = (x_2y_2x_1y_1x_0y_0)_{z2} = (\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}$, one of its equal-sized neighbors in the west direction can be obtained by following the steps shown in Fig. 9. Let $X((45)_{z10})$ and $Y((45)_{z10})$ denote the X-coordinate and the Y-coordinate of block $(45)_{z10} = (\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}$, respectively. First, we separate $(45)_{z10} = (\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}$ into $X((45)_{z10})$ and $Y((45)_{z10})$, i.e.,

$$X((45)_{z10}) = X((\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}) = X((x_2y_2x_1y_1x_0y_0)_{z2}) = (x_2x_1x_0)_{c2} = (\underline{1}\,\underline{1}\,\underline{0})_{c2} = (6)_{c10},$$

$$Y((45)_{z10}) = Y((\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}) = Y((x_2y_2x_1y_1x_0y_0)_{z2}) = (y_2y_1y_0)_{c2} = (0\,1\,1)_{c2} = (3)_{c10}.$$

Next, the $NX_{c10}$ and $NY_{c10}$ values of the west neighbor are shown as follows:

$$NX_{c10} = X((45)_{z10}) - d = (6)_{c10} - (1)_{c10} = (5)_{c10},$$

$$NY_{c10} = Y((45)_{z10}) = (3)_{c10}.$$

Fig. 7. Function Peano_NN.

There is no change in the Y coordinate of the west neighbor. Then, we convert $(NX, NY)_{c10} = (5, 3)_{c10}$ into $(NX, NY)_{c2} = (\underline{1}\,\underline{0}\,\underline{1},\; 0\,1\,1)_{c2} = (x'_2x'_1x'_0,\; y'_2y'_1y'_0)_{c2}$. Next, we interleave the bits from $NX_{c2}$ and $NY_{c2}$ to obtain the $NZ_{z2}$ of the west neighbor, equal to $(x'_2y'_2x'_1y'_1x'_0y'_0)_{z2} = (\underline{1}\,0\,\underline{0}\,1\,\underline{1}\,1)_{z2}$. Finally, we convert $NZ_{z2}$ into $NZ_{z10}$ to obtain the west neighbor, and the resulting block $(39)_{z10}$ is shown in Fig. 8(a). Similarly, the neighbors in the other directions can also be obtained by function Peano_NN, as shown in Fig. 7.
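The three steps of the worked example (bit separation, neighbor decision, bit shuffling) can be sketched as follows. This is an illustrative reading of the steps in the text, not a reproduction of Fig. 7; the function and table names are assumptions, Python's `None` stands in for the 'NULL' direction, and the direction offsets (X − 1 for 'west', X + 1 for 'east', Y − 1 for 'south', Y + 1 for 'north') follow Fig. 9.

```python
def deinterleave(z, n):
    """Step 1: split (x_{n-1} y_{n-1} ... x_0 y_0)_2 into the coordinates (x, y)."""
    x = y = 0
    for i in range(n):
        y |= ((z >> (2 * i)) & 1) << i
        x |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


def interleave(x, y, n):
    """Step 3: bit-shuffle the coordinates back into a z-value."""
    z = 0
    for i in range(n):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


# Hypothetical lookup tables; None plays the role of 'NULL'.
H_STEP = {"west": -1, "east": 1, None: 0}
V_STEP = {"south": -1, "north": 1, None: 0}


def peano_nn(z, n, hd=None, vd=None, d=1):
    """Return the z-value of the neighbor in direction (hd, vd), or None if it is out of range."""
    x, y = deinterleave(z, n)
    # Step 2: move by one block width d in the requested direction.
    nx, ny = x + H_STEP[hd] * d, y + V_STEP[vd] * d
    if not (0 <= nx < 2 ** n and 0 <= ny < 2 ** n):
        return None  # no neighbor beyond the edge of the image space
    return interleave(nx, ny, n)


print(peano_nn(45, 3, hd="west"))  # 39, as in the worked example
```

Setting both `hd` and `vd` yields the diagonal neighbors, so the same function covers all eight directions.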

Fig. 8. The sequence numbers linearly ordered by the curves of order 3: (a) the Peano curve; (b) the RBG curve; (c) the Hilbert curve. (Axes: X and Y, each from 0 to 7. Legend: the query point; the block where the query point is located; the neighboring block to the query block.)

Fig. 9. Illustration of the west neighbor-finding process for the query block $(45)_{z10} = (1\,0\,1\,1\,0\,1)_{z2}$. Step 1 (bit separation) transforms the z-value into the coordinates $(6, 3)_{c10}$. Step 2 (neighbor decision, HD = 'west', VD = 'NULL') uses X − 1 for 'west', X + 1 for 'east', Y − 1 for 'south' and Y + 1 for 'north', giving $(5, 3)_{c10}$. Step 3 (bit shuffling) transforms the coordinates back into the z-value $(1\,0\,0\,1\,1\,1)_{z2} = (39)_{z10}$.

3.2. Neighbor-finding based on the other two curves

Based on the Peano curve, we can efficiently and directly find the neighbors next to the query block by function Peano_NN (Fig. 7). But it is not so straightforward to answer the nearest-neighbor queries based on the RBG curve or the Hilbert curve, as mentioned above. However, we observe that there is some relationship between the RBG curve and the Peano curve, and between the Hilbert curve and the Peano curve. The r-value in the RBG curve can be represented as the binary gray-code $(R)_{r2}$ (also called gray code), where $r2$ denotes the r-value in base 2. The formulas presented in [1,2,12] can transform $(R)_{r2}$ into its order in the binary form as $(Z)_{z2}$. The h-value in the Hilbert curve can be obtained by some transformation, reflection, and rotation on the z-value. So, for the RBG curve (or the Hilbert curve), we can develop some transformation rules between the Peano curve and that curve, and use function Peano_NN to help answer the nearest-neighbor query. Therefore, we can first transform the r-value (or h-value) of the query block into the z-value. Then, we use the corresponding z-value to find the neighbors based on the Peano curve. We finally transform the z-value of the neighbor back into the corresponding r-value (or h-value) of the neighbor based on the transformation rules. In this section, we first briefly describe the process of the neighbor-finding based on the RBG curve. Then, we show how to find the nearest neighbor based on the Hilbert curve.

3.2.1. Based on the RBG curve

As mentioned before, for the r-value of the 2D-space RBG curve in the $2^n \times 2^n$ image space, we can consider the value $(R)_{r2}$ as the gray code, where $r2$ denotes the r-value in base 2. That is the bit string $(R_{2n-1} \ldots R_1R_0)_{r2}$. The neighbor-finding algorithm based on the RBG curve is shown in Fig. 10. Basically, we first transform $(R)_{r2}$ into the corresponding z-value, i.e., $(Z)_{z2}$, by the formulas presented in [1,2,12]. That is, we perform the transformation between the gray code and the binary code. Then, we use $(Z)_{z2}$ to generate one of the neighbors in a certain direction by function Peano_NN. Finally, we transform $(NZ)_{z2}$ of the neighbor back into the corresponding $(NR)_{r2}$ by again using the formulas presented in [1,2,12].

Fig. 10. Function RBG_NN.

Take Fig. 8(b) as an example. It is a $2^3 \times 2^3$ image space which is ordered by the RBG curve of order 3. The query point is located in block $(59)_{r10}$, where $r10$ denotes the r-value in base 10. We use block $(59)_{r10}$ as the query block, which has the same coordinates $(X, Y)_{c10} = (6, 3)_{c10}$ as the example shown in Fig. 8(a). We represent the r-value $(59)_{r10}$ in base 2 as the gray code, i.e., $(R_5R_4R_3R_2R_1R_0)_{r2} = (1\,1\,1\,0\,1\,1)_{r2}$. The west neighbor-finding process of block $(59)_{r10}$ is shown in Fig. 11. In Step 1, we transform the gray code $(R)_{r2} = (1\,1\,1\,0\,1\,1)_{r2}$ into the z-value in base 2, i.e., $(Z)_{z2} = (\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2}$. In Step 2, we use the z-value $(\underline{1}\,0\,\underline{1}\,1\,\underline{0}\,1)_{z2} = (45)_{z10}$ to find the west neighbor by function Peano_NN. Then, we find the z-value $(\underline{1}\,0\,\underline{0}\,1\,\underline{1}\,1)_{z2} = (39)_{z10}$ of the neighboring block. In Step 3, we transform the z-value $(\underline{1}\,0\,\underline{0}\,1\,\underline{1}\,1)_{z2}$ back into the r-value $(1\,1\,0\,1\,0\,0)_{r2}$. Finally, we convert the r-value $(1\,1\,0\,1\,0\,0)_{r2}$ into the decimal $(52)_{r10}$, which is the neighbor lying to the west of the query block $(59)_{r10}$ in Fig. 8(b). Note that block $(52)_{r10}$ is also located at $(NX, NY)_{c10} = (5, 3)_{c10}$, so we have the same answer as the previous example shown in Fig. 8(a).

Fig. 11. Illustration of the west neighbor-finding process for the query block $(59)_{r10} = (1\,1\,1\,0\,1\,1)_{r2}$. Step 1: $(R)_{r2} = (1\,1\,1\,0\,1\,1)_{r2}$, i.e., $(59)_{r10}$, becomes $(Z)_{z2} = (1\,0\,1\,1\,0\,1)_{z2}$, i.e., $(45)_{z10}$. Step 2: $(NZ)_{z2}$ := Peano_NN($(Z)_{z2}$, 'west', 'Null') $= (1\,0\,0\,1\,1\,1)_{z2}$, i.e., $(39)_{z10}$. Step 3: $(NR)_{r2} = (1\,1\,0\,1\,0\,0)_{r2}$, i.e., $(52)_{r10}$.
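The transform, search, transform-back pattern of this example can be sketched as below. The formulas of [1,2,12] are not reproduced in this section, so the sketch assumes the standard reflected-binary Gray code conversions (decode by a running XOR of right shifts, encode by `b ^ (b >> 1)`), which agree with the bit strings of the worked example. All function names are hypothetical, and the compact `peano_west` helper handles only the west direction to keep the block short.

```python
def gray_to_binary(g):
    """Decode a reflected-binary Gray code (assumed to be the r-value to z-value transformation)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def binary_to_gray(b):
    """Encode a binary number as a reflected-binary Gray code (assumed z-value to r-value)."""
    return b ^ (b >> 1)


def peano_west(z, n):
    """West neighbor on the Peano curve: decrement X, which occupies the odd bit positions of z."""
    x = sum(((z >> (2 * i + 1)) & 1) << i for i in range(n))
    if x == 0:
        return None  # already on the western edge of the image space
    nx = x - 1
    y_bits = z & int("01" * n, 2)  # keep the Y bits (even positions) unchanged
    x_bits = sum(((nx >> i) & 1) << (2 * i + 1) for i in range(n))
    return x_bits | y_bits


def rbg_west(r, n):
    """Step 1: r -> z; Step 2: Peano neighbor; Step 3: z -> r."""
    nz = peano_west(gray_to_binary(r), n)
    return None if nz is None else binary_to_gray(nz)


print(gray_to_binary(0b111011))  # 45
print(rbg_west(59, 3))           # 52
```

A full version would pass the horizontal and vertical directions through to the Peano step in the same way; only the Gray code conversion at either end is specific to the RBG curve.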

3.2.2. Based on the Hilbert curve

For the h-value of the 2D-space Hilbert curve inthe 2n � 2n image space, we consider the value H

in base 2 as a Hilbert code, i.e., ðh2n�1?h1h0Þh2;where h2 denotes the h-value in base 2. The pseudocodes for the neighbor-finding process is shown inFig. 12, and the related source codes about thetransformation rules between the Peano curve andthe Hilbert curve are shown in Appendix A.Basically, similar to the previous function, fromStep 1 to Step 4, we first transform the h-value intothe z-value. Then, in Step 5, we use the z-value togenerate the requested neighbor based on thePeano curve. Finally, from Step 6 to Step 9, wetransform the z-value of the neighbor into thecorresponding h-value. However, in this case, it isnot so straightforward to do the transformationbetween the Hilbert curve and the Peano curvesince the coordinates in base 2 from the h-valuerequire the reflection and rotation recursively onthe basic Hilbert curve of order 1.

Take Fig. 8(c) as an example. It is a 2^3 × 2^3 image space which is ordered by the Hilbert curve of order 3. The query point is located in block (51)h10, where h10 denotes the h-value in base 10. We use block (51)h10 as the query block, which has the same coordinates (X, Y)c10 = (6, 3)c10 as the example shown in Fig. 8(a). The west neighbor-finding process of block (51)h10 is shown in Fig. 13. We first represent the h-value, i.e., (51)h10, in base 2 as the Hilbert code (h5 h4 h3 h2 h1 h0)h2 = (1 1 0 0 1 1)h2. We describe the neighbor-finding process of block (51)h10 in the following steps.

Step 1: We divide the Hilbert code (11 00 11)h2 into pairs of bits from left to right and store the corresponding decimal value for each pair of bits in array d. Those are d0 = (11)2 = 3, d1 = (00)2 = 0, and d2 = (11)2 = 3. The decimal value denotes the sequence number (0, 1, 2, or 3) in one quadrant ordered by the Hilbert curve, as shown in Fig. 3(a). We also let another array e be the same as array d, i.e., e0 = d0 = 3, e1 = d1 = 0, e2 = d2 = 3. (Note that array e is used for storing the temporary result in the following steps.) Array d is represented as (d0 d1 d2)hp = (3 0 3)hp, where hp denotes the code composed of the sequence numbers in base 10 for pairs of bits in (H)h2, and so is array e. Note that since all reflections and rotations in the Hilbert curve of order p, p > 1, are relative to the previously obtained curve of order p − 1, array d is the input used to store the sense of order p, and array e is the output used to store the sense of order p − 1. In other words, in the following step, array d is left unchanged to keep the information about reflection and rotation in the curve of order p, and array e is repeatedly updated (if necessary) to show the sense in the curve of order p − 1.
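The bit division of Step 1 amounts to reading the code as base-4 digits, most significant pair first. A minimal Python sketch follows; the function name `bit_division` is a hypothetical label borrowed from the step title in Fig. 13, not the paper's source code.

```python
from typing import List


def bit_division(code: int, order: int) -> List[int]:
    """Split a (2 * order)-bit code into pairs of bits, left to right."""
    return [(code >> (2 * level)) & 0b11 for level in range(order - 1, -1, -1)]


d = bit_division(0b110011, 3)   # Hilbert code of block (51)h10
e = list(d)                     # separate working copy, updated in later steps
print(d, e)
```

Keeping `e` as a separate copy mirrors the text: d stays untouched as the input, while e receives the updates.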


Fig. 12. Function Hilbert_NN.


(H)h2: (1 1 0 0 1 1)h2, that is, (H)h10: (51)h10

Step 1: Bit Division: (3 0 3)hp

Step 2: Rotate_Reflect: Input (3 0 3)hp, Output (3 2* 3)hp (case 2); Input (3 0 3)hp, Output (3 2 1*)hp (case 1)

Step 3: PHTransformation: Output (2* 3* 1)zp

Step 4: Bit Concatenation: (Z)z2: (1 0 1 1 0 1)z2, that is, (Z)z10: (45)z10

Step 5: (NZ)z2 := Peano_NN((Z)z2, 'west', 'Null'); (NZ)z2: (1 0 0 1 1 1)z2, that is, (NZ)z10: (39)z10

Step 6: Bit Division: (2 1 3)zp

Step 7: PHTransformation: (2 1 3)zp becomes (3* 1 2*)hp

Step 8: Rotate_Reflect: Input (3 1 2)hp, Output (3 1 0*)hp (case 2); Input (3 1 0)hp, Output (3 1 0)hp

Step 9: Bit Concatenation: (3 1 0)hp; (NH)h2: (1 1 0 1 0 0)h2, that is, (NH)h10: (52)h10

Fig. 13. Illustration of the west neighbor-finding process for the query block (51)h10 = (1 1 0 0 1 1)h2. The mark * denotes a changed value.


Step 2: We call procedure Rotate_Reflect, as shown in Fig. 14, to perform the task of reducing the Hilbert curve from order i (i > 1) to order 1. For each element in array d, we first consider the value of di to decide which case to follow in procedure Rotate_Reflect. Arrays d and e are the input and output, respectively. In Case 1, if the value of di is 0, then we should change the value of each following element ej, j > i, from 1 to 3 or from 3 to 1. In Case 2, if the value of di is 3, then we should change the value of each following element ej, j > i, from 0 to 2 or from 2 to 0. In Fig. 13, the input is array d = (3 0 3)hp and the output is array e = (3 0 3)hp initially, where a mark indicates the processing point. First, we process the first value of the input, i.e., d0 (= 3). Since it is 3, which is Case 2, we change the second value of the output (e1) from 0 to 2. Since there is no other 0 or 2 in the remaining ej, j > i, we do not make any other update to the output for the input d0. (Note that in Fig. 13, the mark * represents the changed value.) Next, we process the second value of the input, i.e., d1 (= 0). Since it is 0, which is Case 1, we change the third value of the output (e2) from 3 to 1. Up to this point, we have finished processing all elements of the input, and the resulting output is (3 2 1)hp.
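The two cases of Step 2 can be sketched as follows. This is one reading of the prose description, written without access to the pseudocode of Fig. 14, so the function name `rotate_reflect` and the table `SWAPS` are assumptions made for illustration.

```python
from typing import Dict, List

# Case 1 (d_i = 0): exchange 1 and 3 in every later element.
# Case 2 (d_i = 3): exchange 0 and 2 in every later element.
SWAPS: Dict[int, Dict[int, int]] = {0: {1: 3, 3: 1}, 3: {0: 2, 2: 0}}


def rotate_reflect(d: List[int]) -> List[int]:
    """Reduce Hilbert digits of order p to senses of the order-1 curve.

    d is only read; the returned list plays the role of array e.
    """
    e = list(d)
    for i, di in enumerate(d):
        swap = SWAPS.get(di)
        if swap is None:             # d_i is 1 or 2: the sense is unchanged
            continue
        for j in range(i + 1, len(e)):
            e[j] = swap.get(e[j], e[j])
    return e


print(rotate_reflect([3, 0, 3]))     # the worked example of Step 2
```

Note that the decision at position i is taken from the unmodified input d, while the exchange is applied to the output e, matching the roles of the two arrays in Step 1.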

Fig. 14. Procedure Rotate_Reflect.

Fig. 15. Hilbert curves of order: (a) 1; (b) 2. (N1): N1 is the sequence number of the quadrant in the curve of order 1.

The reason for the change operation in Case 1 and Case 2 is as follows. In order to preserve proximity, the Hilbert curve of order p (p > 1) is derived by repeatedly rotating and reflecting the basic curve of order 1 in each quadrant. The sense, i.e., the direction of traversal [8], in each quadrant is not always the same. However, different from the Hilbert curve, the Peano curve recursively shows the same sense as the basic curve of order 1 to form the curve of order p, p > 1. So, during the transformation between the h-value and the z-value, we need to perform the rotation and reflection in reverse order such that each divided quadrant will have the same sense as the basic Hilbert curve of order 1.

Take Fig. 15 as an example. Figs. 15(a) and (b) show the Hilbert curves of order 1 and 2, respectively. The ( ) denotes the sequence number of the quadrant in the Hilbert curve of order 1, in which the quadrant has four blocks. Each block has the sequence number ordered by the Hilbert curve of order 2. We can observe that the sense of block (0) of order 2 in Fig. 15(b) is different from the sense of order 1 in Fig. 15(a), as is that of block (3). That is, in block (0), the position of sequence number 1 is different from its position in Fig. 15(a), as is the position of sequence number 3. This is because the sense of block (0) of order 2 (Fig. 15(b)) is derived from the sense of order 1 (Fig. 15(a)) by right-reflection (Fig. 16(a)) and 90°-clockwise-rotation (Fig. 16(b)). After processing the reflection and rotation, the result is shown in Fig. 16(c). This is Case 1 in procedure Rotate_Reflect. Similarly, for block (3) in Fig. 15(b), the position of sequence number 12 (= 4 × 3 + 0) is different from its position (sequence number 0) in Fig. 15(a), as is the position of sequence number 14 (= 4 × 3 + 2) vs. the position of sequence number 2. This is because the sense of block (3) (Fig. 15(b)) is derived from the sense of order 1 (Fig. 15(a)) by left-reflection (Fig. 17(a)) and 90°-anticlockwise-rotation (Fig. 17(b)). After processing the reflection and rotation, the result is shown in Fig. 17(c). This is Case 2 in procedure Rotate_Reflect. However, the sense of block (1) in Fig. 15(b) is the same as that of order 1 (Fig. 15(a)) without processing rotation and reflection, as is that of block (2). Therefore, to reduce the Hilbert curve of order p (p > 1) to the basic curve of order 1, we have to perform procedure Rotate_Reflect. In this way, after all senses in the Hilbert curve of order p (p > 1) have been reduced to the ones in the basic Hilbert curve of order 1, we can do the transformation from the Hilbert curve into the Peano curve.

Fig. 16. The process of reflection and rotation on block (0) in the Hilbert curve of order 2 in Fig. 15: (a) before the right reflection; (b) after the right reflection; (c) after the 90°-clockwise-rotation. N2(N1): N1 is the sequence number of the block in the curve of order 1, and N2 is the sequence number of the block in the curve of order 2.

Fig. 17. The process of reflection and rotation on block (3) in the Hilbert curve of order 2 in Fig. 15: (a) before the left reflection; (b) after the left reflection; (c) after the 90°-anticlockwise-rotation. N2(N1): N1 is the sequence number of the block in the curve of order 1, and N2 is the sequence number of the block in the curve of order 2.
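One way to check the two cases is to note that a reflection followed by a quarter turn is a mirror about one diagonal of the quadrant, and to see what such a mirror does to the four sequence numbers. The sketch below assumes a particular order-1 layout (sequence number to (x bit, y bit)), inferred from Table 1 together with the X-before-Y bit order of the z-value; the layout table and all names are illustrative assumptions, not taken from the paper's figures.

```python
from typing import Callable, Dict, Tuple

Position = Tuple[int, int]   # (x bit, y bit) inside one quadrant

# Assumed order-1 Hilbert layout: 0 -> (0,0), 1 -> (0,1), 2 -> (1,1), 3 -> (1,0).
HILBERT_ORDER1: Dict[int, Position] = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
DIGIT_AT: Dict[Position, int] = {pos: n for n, pos in HILBERT_ORDER1.items()}


def mirror_main_diagonal(pos: Position) -> Position:
    """Mirror about the diagonal through positions 0 and 2."""
    x, y = pos
    return (y, x)


def mirror_anti_diagonal(pos: Position) -> Position:
    """Mirror about the diagonal through positions 1 and 3."""
    x, y = pos
    return (1 - y, 1 - x)


def induced_permutation(mirror: Callable[[Position], Position]) -> Dict[int, int]:
    """Which sequence number ends up where each sequence number used to be."""
    return {n: DIGIT_AT[mirror(pos)] for n, pos in HILBERT_ORDER1.items()}


print(induced_permutation(mirror_main_diagonal))   # exchanges 1 and 3 (Case 1)
print(induced_permutation(mirror_anti_diagonal))   # exchanges 0 and 2 (Case 2)
```

Under this assumed layout, the exchange of 1 and 3 in Case 1 and the exchange of 0 and 2 in Case 2 are exactly the two diagonal mirrors, and blocks (1) and (2) need no exchange.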

Step 3: Up to this point, we have the code (3 2 1)hp, which is the code of the Hilbert curve of order 1. Procedure PHTransformation, as shown in Fig. 18, relies on the observation that the position of sequence number 2 in the basic Hilbert curve of order 1 (Fig. 19(a)) is different from its position in the basic Peano curve of order 1 (Fig. 19(b)), as is that of sequence number 3. Therefore, we change the position of sequence numbers 2 and 3 in the basic Hilbert curve of order 1 to those in the basic Peano curve to obtain the z-value. So, we transform (3 2 1)hp into the corresponding code of the z-value, i.e., (2* 3* 1)zp, according to Table 1 by procedure PHTransformation, where zp denotes the code composed of sequence numbers in base 10 for pairs of bits in (Z)z2.

Step 4: We transform each number of (2 3 1)zp into the corresponding pair of bits. Then, we concatenate them to form (10 11 01)z2.
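Steps 3 and 4 reduce to a table lookup followed by reassembling the bit pairs. The sketch below is illustrative only; the names `ph_transformation` and `bit_concatenation` follow the step titles in Fig. 13 but are not the paper's source code, and the lookup table is Table 1 written as a Python dictionary.

```python
from typing import Dict, List

# Table 1: sequence numbers 2 and 3 trade places; 0 and 1 stay put.
# The mapping is its own inverse, so one table serves both directions.
PH_TABLE: Dict[int, int] = {0: 0, 1: 1, 2: 3, 3: 2}


def ph_transformation(digits: List[int]) -> List[int]:
    """Convert order-1 sequence numbers between the Hilbert and Peano curves."""
    return [PH_TABLE[n] for n in digits]


def bit_concatenation(digits: List[int]) -> int:
    """Turn each sequence number into a pair of bits and join the pairs."""
    code = 0
    for n in digits:
        code = (code << 2) | n
    return code


zp = ph_transformation([3, 2, 1])    # output of Step 2
print(zp, bit_concatenation(zp))     # Step 3 result and Step 4 result
```

Because the table is an involution, the same function covers the reverse direction used later in Step 7.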

Step 5: We use the z-value (1 0 1 1 0 1)z2 = (45)z10 to find the nearest neighbor to the west based on the Peano curve by function Peano_NN. We find the west neighbor, which is (1 0 0 1 1 1)z2 = (39)z10 in Fig. 8(a).
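For comparison, the Peano step can also be carried out by splitting the z-value into coordinates, moving one block, and interleaving the bits again. The sketch below does this for all eight directions; it is not function Peano_NN of Fig. 7, which works on the bits directly, and the helper names are hypothetical. It assumes the X bit comes first in each bit pair.

```python
from typing import List, Tuple


def split_z(z: int, order: int) -> Tuple[int, int]:
    """De-interleave a z-value into (X, Y), X bit first in each pair."""
    x = y = 0
    for level in range(order - 1, -1, -1):
        pair = (z >> (2 * level)) & 0b11
        x = (x << 1) | (pair >> 1)
        y = (y << 1) | (pair & 1)
    return x, y


def join_z(x: int, y: int, order: int) -> int:
    """Interleave (X, Y) back into a z-value."""
    z = 0
    for level in range(order - 1, -1, -1):
        z = (z << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return z


def peano_neighbors(z: int, order: int) -> List[int]:
    """z-values of the (up to) eight blocks surrounding block z."""
    x, y = split_z(z, order)
    side = 1 << order
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < side and 0 <= ny < side:
                found.append(join_z(nx, ny, order))
    return found


print(split_z(45, 3), sorted(peano_neighbors(45, 3)))
```

For the query block (45)z10, the result contains the west neighbor (39)z10 of Step 5 together with the other seven blocks listed for the Peano curve in Section 4.1.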

Step 6: We divide the code (10 01 11)z2 into pairs of bits from left to right and store the corresponding decimal value for each pair of bits in array c. These are c0 = (10)2 = 2, c1 = (01)2 = 1, and c2 = (11)2 = 3. We represent array c as (2 1 3)zp, which represents the code of sequence numbers in the Peano curve.

Step 7: In this step, for a reason similar to that in Step 3, we change the position of sequence numbers 2 and 3 in the basic Peano curve of order 1 to those in the basic Hilbert curve of order 1. We transform the code (c0 c1 c2)zp = (2 1 3)zp into the corresponding hp code in the basic Hilbert curve of order 1 according to Table 1 by using procedure PHTransformation in Fig. 18. At this point, we have the resulting array c as (3* 1 2*)hp, which represents the code of sequence numbers in the basic Hilbert curve of order 1.

Fig. 18. Procedure PHTransformation.

Fig. 19. Curves of order 1: (a) the Hilbert curve; (b) the Peano curve.

Table 1
Relationship between the sequence numbers in the Peano curve and those in the Hilbert curve

Peano curve    Hilbert curve
0              0
1              1
2              3
3              2

Step 8: In this step, for a reason similar to that in Step 2, we should generate the Hilbert curve of order p (p > 1) from the basic Hilbert curve of order 1. For each element in array c, we consider the value of ci to decide which case to follow in procedure Rotate_Reflect (Fig. 14). Note that since all reflections and rotations in the Hilbert curve of order p, p > 1, are relative to the previously obtained curve of order p − 1, array c is not only the input, which represents the sense of order p − 1, but also the output, which represents the sense of order p. First, we process the first value of the input, i.e., c0 (= 3). Since it is 3, which is Case 2, we change only the third value of the output (c2) from 2 to 0; the second value c1 (= 1) is neither 0 nor 2, so we do not make any other update to the output for the input c0. Next, we process the second value of the input, i.e., c1 (= 1). Since it is 1, we do not update the output. Up to this point, we have finished processing all elements of the input, and the resulting output is (3 1 0)hp, which represents the code of sequence numbers in the Hilbert curve of order p (p > 1).

Step 9: We transform each number of (c0 c1 c2)hp = (3 1 0)hp into the corresponding pair of bits. Then, we concatenate them to form (11 01 00)h2. Finally, we obtain block (52)h10 = (1 1 0 1 0 0)h2 as the west neighbor next to block (51)h10, as shown in Fig. 8(c). Note that block (52)h10 is also located at (NX, NY)c10 = (5, 3)c10, which is the same answer as in the previous example in Fig. 8(a).
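Putting Steps 1 to 9 together gives the following end-to-end sketch. It is a reconstruction from the prose, under the same assumptions as before (X bit first in each pair, Table 1 as the order-1 mapping); it is not the code of Fig. 12 or Appendix A, and every name in it is hypothetical. The west step uses a bit-mask shortcut in place of function Peano_NN.

```python
from typing import Dict, List, Optional

SWAPS: Dict[int, Dict[int, int]] = {0: {1: 3, 3: 1}, 3: {0: 2, 2: 0}}  # Cases 1 and 2
PH_TABLE: Dict[int, int] = {0: 0, 1: 1, 2: 3, 3: 2}                    # Table 1


def to_digits(code: int, order: int) -> List[int]:
    return [(code >> (2 * k)) & 0b11 for k in range(order - 1, -1, -1)]


def to_code(digits: List[int]) -> int:
    code = 0
    for n in digits:
        code = (code << 2) | n
    return code


def hilbert_to_peano(h: int, order: int) -> int:
    """Steps 1 to 4: decisions come from the untouched input d, updates go to e."""
    d = to_digits(h, order)
    e = list(d)
    for i, di in enumerate(d):
        swap = SWAPS.get(di, {})
        for j in range(i + 1, order):
            e[j] = swap.get(e[j], e[j])
    return to_code([PH_TABLE[n] for n in e])


def peano_to_hilbert(z: int, order: int) -> int:
    """Steps 6 to 9: array c is both input and output, updated in place."""
    c = [PH_TABLE[n] for n in to_digits(z, order)]
    for i in range(order):
        swap = SWAPS.get(c[i], {})
        for j in range(i + 1, order):
            c[j] = swap.get(c[j], c[j])
    return to_code(c)


def peano_west(z: int, order: int) -> Optional[int]:
    """Step 5 stand-in: subtract 1 from the interleaved X bits."""
    x_mask = int("10" * order, 2)
    if z & x_mask == 0:
        return None
    return (((z & x_mask) - 1) & x_mask) | (z & ~x_mask)


def hilbert_west(h: int, order: int) -> Optional[int]:
    nz = peano_west(hilbert_to_peano(h, order), order)
    return None if nz is None else peano_to_hilbert(nz, order)


print(hilbert_to_peano(51, 3), hilbert_west(51, 3))
```

The in-place update in `peano_to_hilbert` undoes the forward pass because the two exchanges (1 with 3, and 0 with 2) act on different values, so the order in which they are applied to a given element does not matter.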

4. Performance

In this section, we compare the performance of nearest-neighbor-finding strategies based on three space-filling curves. Our experiments were performed on a Pentium III 733 MHz machine with 128 MB RAM and a 7200 rpm IBM hard disk (model name: IC35L040AVVN07-0), running Windows 2000.


4.1. The performance model

In the conventional approach, as mentioned in Section 3, time is required to traverse the disk blocks by following the sequence numbers that precede and succeed the query block in the 1D-space to find the neighboring blocks. The number of traversed blocks is unpredictable and may be greater than eight, since some unrelated disk blocks in the 1D-space may be accessed. Take the query block 13 in Fig. 6(c) based on the Hilbert curve as an example: it will access 11 blocks in the 1D-space to find one neighboring block 2, which is one block away from the query block 13 in the 2D-space. As compared to the conventional approach, our approach does not need to access unrelated disk blocks in the 1D-space. In our approach, first, we compute the sequence numbers of the neighboring blocks next to the query block in eight directions. It takes CPU-time to compute these sequence numbers based on the space-filling curve. Next, once the sequence numbers of the eight neighboring blocks are obtained, we can fetch the corresponding blocks from the disk directly according to the sequence numbers. Then, we can find the nearest point to the query point only in these eight neighboring blocks in the 2D-space, which speeds up the search operation.

The I/O time, which means the time to access a block, consists of two parts: positioning time (PT) and transfer time (TT) [13]. The positioning time is the time to move to the right position on the disk, which includes the seek time and rotational delays. The transfer time is simply the time to transfer the requested block from the disk into main memory. Since it is more efficient to fetch a set of consecutive disk blocks than a randomly scattered set, we can make use of the clustering property of the space-filling curve to replace several exact match queries with one continuous range query, which reduces the positioning time. The clustering property means that the locality between objects in the 2D-space can be preserved in the 1D-space. In our approach, since the adjacent blocks on the disk can be accessed with a single range query, we can combine eight exact match queries into some range queries to reduce the positioning time.

Take Fig. 8 as an example. In the case of the Peano curve as shown in Fig. 8(a), let the query block be block 45. The sequence numbers of the neighboring blocks in eight directions are blocks 56, 58, 47, 46, 44, 38, 39, and 50. Then, we can combine these eight exact match queries into six range queries: 38 to 39; 44; 46 to 47; 50; 56; and 58. The positioning time for six range queries is 6PT. The transfer time to transfer these eight blocks from the disk is 8TT. So, the total I/O time to access these eight disk blocks based on the Peano curve is (6PT + 8TT). (Note that in this case, the number of blocks which the conventional approach has to access is up to 21 (from 38 to 58) blocks in the worst case. Moreover, in the conventional approach, the higher the order of the space-filling curve is, the larger the number of blocks accessed [3,5,14,15].)

In the case of the RBG curve as shown in Fig. 8(b), let the query block be block 59. The sequence numbers of the neighboring blocks in eight directions are blocks 36, 39, 56, 57, 58, 53, 52, and 43. Then, we can combine these eight exact match queries into five range queries: 36; 39; 43; 52 to 53; and 56 to 58. So, the total I/O time to access these eight disk blocks based on the RBG curve is (5PT + 8TT).

In the case of the Hilbert curve as shown in Fig. 8(c), let the query block be block 51. The sequence numbers of the neighboring blocks in eight directions are blocks 46, 47, 48, 49, 50, 52, 55, and 33. Then, we can combine these eight exact match queries into four range queries: 33; 46 to 50; 52; and 55. So, the total I/O time to access these eight disk blocks based on the Hilbert curve is (4PT + 8TT). Therefore, as compared to the conventional approach, our approach can reduce the I/O time spent in accessing the unrelated blocks by directly accessing the eight related blocks. Moreover, in this example, among the three space-filling curves, our strategy based on the Hilbert curve needs the least I/O time because the Hilbert curve provides the best clustering property [3–5,8,9].
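The combination of exact match queries into range queries in the three examples above can be reproduced with a few lines of Python. This is an illustrative sketch: the function name `combine_into_ranges` is hypothetical, and the block lists are the ones quoted in the text for Fig. 8.

```python
from typing import Dict, Iterable, List, Tuple


def combine_into_ranges(blocks: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge consecutive sequence numbers into (first, last) range queries."""
    ranges: List[Tuple[int, int]] = []
    for b in sorted(set(blocks)):
        if ranges and b == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], b)    # extend the current run
        else:
            ranges.append((b, b))              # start a new run
    return ranges


NEIGHBORS: Dict[str, List[int]] = {
    "Peano (query 45)": [56, 58, 47, 46, 44, 38, 39, 50],
    "RBG (query 59)": [36, 39, 56, 57, 58, 53, 52, 43],
    "Hilbert (query 51)": [46, 47, 48, 49, 50, 52, 55, 33],
}

for curve, blocks in NEIGHBORS.items():
    ranges = combine_into_ranges(blocks)
    span = max(blocks) - min(blocks) + 1       # blocks a sequential scan would cover
    print(f"{curve}: {len(ranges)}PT + {len(blocks)}TT, scan span {span}, {ranges}")
```

Each range costs one positioning operation, so the number of ranges is the coefficient of PT, while the coefficient of TT stays at eight for all three curves.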

Fig. 20. Six cases of data distribution: (a) uniform; (b) centralized; (c) diagonal; (d) X-parallel; (e) sine; (f) the real map.

In this comparison, we considered the CPU-time and the I/O time as our performance measures. We focused on point data. The experimental analysis was carried out on six distinct spatial datasets. Fig. 20 shows these six cases of data distribution, which are described as follows [16]:

(a) Uniform distribution: The data objects are uniformly distributed in the overall data space.

(b) Centralized distribution: Most of the data objects are centralized in a small region.

(c) Diagonal distribution: The data objects follow a uniform distribution along the main diagonal.

(d) X-parallel distribution: Most of the data objects are located on a line which is parallel to the X-axis.

(e) Sine distribution: The data objects follow a sine curve.

(f) Real map: This is the point map of the population estimate in Taipei, Taiwan. There are 26 376 data points on this map. One point represents 100 people. This statistical information is retrieved from http://www.taiwandm.com.tw/c1-1.htm.

Each dataset contains 10 000 data points in the 2D-space, which is 1000 × 1000. First, the data points were assigned to the corresponding disk block based on the space-filling curve. The points in the same disk block are linked. The capacity of each disk block, denoted as block capacity, was set to 10. Because of the limitation of block capacity, a space-filling curve of different order p (p > 1) was required for the different datasets. Next, for each dataset in Figs. 20(a)–(f), 5000 query points were randomly generated to make the nearest-neighbor queries. Finally, we took the average of the CPU-time and the I/O time required to answer the nearest-neighbor queries for each of the datasets shown in Figs. 20(a)–(f).
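The text does not spell out how a data point is assigned to a block, so the following is only one plausible reading: the 1000 × 1000 space is cut into a 2^p × 2^p grid of equal cells, and a point falls into the cell that contains it. The function name `block_coordinates`, the parameter `extent`, and the treatment of points on the upper boundary are all assumptions made for illustration.

```python
from typing import Tuple


def block_coordinates(px: float, py: float, order: int,
                      extent: float = 1000.0) -> Tuple[int, int]:
    """Grid cell (X, Y) of a point, for a curve of the given order.

    Assumption: cells are equal squares; points lying exactly on the upper
    boundary are clamped into the last cell.
    """
    side = 1 << order                       # cells per axis, 2**order
    bx = min(int(px * side / extent), side - 1)
    by = min(int(py * side / extent), side - 1)
    return bx, by


order = 5
blocks = (1 << order) ** 2                  # number of disk blocks at this order
print(blocks, 10_000 / blocks)              # block count and mean points per block
print(block_coordinates(999.9, 0.0, order))
```

The resulting (X, Y) pair is what the curve-specific ordering (Peano, RBG, or Hilbert) would then turn into a sequence number.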

4.2. Simulation results

Table 2 shows the different orders of the space-filling curve that were required for the different datasets to satisfy the limitation of block capacity (= 10), since they have different data distributions. For example, for the uniform data distribution in Fig. 20(a), the space-filling curve of order 5 was required to order 10 000 data points. This is because there are 1024 (= 2^5 × 2^5) disk blocks which are used to store these points, and each disk block contains 10 points on average. For the centralized, diagonal, X-parallel, and sine data distributions in Figs. 20(b), (c), (d), and (e), respectively, most of the data points are distributed in a certain region or near a certain line. Because of the limitation of block capacity, the space-filling curve of order 5, 6, or 7 was required for each of these datasets. For the real map shown in Fig. 20(f), the space-filling curve of order 6 was required to organize these data points.

Table 2
Different orders for the space-filling curve to organize different datasets

Data distribution   Uniform   Centralized   Diagonal   X-parallel   Sine   Real map
Order               5         6             7          5            7      6

First, we discuss the results of the comparison of the CPU-time of our strategies, as shown in Table 3. For all datasets, the CPU-time of the strategy based on the Peano curve is always less than that of the other two strategies. The main reason for this is as follows. The strategy based on the Peano curve spends CPU-time only in function Peano_NN, as shown in Fig. 7. In contrast, the strategy based on the RBG curve, in addition to the CPU-time taken in function Peano_NN, also spends CPU-time in the transformation between the Peano curve and the RBG curve (Fig. 10). Likewise, the strategy based on the Hilbert curve, in addition to the CPU-time taken in function Peano_NN, also spends CPU-time in the transformation between the Peano curve and the Hilbert curve (Fig. 12). Therefore, for the performance measure of the CPU-time, the strategy based on the Peano curve provides a better result than the strategies based on the other two curves.

Table 3
Comparison of the CPU-time (μs) on different datasets

Data distribution   Uniform   Centralized   Diagonal   X-parallel   Sine    Real map
Peano               42.0      48.0          54.0       42.2         54.0    50.2
RBG                 70.0      84.2          94.2       72.0         96.0    84.2
Hilbert             98.2      114.0         128.2      100.2        130.2   112.2

Next, we discuss the effect of the different datasets on the needed CPU-time. For the strategy based on the Peano curve, the CPU-time is the same among the first five different datasets. This is because the strategy based on the Peano curve uses the bit shuffling directly on the coordinates and the z-value to find the neighboring blocks, even though the order is greater than 1. For the real map shown in Fig. 20(f), since the data points are uniformly distributed, it takes the CPU-time of a uniform distribution organized by the space-filling curve of order 6.

However, for the strategy based on the RBG curve (or the Hilbert curve), the higher the order of the space-filling curve is, the longer the needed CPU-time is. Computing the sequence numbers of the neighboring blocks next to the query block takes more CPU-time and is more complex than in the strategy based on the Peano curve, especially when the order is increased for the different datasets. For example, since the sine data distribution requires the curve of order 7, which is larger than the curve of order 6 required by the centralized distribution, the sine data distribution takes more CPU-time than the centralized distribution does.

Next, we discuss the performance of the I/O time among our strategies. The average positioning time for each of our three strategies on the different datasets is shown in Table 4. The transfer time for our strategies is the same, i.e., 8TT. So, the average I/O times for the strategies based on the Peano curve, the RBG curve, and the Hilbert curve are (5.3PT + 8TT), (5PT + 8TT), and (4PT + 8TT), respectively. In our simulation environment, one unit of the positioning time is 8.8 ms on average, which is much longer than the CPU-time. (Note that this information is retrieved from http://www.digit-life.com/articles/digests/hddreview-0602-ibm.html.) For the performance of the I/O time, our strategy based on the Hilbert curve requires less time than the strategies based on the other two curves, because the Hilbert curve provides a better clustering property than the other two curves [3,5,8]. But for the performance of the CPU-time, our strategy based on the Hilbert curve requires more time than the strategies based on the other two curves because of the complicated transformation rules between the Peano curve and the Hilbert curve. Since the I/O time is generally several orders of magnitude longer than the CPU-time, our strategy based on the Hilbert curve can provide better performance (the CPU-time + the I/O time) than the strategies based on the Peano curve and the RBG curve for processing nearest-neighbor queries.

Table 4
Comparison of the average positioning time of I/O time for our strategies

Data distribution   Uniform   Centralized   Diagonal   X-parallel   Sine   Real map
Peano               5PT       5PT           6PT        5PT          6PT    5PT
RBG                 5PT       5PT           5PT        5PT          5PT    5PT
Hilbert             4PT       4PT           4PT        4PT          4PT    4PT

Consequently, our strategy based on the Peano curve requires less CPU-time than the strategy based on the RBG curve (or the Hilbert curve) by using the bit shuffling property of the Peano curve. When the data are linearly ordered by the RBG curve (or the Hilbert curve), most of the CPU-time is spent in applying the complicated transformation rules between the Peano curve and the RBG curve (or the Hilbert curve) to compute the sequence numbers of the eight neighboring blocks next to the query block in the 2D-space. Moreover, the strategy based on the Hilbert curve requires the least total time (the CPU-time + the I/O time) in nearest-neighbor-finding, since the Hilbert curve provides the best clustering property among the three space-filling curves.
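The averages quoted for the positioning time follow directly from Table 4, and the 8.8 ms figure converts them into milliseconds. The sketch below reproduces that arithmetic; the transfer time TT is not given numerically in the text, so only the positioning part is evaluated, and the variable names are illustrative.

```python
from typing import Dict, List

# Rows of Table 4: number of positioning operations per dataset
# (uniform, centralized, diagonal, X-parallel, sine, real map).
TABLE_4: Dict[str, List[int]] = {
    "Peano": [5, 5, 6, 5, 6, 5],
    "RBG": [5, 5, 5, 5, 5, 5],
    "Hilbert": [4, 4, 4, 4, 4, 4],
}
PT_MS = 8.8   # one positioning operation, in milliseconds (value quoted in the text)


def average_positioning(counts: List[int]) -> float:
    """Mean number of positioning operations over the six datasets."""
    return sum(counts) / len(counts)


for curve, counts in TABLE_4.items():
    avg = average_positioning(counts)
    print(f"{curve}: {avg:.1f}PT + 8TT, positioning part = {avg * PT_MS:.1f} ms")
```

The Peano average is 32/6, which rounds to the 5.3PT stated above.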

5. Conclusion

In this paper, we have shown that the property of the Peano curve, bit shuffling directly on the coordinates in base 2, is useful in directly computing the sequence numbers of the eight neighboring blocks next to the query block in the 2D-space. Then, we have presented the transformation rules between the Peano curve and the RBG curve (or between the Peano curve and the Hilbert curve), which can be used when the data are linearly ordered by the RBG curve (or the Hilbert curve). Finally, we have compared the CPU-time and the I/O time among different nearest-neighbor strategies. From our simulation, we have shown that, among our three strategies, the strategy based on the Peano curve requires the least CPU-time in computing the sequence numbers of the eight neighboring blocks next to the query block, and the strategy based on the Hilbert curve requires the least I/O time to access the eight neighboring blocks next to the query block because of its good clustering property. Since the I/O time is much longer than the CPU-time, the strategy based on the Hilbert curve requires the least total time (the CPU-time + the I/O time) to process the nearest-neighbor query among our three strategies.

Appendix A

The source code for the transformation rules (Procedure HilbertToPeano and Procedure PeanoToHilbert) between the Peano curve and the Hilbert curve is shown in Figs. 21 and 22, respectively.

Fig. 21. Procedure HilbertToPeano.

Fig. 22. Procedure PeanoToHilbert.

References

[1] C. Faloutsos, Multiattribute hashing using gray codes, Proceedings of ACM SIGMOD International Conference on Management of Data, August 1986, pp. 227–238.

[2] C. Faloutsos, Gray codes for partial match and range queries, IEEE Trans. Software Eng. 14 (10) (1988) 1381–1393.

[3] C. Faloutsos, S. Roseman, Fractals for secondary key retrieval, Proceedings of ACM SIGACT-SIGMOD-SIGART Symposium on PODS, 1989, pp. 247–252.

[4] J.K. Lawder, P.J.H. King, Querying multi-dimensional data indexed using the Hilbert space-filling curve, ACM SIGMOD Record 30 (1) (2001) 19–24.

[5] B. Moon, H.V. Jagadish, C. Faloutsos, J.H. Saltz, Analysis of the clustering properties of the Hilbert space-filling curve, IEEE Trans. Knowledge Data Eng. 13 (1) (2001) 124–141.

[6] J.A. Orenstein, T.H. Merrett, A class of data structures for associative searching, Proceedings of Symposium on PODS, 1984, pp. 181–190.

[7] J.A. Orenstein, Spatial query processing in an object-oriented database system, Proceedings of ACM SIGMOD International Conference on Management of Data, 1986, pp. 326–336.

[8] H.V. Jagadish, Linear clustering of objects with multiple attributes, Proceedings of ACM SIGMOD International Conference on Management of Data, 1990, pp. 332–342.

[9] H.V. Jagadish, Analysis of the Hilbert curve for representing two-dimensional space, Inform. Process. Lett. 62 (1) (1997) 17–22.

[10] A. Corral, Y. Manolopoulos, Y. Theodoridis, M. Vassilakopoulos, Closest pair queries in spatial databases, Proceedings of ACM SIGMOD International Conference on Management of Data, 2000, pp. 189–200.

[11] R.H. Güting, An introduction to spatial database systems, Special Issue on Spatial Database Systems of VLDB J. 3 (4) (1994) 1–32.

[12] E.M. Reingold, J. Nievergelt, N. Deo, Combinatorial Algorithms: Theory and Practice, Prentice-Hall Inc., Englewood Cliffs, NJ, 1977.

[13] B. Seeger, P.A. Larson, R. McFadyen, Reading a set of disk pages, Proceedings of the 19th Conference on VLDB, 1993, pp. 592–603.

[14] S. Liao, M.A. Lopez, S.T. Leutenegger, High dimensional similarity search with space filling curves, Proceedings of the International Conference on Data Engineering, 2001, pp. 615–622.

[15] M.F. Mokbel, W.G. Aref, I. Kamel, Performance of multi-dimensional space-filling curves, Proceedings of the 10th ACM International Symposium on Advances in GIS, 2002, pp. 149–154.

[16] B. Seeger, H.P. Kriegel, The buddy-tree: an efficient and robust access method for spatial data base systems, Proceedings of the 16th Conference on VLDB, 1990, pp. 590–601.