[ieee 2009 tenth international conference on mobile data management: systems, services and...



Page 1: [IEEE 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware - Taipei, Taiwan (2009.05.18-2009.05.20)] 2009 Tenth International Conference

On the Energy Efficiency for Heterogeneous Data Broadcasting

Chung-Hua Chu†, Ming-Syan Chen†‡ and Yu-Fen Chen‡

†Graduate Institute of Communication Engineering
Department of Electrical Engineering
National Taiwan University

‡Research Center for Information Technology Innovation
Academia Sinica

Taipei, Taiwan, ROC
E-mail: [email protected], [email protected]

Abstract

Data broadcast is an advanced technique for achieving high scalability and bandwidth utilization in a mobile computing environment. In heterogeneous data broadcast, the data size varies with time in multiple broadcast channels. However, traditional indexing schemes do not take varying data sizes into account when indexing data in multiple channels, which leads to large power consumption in heterogeneous data broadcast. In this paper, we address the problem of devising an indexing technique for data of varying sizes over multiple channels. In view of the characteristics of varying data sizes in multiple channels, we propose an indexing technique that uses an index tree to minimize average waiting time and average tuning time. Experimental results show that our approach generates high-quality broadcast programs, including data indices, and is very efficient in heterogeneous data broadcast.

1. Introduction

With rapid advances in wireless mobile communication technologies, mobile users can access information anytime, anywhere, via wireless mobile devices such as notebooks, palmtops, and smart phones. Since earlier wireless communication was limited by scarce channel bandwidth and the lower capability of mobile devices, the content delivered was almost entirely textual. Therefore, prior works on mobile data dissemination were mostly based on the assumption of equal data size. In an advanced communication environment with larger bandwidth, mobile users with high-capability devices can access various modern information services such as high-quality images, video, audio and multimedia applications. This scenario is referred to as a heterogeneous environment [4] [6]. Hence, data items of diverse sizes are disseminated in a modern information system.

Due to limited power and bandwidth, several lines of research have been developed to conserve energy and communication bandwidth, such as mobile data dissemination [1] [2] [12], energy-efficient indexing [3] [7] [14], and effective bandwidth allocation [9] [10] [13]. Compared to other indexing techniques such as (1,m) indexing [8], the distributed indexing technique [8] results in lower waiting time. In traditional distributed indexing, the index tree provides a sequence of pointers leading to the required data. However, we cannot adopt this conventional index tree structure in our scenario: since the traditional index tree does not contain channel location information, it is infeasible for a multi-channel environment.

In this paper, we consider the Index Allocation Problem. We first propose a new index tree referred to as a DSA (Data Size Aware) tree. With channel information in the index tree, the proposed indexing technique makes data access more convenient on the user end and reduces the waiting time. In addition, the number of levels in a DSA tree is smaller than that in a conventional index tree, which lets users retrieve required data effectively. To obtain the required data, users traverse a path from the root node to the required data node in the index tree. Compared to the traditional index tree, the DSA tree has fewer levels and hence fewer search points on the path to the required data, so users spend less time listening to a channel. Because the listening time on the user end is shorter, energy consumption is much lower than with other index trees.

2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware

978-0-7695-3650-7/09 $25.00 © 2009 IEEE

DOI 10.1109/MDM.2009.20

Figure 1. Architecture of broadcast-based data dissemination. (Components shown: information system, broadcast program, downlink channel carrying data items d1 to d8, uplink channel, and mobile user.)

We propose algorithm SIA (standing for data Size aware Index Allocation) to allocate data and their indices into multiple broadcast channels. SIA first creates a set of all items and then partitions the set into two subsets to minimize average waiting time. This procedure repeats until the number of subsets reaches the number of channels. To evaluate the performance of SIA, we conduct several experiments, in which we analyze the effectiveness of SIA in terms of average waiting time and investigate its efficiency by measuring its execution time. The experimental results show that SIA is of very high quality and is in fact very close to an optimal scheme OPT, which is designed by using a genetic algorithm (see footnote 1) for comparison purposes. Therefore, SIA achieves nearly the same quality as OPT while incurring much lower complexity.

The rest of this paper is organized as follows. Preliminaries are given in Section 2. In Section 3, we develop algorithm SIA to generate a broadcast program. Experimental results are presented in Section 4. Finally, the last section concludes this paper.

2. Preliminaries

2.1. System Architecture

The architecture of a broadcast-based information system is presented in Figure 1. A server provides mobile users with information services in the wireless environment. The broadcast downlink bandwidth is split into several sub-channels of equal bandwidth. A broadcast program is generated to broadcast data items through the broadcast channels. Since the server generates the broadcast program to schedule data in each channel, we place the broadcast program between the information system and the downlink channels. To retrieve data on the air, mobile users receive index information when arriving at the server and then wait until the data appear. The information system also provides an uplink channel, so the server can model the access frequency of each data item by collecting access patterns from mobile users [11]. Without loss of generality, we assume in this paper that each data item is of different size. Therefore, our architecture allows mobile users to conveniently access data in a real broadcasting environment.

1. In essence, GA (genetic algorithm) [5] is a pervasive approach in the literature of soft computing and evolutionary computation for solving optimization problems. GAs are iterative procedures that search for problem solutions with an evolutionary process based on natural selection. Since genetic algorithms have proved to be a versatile and effective approach for solving optimization problems, we adopt GA to design the optimal algorithm OPT to schedule diverse data in the static and dynamic scenarios.

Table 1. Description of symbols.

Description                                          Symbol
Size of the j-th data item in channel i              $z_{i,j}$
Bandwidth of the broadcast channel                   $b$
Number of broadcast channels                         $K$
The i-th broadcast channel                           $c_i$
Number of broadcast data items                       $M$
Database of broadcast data items                     $D$
Number of data items allocated to $c_i$              $M_i$
Item set of data items allocated to $c_i$            $D_i$
The j-th data item in $c_i$                          $d_{i,j}$
Size of an index segment                             $\mathit{index}$
Access frequency of the j-th data item in $c_i$      $f_{i,j}$

2.2. Notation and Definition

Consider a database $D$ containing $M$ broadcast data items, and assume that there are $K$ channels. The bandwidth of each channel $c_i$ is denoted by $b$, where $1 \le i \le K$. In a broadcast program, the items contained in set $D_i$ are broadcast via channel $c_i$, where $|D_i| = M_i$. We have $M = \sum_{i=1}^{K} M_i$, and $D_i \cap D_j = \emptyset$ for $i \ne j$. The size of the $j$-th data item allocated to channel $c_i$ is $z_{i,j}$, where $1 \le j \le M_i$, and all data items are read-only. The item denoted by $d_{i,j}$ is the $j$-th data item in channel $c_i$. The access frequency of data item $d_{i,j}$ is denoted by $f_{i,j}$, where $\sum_{i=1}^{K} \sum_{j=1}^{M_i} f_{i,j} = 1$. Table 1 describes the symbols used in modeling the broadcast environment.

2.3. Analytical Model

Assume that data items $d_{i,1}, d_{i,2}, \ldots, d_{i,M_i}$ are broadcast via channel $c_i$, and that a user receives index information when arriving at the system. Let $t_{i,j}$ represent the waiting time of data item $d_{i,j}$ in channel $c_i$, where $1 \le i \le K$ and $1 \le j \le M_i$.

To retrieve a data item, the user first tunes to a channel and then waits for the data of interest to appear on the broadcast channel. We provide an analytical model for indexing data items of varying sizes in multiple channels. We first derive the waiting time, which is the sum of the average probe wait and the average bcast wait. Probe wait is the average time for users to receive the first index segment when they arrive at the server at an arbitrary time. Bcast wait is the average time from the point when users first encounter an index segment to the point when they finish downloading their required data. We derive the average probe wait and the average bcast wait as follows.

Given the channel bandwidth $b$, the size of an index segment $\mathit{index}$ and the number of data items $M_i$ in channel $c_i$, the average probe wait of $c_i$ can be formulated as Eq. (1). Specifically, assume that all index segments are of the same size. Since the aggregate size of data items and indices in channel $c_i$ is $\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})$, the broadcast cycle of $c_i$ is

$$ l_i = \frac{\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}{b}. $$

The probability that a request arrives during the $j$-th indexing interval $\tau_{i,j}$ of broadcast cycle $l_i$ is $\tau_{i,j} / l_i$, where $\tau_{i,j} = (z_{i,j} + \mathit{index}) / b$ and $z_{i,j}$ is the size of item $d_{i,j}$. Since the probe wait within the $j$-th indexing interval is $\tau_{i,j} / 2$ on average, the average probe wait for a user listening to channel $c_i$ is

$$ p_i = \sum_{j=1}^{M_i} \frac{\tau_{i,j}^{2}}{2 l_i}. $$

Substituting $l_i$ and $\tau_{i,j}$ into this expression yields

$$ p_i = \frac{\sum_{j=1}^{M_i} (z_{i,j} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}. \tag{1} $$
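As a quick sanity check on Eq. (1) (an illustrative special case, not part of the original derivation), suppose every item in channel $c_i$ has the same size $z$. Then each of the $M_i$ indexing intervals has length $(z + \mathit{index})/b$, and Eq. (1) reduces to

$$ p_i = \frac{M_i (z + \mathit{index})^{2}}{2b \, M_i (z + \mathit{index})} = \frac{z + \mathit{index}}{2b}, $$

that is, half of one indexing interval, which is what one would expect for a request arriving uniformly at random within an interval.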

Given a broadcast program $P$, the average probe wait of $P$ can be formulated as Eq. (2). More specifically, according to Eq. (1), the average probe wait $p$ of $P$ can be viewed as the expected value of $p_i$, where each channel $c_i$ is weighted by the total access frequency of its items:

$$ p = E[p_i] = \sum_{i=1}^{K} \Big( \sum_{j=1}^{M_i} f_{i,j} \Big) p_i = \sum_{i=1}^{K} \frac{\big( \sum_{j=1}^{M_i} f_{i,j} \big) \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}. \tag{2} $$

Given the channel bandwidth $b$, the size of an index segment $\mathit{index}$ and the number of data items $M_i$, the bcast wait can be expressed as Eq. (3). The data items in $D_i$ are broadcast through channel $c_i$, and all index segments are assumed to be of the same size. Since the aggregate size of data items and indices in channel $c_i$ is $\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})$, the broadcast cycle of $c_i$ is $\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index}) / b$. Bcast time is a random variable uniformly distributed over $\big[ 0, \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index}) / b \big]$. Therefore, the average bcast time, the expected value of this waiting time, is $\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index}) / (2b)$. The download time is the duration that mobile users spend downloading data item $d_{i,j}$ through channel $c_i$, i.e., $z_{i,j} / b$. Thus, the bcast wait is the sum of the average bcast time and the download time:

$$ w_{i,j} = \frac{\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}{2b} + \frac{z_{i,j}}{b}. \tag{3} $$

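According to Eq. (3), the average bcast wait $w_i$ of disseminating the data items in channel $c_i$ can be formulated as

$$ w_i = \frac{\sum_{j=1}^{M_i} f_{i,j} w_{i,j}}{\sum_{j=1}^{M_i} f_{i,j}} = \frac{\sum_{j=1}^{M_i} f_{i,j} \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}{2b \sum_{j=1}^{M_i} f_{i,j}} + \frac{\sum_{j=1}^{M_i} f_{i,j} z_{i,j}}{b \sum_{j=1}^{M_i} f_{i,j}}. \tag{4} $$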

According to Eq. (4), the average bcast wait $w$ of $P$ can be viewed as the expected value of $w_i$ as follows:

$$ w = E[w_i] = \sum_{i=1}^{K} \Big( \sum_{j=1}^{M_i} f_{i,j} \Big) w_i = \sum_{i=1}^{K} \left[ \frac{\sum_{j=1}^{M_i} f_{i,j} \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}{2b} + \frac{\sum_{j=1}^{M_i} f_{i,j} z_{i,j}}{b} \right]. \tag{5} $$

2.4. Problem Formulation

We aim to generate a broadcast program that allocates data items and data indices to multiple channels such that the waiting time is minimized. Therefore, our cost is the waiting time, which is the sum of the average probe wait in Eq. (2) and the average bcast wait in Eq. (5):

$$ \mathit{cost} = \sum_{i=1}^{K} \left[ \frac{\big( \sum_{j=1}^{M_i} f_{i,j} \big) \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})} + \frac{\sum_{j=1}^{M_i} f_{i,j} \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})}{2b} + \frac{\sum_{j=1}^{M_i} f_{i,j} z_{i,j}}{b} \right]. \tag{6} $$

The problem of generating the broadcast program in heterogeneous environments can thus be formulated as follows. Given database $D$ and a set of channels $\{c_i\}$, $1 \le i \le K$, generate a broadcast program that allocates the items and indices to the channels so that $\mathit{cost}$ is minimized.
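To make the cost of Eq. (6) concrete, the following minimal Python sketch evaluates it for a given allocation. The function names `channel_cost` and `program_cost`, the representation of an item as an `(f, z)` pair, and the numbers in the usage lines are assumptions made for illustration; they do not come from the paper.

```python
from typing import List, Tuple

Item = Tuple[float, float]  # (access frequency f, item size z); hypothetical layout


def channel_cost(items: List[Item], index: float, b: float) -> float:
    """One summand of Eq. (6): the waiting-time cost contributed by one channel."""
    if not items:
        return 0.0
    freq_sum = sum(f for f, _ in items)                   # sum_j f_{i,j}
    seg_sum = sum(z + index for _, z in items)            # sum_k (z_{i,k} + index)
    seg_sq_sum = sum((z + index) ** 2 for _, z in items)  # sum_k (z_{i,k} + index)^2
    probe = freq_sum * seg_sq_sum / (2 * b * seg_sum)     # probe-wait term, Eq. (2)
    bcast = freq_sum * seg_sum / (2 * b)                  # bcast-time term, Eq. (5)
    download = sum(f * z for f, z in items) / b           # download term, Eq. (5)
    return probe + bcast + download


def program_cost(channels: List[List[Item]], index: float, b: float) -> float:
    """Eq. (6): sum of the per-channel costs over all channels."""
    return sum(channel_cost(items, index, b) for items in channels)


# Illustrative values only: two items, broadcast on one channel or on two.
one_channel = [[(0.5, 1.0), (0.5, 3.0)]]
two_channels = [[(0.5, 1.0)], [(0.5, 3.0)]]
print(program_cost(one_channel, index=1.0, b=2.0))
print(program_cost(two_channels, index=1.0, b=2.0))
```

In this toy input, spreading the two items over two channels gives a lower cost than broadcasting both on one channel, which is the effect the allocation problem tries to exploit.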

3. Index-allocation Broadcast Program

3.1. Size Aware Indexing

To facilitate data access on the user side, we propose an indexing technique with an index tree as a data structure for static and dynamic scenarios. Users can first tune to any channel to receive index information and thereby learn the data locations and channel locations of their desired data. Therefore, users need not exhaustively tune to each channel to find their desired data.

3.1.1. Data Organization for Broadcasting. To allocate the data indices of a DSA tree to channels, we adopt distributed indexing [8], which achieves lower waiting time by broadcasting only part of the index tree preceding each data segment. A data item and an index are allocated into a data segment and an index segment respectively. The size of a data segment is equal to the size of the allocated item. All index segments are of equal size, which is much smaller than that of a data segment. Data segments and index segments are interleaved in the broadcast channel, and the part of the index tree held in an index segment precedes each data segment in the channel.

The whole DSA tree is composed of two parts: a replicated part and a non-replicated part. Root index $I$ of the replicated part appears only at the first index segment in each channel. Therefore, users can receive $I$ on any channel upon first arrival. Channel index $C_i$ of the replicated part is inserted into each index segment in broadcast channel $c_i$ ($1 \le i \le K$). Data index $R_{i,j}$ of the non-replicated part is inserted into the index segment preceding the data segment of the $j$-th data item $d_k$ in channel $c_i$, where $1 \le i \le K$, $1 \le j \le M_i$ and $1 \le k \le |D|$. For instance, the index path of item d1 in Figure 2(a) is I, C1 and R1,1. Each data index is broadcast exactly once in a broadcast cycle. Since the number of levels in the DSA tree is four, there are three index nodes in the index path of each data item. The number of levels in the DSA tree does not increase with the number of data items. Therefore, the time a user spends listening to the channel is reduced owing to the short index path in the DSA tree, and this time does not increase with the number of broadcast data items. In addition, the DSA tree is easy to maintain when an item changes its location in a broadcast program: since each item has its own unique data index, we merely move the data node to the subtree rooted at the channel index node of the destination.

Figure 2. Examples of a DSA tree and index allocation.
(a) Example of a DSA tree. Replicated part: root index I with channel indices C1, C2, C3. Non-replicated part: data indices R1,1 and R1,2 under C1 (items d1, d2), R2,1 and R2,2 under C2 (items d3, d4), and R3,1, R3,2 and R3,3 under C3 (items d5, d6, d7).
(b) Example of distributed indexing in three channels.
    Channel 1 (First_I):  I C1 R1,1 d1 | C1 R1,2 d2
    Channel 2 (Second_I): I C2 R2,1 d3 | C2 R2,2 d4
    Channel 3 (Third_I):  I C3 R3,1 d5 | C3 R3,2 d6 | C3 R3,3 d7

We now discuss the index structures and other auxiliary information in an index segment that help users search for required data. Each index sub-segment contains pointers to other sub-segments containing its child index nodes. These pointers are referred to as local indices [8]. In Figure 2(b), for example, each index sub-segment containing index node C1 has a local index pointing at the sub-segments containing index nodes R1,1 and R1,2 respectively. Since users arriving at the server at an arbitrary time may not receive the index of the requested data item, more auxiliary information is needed to direct users to other sub-segments containing their required information. To solve this problem, we introduce a control index [8] composed of pointers to the next occurrence of a sub-segment containing the parent index nodes of the current index node. Since data sizes are non-uniform, a sub-segment in the index segment stores the interval of each data segment so that the control index can precisely locate data in the channel. Consider Figure 2(b) as an example, where a user requires data item d3 and arrives while index node C1 is being broadcast. The control index in C1 directs the user to Second_I since item d3 is not in the subtree rooted at C1. Therefore, with these auxiliary indices embedded in the index segment, the server enables users to conveniently access their required data. The algorithmic form of Procedure Tree Construction is as follows.

Procedure Tree Construction(D1, D2, ..., DK)
Input: item sets D1, D2, ..., DK
Output: tree DSATree
begin
1. Create root node I in DSATree;
2. for (i = 1; i ≤ K; i++)
3.     Attach node Ci to I in DSATree;
4.     for (n = 1; n ≤ Mi; n++)
5.         Attach node Ri,n to Ci in DSATree;
6.         Attach node di,n to Ri,n in DSATree;
7. return DSATree;
end
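The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Node` class, the function names `build_dsa_tree` and `index_path`, and the use of item names as dictionary keys are hypothetical choices, not structures defined in the paper.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """A node of the DSA tree: root I, channel index C_i, data index R_{i,n}, or data item."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def attach(self, child: "Node") -> "Node":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def build_dsa_tree(item_sets: List[List[str]]) -> Tuple[Node, Dict[str, Node]]:
    """Mirror Procedure Tree Construction: I -> C_i -> R_{i,n} -> d_{i,n}."""
    root = Node("I")                                          # step 1
    data_nodes: Dict[str, Node] = {}
    for i, items in enumerate(item_sets, start=1):            # step 2
        channel = root.attach(Node(f"C{i}"))                  # step 3
        for n, item in enumerate(items, start=1):             # step 4
            data_index = channel.attach(Node(f"R{i},{n}"))    # step 5
            data_nodes[item] = data_index.attach(Node(item))  # step 6
    return root, data_nodes                                   # step 7


def index_path(data_node: Node) -> List[str]:
    """Labels of the index nodes from the root down to the data index of an item."""
    path: List[str] = []
    node = data_node.parent
    while node is not None:
        path.append(node.label)
        node = node.parent
    return path[::-1]


# The allocation of Figure 2: three channels holding d1..d7.
root, nodes = build_dsa_tree([["d1", "d2"], ["d3", "d4"], ["d5", "d6", "d7"]])
print(index_path(nodes["d1"]))
```

For the allocation of Figure 2, the index path of d1 comes out as I, C1, R1,1, matching the example in the text. Every path has exactly three index nodes regardless of how many items are allocated, which is the property the tuning time analysis relies on.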

3.1.2. Tuning Time Analysis. We provide an analysis of tuning time for indexing data items of varying sizes in multiple channels. Tuning time is the time a user spends listening to the channel. Tuning time also determines the energy consumed by a user device to download the required data. We derive the tuning time as follows.

Given the size of an index segment $\mathit{index}$ and the channel bandwidth $b$, the tuning time of downloading data item $d_{i,j}$ can be formulated as Eq. (7). Specifically, tuning time is composed of three parts. First, a user listens to channel $c_i$ until encountering the nearest index segment; the wait time of this probe is the average probe wait of Eq. (1), i.e., the first term in Eq. (7). Second, the user performs three probes to follow the pointers in the index segment, because our index structure includes three indices to locate the requested data: a root index, a channel index, and a data index. Finally, the user spends time downloading the requested data of size $z_{i,j}$. Therefore, the tuning time of downloading data item $d_{i,j}$ can be formulated as

$$ tt_{i,j} = \frac{\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})} + \frac{3\,\mathit{index} + z_{i,j}}{b}. \tag{7} $$

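The average tuning time $tt_i$ is the expected value of tuning time $tt_{i,j}$ in Eq. (7). Thus, we obtain

$$ tt_i = \frac{\sum_{j=1}^{M_i} f_{i,j}\, tt_{i,j}}{\sum_{j=1}^{M_i} f_{i,j}} = \frac{\sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})} + \frac{3\,\mathit{index}}{b} + \frac{\sum_{j=1}^{M_i} f_{i,j} z_{i,j}}{b \sum_{j=1}^{M_i} f_{i,j}}. \tag{8} $$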

The average tuning time $tt$ of $P$ can be viewed as the expected value of $tt_i$ in Eq. (8) as follows:

$$ tt = E[tt_i] = \sum_{i=1}^{K} \Big( \sum_{j=1}^{M_i} f_{i,j} \Big) tt_i = \sum_{i=1}^{K} \left[ \frac{\big( \sum_{j=1}^{M_i} f_{i,j} \big) \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_i} (z_{i,k} + \mathit{index})} + \frac{3\,\mathit{index} \big( \sum_{j=1}^{M_i} f_{i,j} \big)}{b} + \frac{\sum_{j=1}^{M_i} f_{i,j} z_{i,j}}{b} \right]. \tag{9} $$
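As an illustration of Eqs. (8) and (9), the sketch below computes the average tuning time of a broadcast program. The function names, the `(f, z)` item layout and the sample numbers are assumptions for illustration only.

```python
from typing import List, Tuple

Item = Tuple[float, float]  # (access frequency f, item size z); hypothetical layout


def channel_tuning_time(items: List[Item], index: float, b: float) -> float:
    """tt_i of Eq. (8): probe wait + three index probes + expected download time."""
    freq_sum = sum(f for f, _ in items)
    seg_sum = sum(z + index for _, z in items)
    seg_sq_sum = sum((z + index) ** 2 for _, z in items)
    probe = seg_sq_sum / (2 * b * seg_sum)                    # first term of Eq. (7)
    pointers = 3 * index / b                                  # root, channel and data index
    download = sum(f * z for f, z in items) / (b * freq_sum)  # frequency-weighted z / b
    return probe + pointers + download


def average_tuning_time(channels: List[List[Item]], index: float, b: float) -> float:
    """tt of Eq. (9): per-channel tuning times weighted by channel access frequency."""
    return sum(
        sum(f for f, _ in items) * channel_tuning_time(items, index, b)
        for items in channels
        if items  # skip empty channels to avoid dividing by zero
    )


# Illustrative values only.
program = [[(0.5, 1.0), (0.5, 3.0)]]
print(average_tuning_time(program, index=0.5, b=2.0))
```

Note that the pointer term depends only on `index` and `b`, not on the number of items, which reflects the fixed three-node index path of the DSA tree.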

3.2. Data Index Allocation Algorithm

We propose algorithm SIA (standing for data Size aware Index Allocation) to effectively allocate data and indices into multiple channels. In each iteration, SIA finds the data set of the largest cost and then divides it into two data subsets such that the cost after partition is minimized. This process continues to partition the resulting subsets until the number of subsets reaches K. We state our algorithm as follows.


3.2.1. Data Index Allocation. We first model the features of data items with the benefit ratio [6], defined as the access frequency over the item size, and then sort the data items in descending order of benefit ratio. Afterward, SIA inserts all data items into item set D1 associated with the first channel c1. SIA then moves data items from D1 to D2, associated with channel c2, if the cost can be reduced. Afterward, SIA finds the set of the largest cost and repeats the above steps to divide that set into two subsets. Note that SIA determines the most suitable point at which to partition a data set into two subsets, allocated to two channels respectively, such that the cost is minimized. SIA iterates the above steps until the number of item sets reaches K.

Feature Reduction. Each item has two features: its item size and its access frequency. Optimally allocating the items to the channels therefore requires large complexity due to the two-dimensional search space. We thus model the features of data item $d_i$ with the benefit ratio $br_i$ [6], defined as the ratio of access frequency $f_i$ to item size $z_i$ (i.e., $br_i = f_i / z_i$). The reason is that a high access frequency contributes to reducing the average waiting time, whereas a large data size increases it. The benefit ratio transforms the two features of each item into one major feature and so reduces the two-dimensional search to a one-dimensional search.
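The reduction to a single sort key can be shown in a few lines of Python. The function name `sort_by_benefit_ratio`, the `(name, f, z)` tuple layout and the three sample items are hypothetical and serve only to illustrate the ordering.

```python
from typing import List, Tuple

Item = Tuple[str, float, float]  # (name, access frequency f, item size z); hypothetical layout


def sort_by_benefit_ratio(items: List[Item]) -> List[Item]:
    """Sort items in descending order of benefit ratio br = f / z."""
    return sorted(items, key=lambda item: item[1] / item[2], reverse=True)


# Illustrative values only: benefit ratios are a = 0.05, b = 0.10, c = 0.15.
sample = [("a", 0.2, 4.0), ("b", 0.1, 1.0), ("c", 0.3, 2.0)]
print([name for name, _, _ in sort_by_benefit_ratio(sample)])
```

After this sort, a partition of the items into channels can be described by cut points along a single ordered list, which is what makes the search one-dimensional.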

Channel Allocation. In each iteration, SIA splits the data set of the largest cost into two subsets if the cost can be further reduced. Specifically, SIA first sorts data items $d_1, d_2, \ldots, d_M$ in descending order of their benefit ratio $br_i$ ($1 \le i \le M$), where $M$ is the number of data items and $K$ is the number of channels. SIA creates item sets $D_1, D_2, \ldots, D_K$ associated with channels $c_1, c_2, \ldots, c_K$. SIA starts by inserting all the data items into item set $D_1$ and then divides $D_1$ into two item subsets, $D'$ containing $\{d_{1,1}, d_{1,2}, \ldots, d_{1,p}\}$ and $D''$ containing $\{d_{1,p+1}, d_{1,p+2}, \ldots, d_{1,M_1}\}$.

SIA finds a partition point $p$ that divides the data items in $D_1$ into two subsets $D'$ and $D''$ such that the cost reduction is maximal. That is, SIA first moves the data items in $D_1$ to $D'$. Next, SIA moves data items from $D'$ to $D''$ at the partition point $p$ for which $\mathit{cost}_b - \mathit{cost}_a$ is maximal, where $\mathit{cost}_b$ is the cost before partition and $\mathit{cost}_a$ is the cost after partition. Specifically, we assume that the cost of data set $D_m$ associated with channel $c_m$ is maximal, so the cost before partition is

$$ \mathit{cost}_b = \frac{\big( \sum_{j=1}^{M_m} f_{m,j} \big) \sum_{k=1}^{M_m} (z_{m,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{M_m} (z_{m,k} + \mathit{index})} + \frac{\sum_{j=1}^{M_m} f_{m,j} \sum_{k=1}^{M_m} (z_{m,k} + \mathit{index})}{2b} + \frac{\sum_{j=1}^{M_m} f_{m,j} z_{m,j}}{b}. \tag{10} $$

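On the other hand, we assume that $p$ data items are moved from $D_m$ to $D_n$ associated with channel $c_n$. The cost after partition is then the per-channel cost of Eq. (6) evaluated on the items remaining in $D_m$ (indexed $p+1$ to $M_m$) plus that of the $p$ items now in $D_n$:

$$ \begin{aligned} \mathit{cost}_a = {} & \frac{\big( \sum_{j=p+1}^{M_m} f_{m,j} \big) \sum_{k=p+1}^{M_m} (z_{m,k} + \mathit{index})^{2}}{2b \sum_{k=p+1}^{M_m} (z_{m,k} + \mathit{index})} + \frac{\sum_{j=p+1}^{M_m} f_{m,j} \sum_{k=p+1}^{M_m} (z_{m,k} + \mathit{index})}{2b} + \frac{\sum_{j=p+1}^{M_m} f_{m,j} z_{m,j}}{b} \\ & + \frac{\big( \sum_{j=1}^{p} f_{n,j} \big) \sum_{k=1}^{p} (z_{n,k} + \mathit{index})^{2}}{2b \sum_{k=1}^{p} (z_{n,k} + \mathit{index})} + \frac{\big( \sum_{j=1}^{p} f_{n,j} \big) \sum_{k=1}^{p} (z_{n,k} + \mathit{index})}{2b} + \frac{\sum_{j=1}^{p} f_{n,j} z_{n,j}}{b}. \end{aligned} \tag{11} $$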

Afterward, SIA repeats the above steps to split the data set of the largest cost into two subsets until the number of item sets $D_i$ reaches $K$, where $1 \le i \le K$. Procedure Tree Construction then creates a DSATree according to the item sets $D_i$. Finally, SIA allocates each data item and its index path into a data segment and an index segment respectively. The algorithmic form of SIA is outlined as follows.

Algorithm SIA(D, K)
Input: database D, K channels.
Output: K disjoint subsets of D, i.e., {Di, 1 ≤ i ≤ K}.
begin
1. while (the number of subsets in D does not reach K)
2.     Find a subset DMax whose cost is maximal in D;
3.     Find a partition point p such that costb − costa is maximal, where costb and costa are calculated in Eq. (10) and Eq. (11) respectively;
4.     Partition DMax into D′ and D″, where
           D′ = {dMax,1, dMax,2, ..., dMax,p},
           D″ = {dMax,p+1, dMax,p+2, ..., dMax,MMax};
5.     Insert D′ and D″ into D;
6. DSATree = Tree Construction(D1, D2, ..., DK);
7. Insert an index path of each data node in DSATree into an index segment preceding each data segment in each channel;
end

Figure 3. Profile of data items (access frequency and size in MB of data items d1 to d20).
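Steps 1 to 5 of Algorithm SIA can be sketched in Python as below. This is a hedged illustration rather than the authors' implementation: the function names, the `(name, f, z)` item layout and the sample values are assumptions, and the sketch adds one rule the pseudocode leaves implicit, namely that only subsets with at least two items are considered for splitting. Here `cost_a` is obtained by applying the per-channel cost of Eq. (6) to each of the two halves.

```python
from typing import List, Tuple

Item = Tuple[str, float, float]  # (name, access frequency f, item size z); hypothetical layout


def channel_cost(items: List[Item], index: float, b: float) -> float:
    """Per-channel waiting-time cost, i.e. one summand of Eq. (6)."""
    if not items:
        return 0.0
    freq_sum = sum(f for _, f, _ in items)
    seg_sum = sum(z + index for _, _, z in items)
    seg_sq_sum = sum((z + index) ** 2 for _, _, z in items)
    probe = freq_sum * seg_sq_sum / (2 * b * seg_sum)
    bcast = freq_sum * seg_sum / (2 * b)
    download = sum(f * z for _, f, z in items) / b
    return probe + bcast + download


def best_partition_point(items: List[Item], index: float, b: float) -> int:
    """Step 3: return p (1 <= p < len(items)) that maximizes cost_b - cost_a."""
    cost_b = channel_cost(items, index, b)

    def gain(p: int) -> float:
        cost_a = channel_cost(items[:p], index, b) + channel_cost(items[p:], index, b)
        return cost_b - cost_a

    return max(range(1, len(items)), key=gain)


def sia(items: List[Item], K: int, index: float, b: float) -> List[List[Item]]:
    """Steps 1-5: repeatedly split the costliest subset until K subsets exist."""
    ordered = sorted(items, key=lambda it: it[1] / it[2], reverse=True)  # by benefit ratio
    subsets = [ordered]
    while len(subsets) < K:
        # Assumption of this sketch: a single-item subset cannot be split further.
        splittable = [i for i, s in enumerate(subsets) if len(s) > 1]
        if not splittable:
            break
        m = max(splittable, key=lambda i: channel_cost(subsets[i], index, b))  # step 2
        p = best_partition_point(subsets[m], index, b)                          # step 3
        subsets[m:m + 1] = [subsets[m][:p], subsets[m][p:]]                     # steps 4-5
    return subsets


# Illustrative values only.
data = [("d1", 0.4, 1.0), ("d2", 0.3, 2.0), ("d3", 0.2, 4.0), ("d4", 0.1, 8.0)]
for channel in sia(data, K=3, index=0.1, b=1.0):
    print([name for name, _, _ in channel])
```

Because every split keeps the benefit-ratio order, each resulting subset is a contiguous run of the sorted list; the output could then be passed to a tree-construction step (steps 6 and 7) to build the index paths.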

3.3. Example for Execution Scenario under SIA

3.3.1. Overview. SIA first sorts data items in descending order of their benefit ratios. In each iteration, SIA searches for the data set of the largest cost and then divides it into two data subsets such that the cost is minimized. This process continues until the number of subsets reaches K. Afterward, SIA constructs a DSA tree to index each data item in each channel. Finally, SIA allocates the data items and their indices to each channel. We walk through an example as follows.

3.3.2. Index Allocation. Consider the broadcast profile shown in Figure 3. Database D contains 20 data items dj, 1 ≤ j ≤ 20, and the number of broadcast channels K is 5. The bandwidth of each channel is 5 Mbps and the size of an index is 100 KB. SIA starts by inserting all data items into an item set D1. The data items in D1 have been sorted in ascending order according to their benefit ratios bri (1 ≤ i ≤ 20), where bri = fi/zi. SIA first finds the partition point 10, which splits D1 into D′ and D″ such that costb − costa is maximal, where costa and costb are given in Eq. (11) and Eq. (10), respectively. SIA thus partitions D1 into the two item subsets D′ containing {d1, d2, ..., d10} and D″ containing {d11, d12, ..., d20}. Next, SIA moves data items d11, d12, ..., d20 to D2 and data items d1, d2, ..., d10 to D1, as shown in Figure 4(a). SIA repeats the above steps to divide the set D2 of larger cost into two subsets and then obtains D3, as illustrated in Figure 4(b). SIA then iterates the above steps to partition D1 of the largest cost into two subsets, as depicted in Figure 4(c). Afterward, SIA terminates when the number of data sets Di reaches 5, as shown in Figure 4(d), where 1 ≤ i ≤ 5. Next, SIA builds a DSA tree according to D1, D2, ..., D5, as shown in Figure 5(a). Finally, SIA allocates the data items of each data set and their indices into each channel, as shown in Figure 5(b). In this case, the average waiting time of SIA is 4.87.

Figure 4. Running example under SIA. (a) The first iteration of SIA. (b) The second iteration of SIA. (c) The third iteration of SIA. (d) The final iteration of SIA, with data sets {d1, ..., d4}, {d5, ..., d10}, {d11, d12}, {d13, d14, d15}, and {d16, ..., d20}.
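The index paths of the running example can be sketched in a few lines. The sketch assumes the three-level layout read off Figure 5(a), namely a root I, one channel node Ci per channel, and one entry Ri,j per data item; the function names are hypothetical and the real index node format is not specified here.

```python
from typing import Dict, List, Tuple

# Final data sets of the running example, one list per channel (C1 to C5).
groups: List[List[str]] = [
    [f"d{i}" for i in range(1, 5)],    # C1: d1 to d4
    [f"d{i}" for i in range(5, 11)],   # C2: d5 to d10
    ["d11", "d12"],                    # C3
    ["d13", "d14", "d15"],             # C4
    [f"d{i}" for i in range(16, 21)],  # C5: d16 to d20
]


def build_index(channel_groups: List[List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each item to (channel number, position within the channel), 1-based."""
    index: Dict[str, Tuple[int, int]] = {}
    for channel, items in enumerate(channel_groups, start=1):
        for position, item in enumerate(items, start=1):
            index[item] = (channel, position)
    return index


def index_path(index: Dict[str, Tuple[int, int]], item: str) -> List[str]:
    """Return the nodes visited from the root I down to the data item."""
    channel, position = index[item]
    return ["I", f"C{channel}", f"R{channel},{position}", item]


index = build_index(groups)
path_d15 = index_path(index, "d15")
```

Because the channel number is part of the path, a client knows which channel to tune to after reading the channel node.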

Figure 5. Running examples of a DSA tree and a broadcast program under SIA. (a) Running example of a DSA tree: the root I points to channel nodes C1 to C5, and each channel node Ci points through entries Ri,j to its data items (C1: d1 to d4; C2: d5 to d10; C3: d11 and d12; C4: d13 to d15; C5: d16 to d20). (b) Running example of a broadcast program under SIA.

4. Experimental Results

4.1. Simulation Model

We suppose that the data items have variant sizes. Table 2 lists the simulation parameters. The access frequency of the i-th data item follows a Zipf distribution, $f_i = \frac{(1/i)^{\theta}}{\sum_{j=1}^{N} (1/j)^{\theta}}$, where θ is a skewness parameter and N is the number of data items. In addition, the size of each data item is represented by $10^{\phi}$ MByte [6], where the value of φ is uniformly distributed over the interval [0, Φ]. Φ is referred to as a diversity parameter. We conduct several experiments to evaluate the quality of SIA. We compare our approach with OPT [5] and TOSA [15]. We adopt distributed indexing [8] with a B+ tree structure in TOSA and OPT. In the beginning, we set the value of N to 300, the value of K to 5, the bandwidth of each channel to 10MBps, the value of Φ to 1.5, and the skewness parameter θ to 0.8. In the following, we compare the quality of each algorithm under different parameters and the execution time of each approach.
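The workload described in this section can be generated with a short script. The sketch below is illustrative only: the function names and the fixed random seed are assumptions, and the defaults N = 300, θ = 0.8 and Φ = 1.5 are the initial settings stated above.

```python
import random
from typing import List


def zipf_frequencies(n: int, theta: float) -> List[float]:
    """Access frequencies f_i = (1/i)^theta / sum_j (1/j)^theta for i = 1..n."""
    weights = [(1.0 / i) ** theta for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def item_sizes(n: int, diversity: float, rng: random.Random) -> List[float]:
    """Sizes in MByte: 10**phi with phi drawn uniformly from [0, diversity]."""
    return [10.0 ** rng.uniform(0.0, diversity) for _ in range(n)]


rng = random.Random(0)  # fixed seed (an assumption) so that runs are repeatable
freqs = zipf_frequencies(300, 0.8)
sizes = item_sizes(300, 1.5, rng)
```

With Φ = 1.5 the sizes range from 1 MByte to about 31.6 MByte, which shows how quickly a larger diversity parameter widens the size range.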

Table 2. Parameters used in the simulation.

Parameters                                  Values
Number of the broadcast channels (K)        5~11
Number of the broadcast data items (N)      100~400
Skewness parameter (θ)                      0.2~1
Diversity parameter (Φ)                     1~5
Channel bandwidth                           10Mbps

4.2. Performance Analysis

4.2.1. Effect of Skewness. This experiment inspects the average waiting time of each approach as parameter θ varies from 0.2 to 1. Figure 6(a) shows that the average waiting time of all approaches decreases with increasing θ. This is because fewer data items are accessed frequently when the access frequencies become highly skewed. SIA surpasses TOSA by an average of 97% since SIA achieves a fair broadcasting rate for items of different access frequencies. The experimental result also demonstrates that it makes sense to schedule the data items according to their benefit ratios. That is, under the same diversity parameter, the access frequency mainly affects the channel allocation. Consequently, SIA performs well in adapting to data of variant features when allocating them into multiple channels.

Figure 6. Comparison of each scheme with the different skewness parameters and the different numbers of data items. (a) Comparison of each scheme with the different skewness parameters: average waiting time (sec) versus θ for TOSA, SIA, and OPT. (b) Comparison of each scheme with the different number of data items: average waiting time (sec) versus N for TOSA, SIA, and OPT.

4.2.2. Effect of the Number of Data Items. Figure 6(b) shows that the average waiting time of all approaches increases as the number of data items grows, since the number of data items allocated to each channel also increases. The quality gain of SIA over TOSA is about 60% as parameter N varies from 100 to 400. This is because we focus on transforming more variables into fewer variables to simplify problems in a large-scale information system. Specifically, since each data item has two features, a data size and an access frequency, we devise the benefit ratio bri of data item i to transform the two features into a single feature. We then only need to handle the benefit ratio of each data item. The quality of our approach is close to that of OPT owing to its better scalability for large-scale data of variant size.
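As a small worked example of folding the two features into one, the snippet below ranks three hypothetical items by bri = fi/zi. The item names and values are invented for illustration and do not come from the paper's profile.

```python
# name -> (access frequency f_i, size z_i in MByte); all values are hypothetical.
profile = {
    "video_clip": (0.40, 8.0),
    "news_text": (0.10, 1.0),
    "photo": (0.50, 2.0),
}


def benefit_ratio(frequency: float, size: float) -> float:
    """Single feature that combines popularity and size: br_i = f_i / z_i."""
    return frequency / size


# Rank the items by benefit ratio, largest first.
ranked = sorted(profile, key=lambda name: benefit_ratio(*profile[name]), reverse=True)
```

Here the large video clip ranks below the much less popular text item, because its size outweighs its access frequency once both are folded into one ratio.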

4.2.3. Effect of the Number of Channels. Figure 7(a) shows that the average waiting time of each approach decreases as parameter K increases from 5 to 11, since an increase in the number of channels reduces the number of data items allocated to each channel. SIA has better quality than TOSA by an average of 28%. The main reason is that SIA is devised with a theoretical foundation for allocating the popular items of larger sizes into multiple channels, such that SIA can guarantee cost reduction in each partition operation. The experimental results also show that our feature model of data items is well designed for better bandwidth utilization when scheduling the items of variant sizes and their indices in each channel. Therefore, SIA is indeed feasible for heterogeneous data broadcasting over multiple channels.

Figure 7. Comparison of each scheme with the different numbers of channels and the different diversity parameters. (a) Comparison of each scheme with the different number of channels: average waiting time (sec) versus K for TOSA, SIA, and OPT. (b) Comparison of each scheme with the different diversity parameters: average waiting time (sec) versus Φ for TOSA, SIA, and OPT.

4.2.4. Effect of Diversity Parameter. Figure 7(b) presents the average waiting time of each approach as parameter Φ varies from 1 to 1.3. The average waiting time of all approaches increases with the diversity parameter, since a larger diversity parameter results in a larger workload in each channel. According to our experiment, SIA is superior to TOSA by an average of 61% since SIA uses an equation to calculate the cost reduction in each partition operation. Figure 7(b) shows that our algorithm is close to OPT under smaller diversity parameters. Since the data size varies and increases only slightly under a smaller diversity parameter, our algorithm achieves quality similar to that of OPT. However, with diversity parameters beyond 1.1, the difference between SIA and OPT gradually increases due to the larger workload in each channel. In addition, the experimental result also demonstrates that it makes sense to schedule the data items according to their benefit ratios. That is, under the same skewness parameter, the data size mainly affects the channel allocation. The quality of SIA is similar to that of OPT, whereas TOSA has the worst quality. Under TOSA, broadcasting the data items of larger sizes more frequently in a cycle leads to a larger cost. Therefore, the quality gap between SIA and TOSA becomes significant as the diversity parameter increases. In summary, this experimental result shows that SIA is suitable for scheduling data items of highly diverse sizes in each channel.

Figure 8. Comparison of each scheme with the different numbers of data items and different diversity parameters. (a) Comparison of each scheme with the different number of data items: execution time (sec) versus N for TOSA, SIA, and OPT. (b) Comparison of each scheme with the different diversity parameters: average tuning time (sec) versus Φ for the distributed index and SIA.

4.2.5. Efficiency Analysis. Since the number of data items dominates the execution time of each scheme, we focus on this effect when measuring the execution time of each approach. We assess the execution time of each algorithm as parameter N varies from 100 to 400. In Figure 8(a), we observe that the execution time of SIA is shorter than that of OPT. The reason is that our approach efficiently clusters the data items into each channel by merely calculating the cost of each cluster. In contrast, the execution time of OPT is prohibitively large and increases with N. This is because OPT, which is implemented by a genetic algorithm, increases the length of each chromosome as N increases. Thus, SIA is able to produce results that are close to the optimum with a much smaller execution time.

4.2.6. Tuning Time Analysis. In this experiment, we inspect the tuning time of a B+ tree [8] and SIA. In the B+ tree scheme, indices are organized with a B+ tree structure, and distributed indexing [8] is adopted to allocate these indices into index segments, where the size of an index segment is 100 KB. Figure 8(b) shows that the tuning time of each approach gradually increases with the diversity parameter due to the larger data size. The quality gain of SIA over the B+ tree is about 50% due to the fewer levels of the DSA tree structure in SIA. Therefore, SIA can lead to lower energy consumption as compared to the traditional tree structure.

5. Conclusion

Data indexing is a technique to realize energy saving and access convenience in a mobile computing environment. However, traditional schemes design data indices without considering variant data size. This drawback leads to large power consumption in the heterogeneous environment. In this paper, the problem of devising an indexing technique for data of variant size over multiple channels was explored. In view of the characteristics of the variant data size, we proposed an algorithm to generate a broadcast program that avoids the above drawback so as to minimize average waiting time. Going beyond previous methods, SIA was proposed to allocate data indices in a broadcast program for minimizing average waiting time. In order to evaluate the performance of SIA, we have conducted several experiments. In these experiments, we analyzed the effectiveness of SIA in terms of the average waiting time and also investigated the efficiency of SIA by measuring its execution time. Experimental results have shown that the results of SIA are of very high quality and are in fact very close to those of the optimal scheme OPT, which is implemented with a genetic algorithm for comparison purposes. Furthermore, SIA incurs much lower complexity as compared to OPT. Therefore, SIA is very efficient and is practically useful in a data broadcasting environment.

6. Acknowledgment

The work was supported in part by the National Science Council of Taiwan, R.O.C., under Contract NSC 97-2221-E-002-172-MY3.

References

[1] D. Barbará. Mobile computing and database - a survey. IEEE Transactions on Knowledge and Data Engineering, 11(1):108–117, 1999.

[2] A. Celik, J. Holliday, and Z. Hurst. Data dissemination to a large mobile network: Simulation of broadcast clouds. Proceedings of the 7th International Conference on Mobile Data Management (MDM-06), 2006.

[3] M.-S. Chen, P. S. Yu, and K.-L. Wu. Optimizing index allocation for sequential data broadcasting in wireless mobile computing. IEEE Transactions on Knowledge and Data Engineering, 15(1), February 2003.

[4] C.-H. Chu, H.-P. Hung, and M.-S. Chen. Variant bandwidth channel allocation in the data broadcasting environment. In IEEE International Conference on Mobile Data Management (MDM-07), 2007.

[5] D. E. Goldberg. Genetic algorithms in search, optimization and machine learning. Addison-Wesley Publishing, 1989.

[6] H.-P. Hung and M.-S. Chen. On exploring channel allocation in the diverse data broadcasting environment. In Proc. of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS-2005), 2005.

[7] T. Imielinski, S. Viswanathan, and B. R. Badrinath. Energy efficient indexing on air. In Proceedings of the 1994 ACM International Conference on Management of Data, pages 25–36, 1994.

[8] T. Imielinski, S. Viswanathan, and B. R. Badrinath. Data on air: Organization and access. IEEE Transactions on Knowledge and Data Engineering, 9(3):353–372, May/June 1997.

[9] H. Li, C.-C. Huang, and M. Devetsikiotis. A robust adaptive effective bandwidth allocation scheme. In Proceedings of IEEE International Conference on Communication, pages 115–119, 2005.

[10] B. Moon. Emergency handling in ethernet passive optical networks using priority-based dynamic bandwidth allocation. Proceedings of the 27th International Conference on Computer Communications (INFOCOM), 2008.

[11] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. Effective prediction of web-user accesses: A data mining approach. Proc. WEBKDD Workshop, 2001.

[12] T. Repantis and V. Kalogeraki. Data dissemination in mobile peer-to-peer networks. In Proceedings of the 6th International Conference on Mobile Data Management, pages 211–219, 2005.

[13] J. Xu, D.-L. Lee, and B. Li. On bandwidth allocation for data dissemination in cellular mobile networks. In ACM/Kluwer Journal of Wireless Networks (WINET), Special Issue on Advances in Mobile and Wireless, volume 9(2), pages 103–116, 2003.

[14] J. Xu, B. Zheng, W.-C. Lee, and D.-L. Lee. Energy efficient index for querying location-dependent data in mobile broadcast environments. In Proceedings of the 19th International Conference on Data Engineering, pages 239–250, 2003.

[15] B. Zheng, X. Wu, X. Jin, and D.-L. Lee. Tosa: A near-optimal scheduling algorithm for multi-channel data broadcast. In Proceedings of the 6th International Conference on Mobile Data Management (MDM-05), May 2005.
