querying in highly mobile distributed environments

  • 8/3/2019 Querying in Highly Mobile Distributed Environments

    Querying in highly mobile distributed environments

    T. Imielinski and B. R. Badrinath

    Department of Computer Science
    Rutgers University
    New Brunswick, NJ 08903

    e-mail: {imielins,badri}@cs.rutgers.edu

    Abstract

    New challenges to the database field brought about by the rapidly growing area of wireless personal communication are presented. Preliminary solutions to two of the major problems, control of update volume and integration of query processing and data acquisition, are discussed along with the simulation results.

    1 Introduction

    The rapidly expanding technology of cellular communications will give users the capability of accessing information regardless of the location of the user or of the information. It is expected that in the near future, tens of millions of people in the U.S. alone will carry a lightweight, inexpensive terminal that will give them access to a worldwide information network called PCN (Personal Communication Network). These users will be constantly relocating between cells much smaller than today's (a future cell might be a building or a floor of a building). Thus, within a day, a mobile user may cross the boundaries of these small cells tens and possibly hundreds of times.

    Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

    Proceedings of the 18th VLDB Conference, Vancouver, British Columbia, Canada 1992

    Mobile users will be able to ask ad hoc queries addressed to large databases describing the local area: information about people, places, geography, services available, etc. Typical queries from mobile users will range from simple requests such as "Where is X?" or "Where is the nearest doctor?" to more complex ones such as "Find me the best route to the hospital with the best traffic conditions." Queries may also request data from a mobile source in a location-transparent mode, for example the recent sales figures from a traveling salesman who stores this data in the memory of his mobile palm-top terminal.

    Before this vision can be fully realized, a number of new database research problems have to be solved. These problems include the control of massive location updates, the integration of querying and communication, and the partition of knowledge across the mobile network. It is estimated (Meier & Hellstern et al., 1991) that the additional signalling load will be 4-11 times higher for cellular networks than for ISDN, and 3-4 times higher for PCN than for cellular networks themselves. Most of this traffic will be due to location updates from constantly relocating users. Additional traffic, which is difficult to predict yet, will be due to the massive data transfers generated by ad hoc queries.

    The goal of this paper is to present a number of new data management problems, hopefully of interest to the database community, that arise due to the mobility of both data sources and data consumers. The management of mobile data is an important part of the emerging area of mobile or nomadic computing. It can also be viewed as an integral component of a broader vision of universal and worldwide access to information independent of its form, representation, location and the mobility of users. In addition to the discussion of the new problems, we will also provide some preliminary solutions and simulation results.
    The full body of results, due to lack of space, will be presented elsewhere.

    The work presented in this paper is part of a larger project called DataMan which addresses all of the above questions. DataMan (a logical successor to WalkMan and WatchMan) will be a distributed knowledge base capable of handling a large number of queries from hand-held terminals. Such a system running on small pocket terminals can replace current pocket organizers by making it possible to access and to process information at any location and at any time. Information could include maps and navigation, local services, and finally identification of other moving objects as well. The DataMan project will be described elsewhere in more detail.

    This paper has two goals:

    • Provide preliminary solutions to two of the major problems mentioned above: control of the update volume and the integration of query processing and data acquisition.

    • Demonstrate new challenges to the database field brought about by the rapidly growing area of wireless personal communication. Here we present a number of open problems and challenging research questions addressed to the database community.

    In Section 2, we first present a general architecture of the cellular system. The next two sections cover in detail the issues of update and query processing. These sections contain the main technical contributions of the paper.

    In Section 5, we outline a number of important open questions and future directions of this work. Section 6 summarizes related work and finally, conclusions are presented in Section 7. Preliminary simulation results are described in the appendix.

    The prototype of DataMan will make use of current resources of the Wireless Information Network Laboratory (WINLAB) at Rutgers University, which is supported by 18 industrial research sponsors. These resources include a complete cellular communication system with base stations and switching software. We will assume the availability of a small portable low-powered terminal with limited memory (say 8-16MB) which can be switched on and off by software (to minimize power consumption).

    2 General Architecture

    The system consists of a fixed information network extended with a wireless component. The wireless elements include wireless terminals, base stations and switches. The whole geographic area is partitioned into cells. Each cell is covered by a base station which is connected to the fixed network and provides a wireless communication link between the mobile user and the rest of the network. The allocated bandwidth of radio spectrum [13] in North America includes frequencies between 870-890 MHz (from base station to mobile) and 825-845 MHz (from mobile to base). This bandwidth is subdivided into 666 channels which are shared and reused among base stations which are sufficiently far apart. Currently the average size of a cell is of the order of 1-2 miles in diameter [19]. It is expected, however, that with a new generation of wireless systems this size will shrink to possibly 50-200 meters to accommodate a much larger set of users and still benefit from bandwidth sharing and reuse schemes.

    Additionally, we will assume that there is a hierarchy L of location servers which are connected among themselves and to the base stations by a standard network (Figure 1). Typically location servers correspond to Mobile Switching Offices and there are about 60-100 base stations under a location server. Location servers are responsible for keeping track of addresses of users who are currently residing in the area below the location server. These addresses may be stored as exact locations, i.e., the identifier of the cell the user is currently in, or as approximate locations: a zone, or a partition of cells.

    Each user (sometimes called a mobile terminal) will be permanently registered under one of the location servers; additionally the user may also register as a visitor under some other location server.
    Base stations will always be aware of mobile terminals which are active within their cells. Location servers, however, do not have to know in which cell a given mobile terminal is currently located: they can always find out by means of paging, a multicast message sent to a subset of base stations under the given location server. Since the location server needs to contact the base stations over the fixed network, the cost of broadcasting is equal to the number of base stations to which paging requests are sent².

    By the user's location we will understand an identifier of the cell in which the user is currently residing. The database storing users' locations will typically be distributed among many location servers. Importantly, the database will almost never store the actual exact location of a user (i.e., the cell id) but either some outdated previous position or a set of possible locations. Therefore, it will almost always store incompletely specified location data to avoid massive and frequent location updates.

    Querying locations will in general lead to combinations of table look-ups and selective broadcasts. For instance, a simple query "Where is John?" asking for the cell number where John currently resides may, depending on the addressing protocol, involve accessing John's home location server and then paging him within the base stations belonging to the area covered by this location server. If he is not found there then we will have to find the location server under which John currently resides. Several protocols can be used, e.g., broadcasting from the root of the hierarchy of location servers, following the chain of forwarding addresses (if the user leaves them), etc.

    ² The cost of broadcast over the wireless medium does not depend on the number of recipients. However, broadcasting from the location server to the base stations is performed over the fixed network (tree of location servers and base stations). This involves point-to-point communication from the location server to each of the base stations. Hence, the cost of broadcasting in this case is proportional to the number of base stations.
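The look-up-then-page procedure for "Where is John?" can be sketched as follows. This is only an illustration under stated assumptions: the `LocationServer` class, the `locate` function, the server and cell names, and the fallback of trying the remaining servers one by one are all hypothetical, not the protocol of the paper. The one property taken from the text is that paging costs one message per base station contacted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LocationServer:
    """Hypothetical location server: a name plus the cells (base stations) below it."""
    name: str
    cells: List[str]
    # Which cell each active user is in; known to the base stations only.
    active: Dict[str, str] = field(default_factory=dict)

    def page(self, user: str) -> Tuple[Optional[str], int]:
        """Page all base stations under this server.

        Paging travels point to point over the fixed network, so the cost is
        one message per base station contacted.
        """
        return self.active.get(user), len(self.cells)


def locate(user: str, home: LocationServer,
           servers: List[LocationServer]) -> Tuple[Optional[str], int]:
    """Try the home server first, then every other server in turn (an assumption)."""
    cost = 0
    for server in [home] + [s for s in servers if s is not home]:
        cell, messages = server.page(user)
        cost += messages
        if cell is not None:
            return cell, cost
    return None, cost


home = LocationServer("home", ["c1", "c2", "c3"])
away = LocationServer("away", ["c4", "c5"], {"John": "c5"})
print(locate("John", home, [home, away]))  # ('c5', 5)
```

John is not under his home server, so the three home cells are paged in vain before the two cells of the second server are paged, for a total of five messages.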

    [Figure 1: Architecture of Cellular Network. Legible labels: Public Access Network, Location Server.]

    Static and Mobile Databases

    Each location server will keep a home database of all user-related data for users with permanent residence within the location area administered by the server. Additionally, there will be a visitor database³ which stores information about users who are not living in the area but are currently visiting it. Finally, there will be databases related to specialized services such as 1-800-Doctor which would keep information about doctors who are currently residing in some predefined area, possibly different from the area covered by the location server. Such specialized databases (views) will be dynamically created, as is the case now for standard telephone services.

    Databases can also be mobile, especially with the advent of new generations of cellular systems. People will be carrying pocket terminals and running computer programs such as editors, data acquisition packages, etc. while relocating. In such a situation, location servers, specialized servers, etc., will become mobile sources of information, which others in the network may constantly want to access. For instance, the user may be collecting data while moving, and he may not necessarily report all of this new information back to the static server.

    In the case when all components of the network become mobile, users and databases have interchangeable roles: each of them can query the other.

    2.1 Queries and Updates

    The set of users together with their characteristics (including location) can be viewed as a database distributed between different location servers. We will assume a simple object-oriented view: our database will have the form of a class of users characterized by a number of methods and instance variables (attributes). One of these methods will determine the location of the user. This class will be stored as a relation which is horizontally partitioned between different location servers according to the home areas of users.

    Further in the paper we will talk extensively about querying and updating such a database. By updates we will really mean updates of the current location of the user. These massive updates will not be transactional in nature and will not require locking. Queries will be standard relational queries which could involve the location attribute both in the selection and in the target clause.

    In the next two sections we address two fundamental questions: how to update and how to query locations in a mobile environment.

    ³ These terms have been introduced earlier in the context of cellular networks [16].
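As a rough illustration of a Users relation that is horizontally partitioned by home area, the sketch below keeps one fragment per location server and evaluates a selection over all fragments. The server names, attribute names and rows are made up for the example, and the stored `loc` value should be read as possibly outdated.

```python
# Hypothetical horizontal fragments of the Users relation, one per home location server.
users_by_server = {
    "server_A": [{"name": "John", "profession": "salesman", "loc": "cell1"}],
    "server_B": [{"name": "Mary", "profession": "doctor", "loc": "cell7"},
                 {"name": "Ann", "profession": "doctor", "loc": "cell9"}],
}


def select(partitions, where, target):
    """Apply a selection to every fragment and project the target attributes."""
    return [tuple(row[attr] for attr in target)
            for rows in partitions.values()
            for row in rows if where(row)]


# Location appears in the target clause; the selection is on a static attribute.
doctors = select(users_by_server, lambda r: r["profession"] == "doctor", ("name", "loc"))
print(doctors)  # [('Mary', 'cell7'), ('Ann', 'cell9')]
```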

    3 Do not tell me - if I do not want to know: Updates

    The main idea in this section is that users do not have to inform the location server about each and every change of their location. In other words, we should leave a certain degree of ignorance about the state of the system (i.e., users' locations in this case). We postulate that ignorance should be bounded, i.e., we should not be too far off from the truth. In fact, the bounded ignorance⁴ approach will guarantee that the real position of the object and the position which is known are always in the same partition. Partitions are defined depending on the user mobility patterns and will help to maintain a certain degree of knowledge about users' whereabouts (i.e., "I do not know where you are exactly at this time but I know that you are in New Brunswick"). Maintaining such knowledge comes with a cost, but as we will demonstrate it will be much cheaper than keeping the complete information about individual locations.

    3.1 Model of agents' mobility within a location server

    Intuitively, partitions decrease the overall cost by gluing together cells between which the user relocates very often and by separating cells between which the user relocates infrequently. For example, assume that we have gathered the data about the user's mobility between base stations and determined the frequencies with which the user relocates.

    [Figure 2: Partitions. Legible labels: P1, P2.]

    In Figure 2, the numbers on the edges indicate the number of times the user relocates between the end locations connected by the edge during a certain interval of time. Edges with a high crossing rate are contained in the same partition and edges with a low crossing rate in separate partitions. In this way the update cost slightly increases (compared to one partition, when the update cost was simply zero) while the query cost significantly decreases, since we only broadcast to base stations within a partition. Obviously there is a tradeoff. Therefore, there is a need for an optimal partitioning which minimizes the expected cost of querying and updating. This task requires knowing more information about users' mobility and patterns of calls.

    Users usually relocate, especially within their home areas, according to stable daily patterns (routines). Most of us commute from home to work (and back) along the same route and then have daily routines which are also periodic. The information about each user's mobility is captured by the notion of a user profile, analogous to credit card and telephone companies storing financial and calling profiles on a per user basis⁵.

    ⁴ This term is borrowed from [15], where it is used in a different framework: allowing a replicated system to violate integrity constraints by a bounded amount.
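The tradeoff between update cost and query cost can be made concrete with a small calculation. The sketch below is an assumption-laden illustration rather than the cost model of the paper: it charges one update message per move that crosses a partition boundary and one paging message per base station in the user's partition for every call. The function name `partition_costs` and all counts are hypothetical.

```python
def partition_costs(crossings, partitions, calls):
    """Return (update_cost, query_cost) for one user over an observation interval.

    crossings:  {(cell_a, cell_b): number of moves between the two cells}
    partitions: list of disjoint sets of cells covering every cell
    calls:      {cell: number of calls received while the user was in that cell}
    """
    part_of = {cell: i for i, part in enumerate(partitions) for cell in part}
    # One update message per move that crosses a partition boundary.
    update = sum(n for (a, b), n in crossings.items() if part_of[a] != part_of[b])
    # Each call is broadcast to every base station of the user's current partition.
    query = sum(n * len(partitions[part_of[cell]]) for cell, n in calls.items())
    return update, query


# Made-up crossing counts: c1-c2 and c3-c4 are busy edges, c2-c3 is rarely crossed.
crossings = {("c1", "c2"): 40, ("c2", "c3"): 2, ("c3", "c4"): 35}
calls = {"c1": 5, "c2": 5, "c3": 5, "c4": 5}

print(partition_costs(crossings, [{"c1", "c2", "c3", "c4"}], calls))        # (0, 80)
print(partition_costs(crossings, [{"c1", "c2"}, {"c3", "c4"}], calls))      # (2, 40)
print(partition_costs(crossings, [{"c1"}, {"c2"}, {"c3"}, {"c4"}], calls))  # (77, 20)
```

With these numbers, cutting only the rarely crossed edge gives a total of 42 messages, against 80 for a single partition and 97 for exact tracking of every cell change.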

    The user's profile will contain:

    1. Probabilities of relocation between any two locations within a location server.
    2. Average number of calls per unit time.
    3. Average number of moves per unit time.

    We assume that the profile is obtained either by monitoring user activity (mobility and frequency of calls) over an extended period of time and/or by getting some of this information directly from the user. In general, of course, probabilities will be approximated by relative frequencies. In the future, we will consider models which allow different patterns of mobility at different time intervals (which is much more realistic, since our daily work routines are different from after-hour routines).

    This model helps us to decide when to notify the location server about a change in our position: it will occur only when we cross the borders of predefined partitions. We assume that partitions will be modified periodically. Shifts in trends of user mobility will be detected and partitions will be modified accordingly at reorganization points. In a way it is analogous to periodic index reorganization.

    We will describe several possible algorithms to locate users, assuming some partition-based scheme. In the appendix we present preliminary simulation results.

    ⁵ Users' profiles will be distributed according to their home location areas. Additionally, we may also store aggregate profiles for specific geographic regions such as a shopping mall, a train station or a football stadium.

    3.2 Algorithms for locating users

    In the following discussion we assume that users reside under their home location servers. Thus, the partitions consist of base stations belonging to the same location server. A more general case of user movement across location servers is presented in [4]. We will consider three different strategies within the partition scheme. In each of these strategies, a user informs the location server when he crosses partitions, thus incurring update cost, and the search for a user is always within a partition. However, these strategies differ in the way a user is located within the partition. The three strategies are:

    1. Broadcast: a broadcast is initiated within the partition to locate a user. The cost of locating a user is bounded by the size of the partition, i.e., the number of base stations within the partition.

    2. List: a list of locations within each partition is maintained. This list is sorted by likelihood of being at a given location. Each location is tried successively. Maximum cost will be incurred when a user is in the least likely location within a partition.

    3. Pointer: here, each time a user moves to another location within the same partition, a forwarding address or a pointer is maintained at the previous location. When locating, the pointer is followed from the location first visited in the partition to the current location (similar to call forwarding). If the user visits a location which he has visited before, we cut the resulting loop. The cost of locating a user is bounded by the length of this pointer chain, which is at most the number of base stations within the partition. When a user moves to a different partition, a new pointer chain is maintained in that partition. We assume that leaving forwarding addresses (pointers) at base stations has the same cost as informing a base station of a move. However, we have not included the cost of collapsing pointer loops nor the cost of removing pointers in the previous partition.

    In all these strategies, the total cost (update cost + locating cost) depends upon a) the partitions and their sizes, b) user profiles, i.e., the base stations visited, and c) the strategy used.

    Simulation results comparing these strategies with respect to the total communication cost (not response time) will be presented in the Appendix.
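The locating cost of the three strategies within one partition can be sketched as below. This is a minimal illustration, assuming that cost is counted as the number of base stations contacted; the function names and the sample cells are hypothetical, and the pointer variant simply truncates the chain when a cell is revisited, which is one possible reading of "cutting the loop".

```python
def broadcast_cost(partition):
    """Broadcast: every base station in the partition is paged."""
    return len(partition)


def list_cost(ranked_cells, current):
    """List: cells are tried in decreasing likelihood; cost is the 1-based hit position."""
    return ranked_cells.index(current) + 1


def pointer_chain(visits):
    """Pointer: forwarding chain from the first visited cell to the current one.

    Revisiting a cell cuts the loop by truncating the chain at that cell.
    """
    chain = []
    for cell in visits:
        if cell in chain:
            chain = chain[:chain.index(cell) + 1]
        else:
            chain.append(cell)
    return chain


def pointer_cost(visits):
    """Number of base stations contacted while following the chain."""
    return len(pointer_chain(visits))


partition = {"c1", "c2", "c3", "c4"}
print(broadcast_cost(partition))                      # 4
print(list_cost(["c2", "c1", "c4", "c3"], "c4"))      # 3
print(pointer_chain(["c1", "c2", "c3", "c2", "c4"]))  # ['c1', 'c2', 'c4']
print(pointer_cost(["c1", "c2", "c3", "c2", "c4"]))   # 3
```

In this sketch the pointer chain can never be longer than the number of distinct cells in the partition, which matches the bound stated for the Pointer strategy.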

    4 I do not know but can find out - Queries

    The discussion in the previous section indicates that the database will not usually know the precise location of each user. If the precise location is needed by a query, some extra work is needed. This leads to a new paradigm of query processing which involves acquisition of new information at the run time of the query. Before discussing queries and their evaluation we will show how to model the notion of error when dealing with fast changing values.

    4.1 Modeling fast changing values - living with errors

    Location is an example of a fast changing attribute. Other examples include temperature, speed, pressure, etc. in some sensory environment. It is costly to maintain rapidly changing quantities. Therefore we have to be prepared to accept erroneous (outdated) values. In such cases we would like to know, however, what the margin of error is, how to find a better, more precise value and finally, what the cost of such a precise value finding procedure is.

    Below we will make a fundamental distinction between the actual and stored value of a dynamic attribute, since most often they will not coincide. The notion of partition will help us to define the margin of error. Various user locating methods (such as paging, pointer and probabilistic addressing schemes) will provide an exact value at additional cost. We will describe all these notions with the help of an example.

    Example 1

    A user named John will be specified as follows:

    Object: John
    Partitions: {Cell3, Cell4, Cell5}, {Cell1, Cell2}
    Correctors: Paging with pointer
    Predictors: {John is at home, in Cell5 after 6 P.M.}

    Additionally (aside from standard attributes such as age, profession, etc.), the following methods will be defined on John:

    LOC: John.LOC will return the location of John according to the database (which may be outdated and incorrect). For instance, in our case it could be that John.LOC = Cell1.

    ERR: John.LOC.ERR will return the partition (a set of locations) which is specified by John's profile and contains John.LOC, the stored location of John. If John.LOC = Cell1 then John.LOC.ERR = {Cell1, Cell2}.

    loc: John.loc returns the actual location of John. This method may involve not only database access but also considerable additional communication in the form of paging and multicasting (so-called corrector methods).

    Correctors will stand for methods which can be used to obtain the exact value of the location, if it is required (that is, x.loc). In general, one could have the same correctors associated with all users (this corresponds to one universal protocol of user location) or different correctors associated with classes of users (due to, e.g., mobility patterns). For instance, the pointer method may be good for one profile while probabilistic probing may be superior for another.

    Finally, the last element of John's profile illustrates the notion of the predictor. A location predictor (viewed as a set of rules or a standard procedural method) will compute a default location of the user in case the user himself did not notify the location server about exceptional behavior. In John's case we stated that after 6 P.M. he is (usually) at home. The location server, at 6 P.M., will first try the user's home to locate him, unless there is a message from the user stating an exception. In such a case, we may have to resort to one of the paging methods.
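The distinction between the stored value (LOC), its margin of error (ERR) and the actual value (loc) can be sketched as a small class. Everything here is illustrative: the `MobileUser` class, the cell-by-cell corrector, the use of the predictor only to pick the first cell to try, and the cell identifiers are assumptions made for the sketch, not the paper's implementation.

```python
class MobileUser:
    """Hypothetical model of a user with a stored and an actual location."""

    def __init__(self, name, partitions, stored, actual, predictor=None):
        self.name = name
        self.partitions = partitions  # list of frozensets of cell ids
        self.LOC = stored             # stored, possibly outdated location
        self._actual = actual         # ground truth, unknown to the database
        self.predictor = predictor    # optional function: hour -> default cell or None
        self.pages_sent = 0           # paging messages spent by the corrector

    @property
    def ERR(self):
        """Margin of error: the partition that contains the stored location."""
        return next(p for p in self.partitions if self.LOC in p)

    def loc(self, hour=None):
        """Corrector: page the cells of ERR one at a time until the user is found."""
        cells = sorted(self.ERR)
        guess = self.predictor(hour) if self.predictor and hour is not None else None
        if guess in cells:            # try the predicted cell first
            cells.remove(guess)
            cells.insert(0, guess)
        for cell in cells:
            self.pages_sent += 1
            if cell == self._actual:
                self.LOC = cell       # the stored value is now exact
                return cell
        raise LookupError("actual location lies outside the stored partition")


partitions = [frozenset({"Cell3", "Cell4", "Cell5"}), frozenset({"Cell1", "Cell2"})]
john = MobileUser("John", partitions, stored="Cell1", actual="Cell2")
print(sorted(john.ERR))               # ['Cell1', 'Cell2']
print(john.loc(), john.pages_sent)    # Cell2 2
```

A query that only needs to know John's partition can be answered from `ERR` at no communication cost; only a call to `loc` spends paging messages.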

    It turns out that quite often queries can be answered even in the presence of errors, therefore incurring no cost from applying correctors. Some other queries may require some of the erroneous data to be corrected. We will discuss this further in the next section.

    4.2 Query processing

    As we said in the introduction, our queries can range from simple ones such as "Where is X?" to more complex ones like "Find me a doctor near the campus" or "Find X, Y and Z such that all of them are on the same highway and Y is between X and Z."

    These queries differ in the complexity of location constraints, i.e., constraints which involve individual locations of users. For example, the query "Find me a doctor near the campus" has one non-location-based constraint (doctor) and another which is a unary constraint on location. The query "Find X, Y and Z such that all of them are on the same highway and Y is between X and Z" involves a more complex constraint, a ternary one (between), plus three unary constraints involving individual locations (on the highway). Generally, one can get even more complex queries which should recognize patterns of moving objects.

    The main new problem arising in query processing in the presence of imprecise knowledge about locations of users can be summarized as follows:

    How to minimize the communication cost to find out the missing information necessary to answer the query?

    Without loss of generality we will assume a domain calculus query language. We will concentrate on describing the evaluation strategy for queries written in a pseudo-SQL form:

    SELECT $x_1, \dots, x_m$
    FROM Users
    WHERE $x_1.loc = l_1 \wedge \dots \wedge x_n.loc = l_n \wedge C(l_1, \dots, l_n) \wedge W(x_1, \dots, x_n)$

    where $C(l_1, \dots, l_n)$ is an n-ary constraint imposed on locations $l_1, \dots, l_n$ and $W(x_1, \dots, x_n)$ is a constraint on individual objects. Users is the name of the class which stores all instances of users in the system.

    The query wants to evaluate a predicate over actual locations of users, while the only locations available are $\{x_1.LOC, \dots, x_n.LOC\}$ with the associated error determined by ERR.

    A naive strategy to process the query presented above is as follows:

    Find objects $a_1, \dots, a_n$ such that $W(a_1, \dots, a_n)$ is satisfied. Determine exact locations $a_1.loc, \dots, a_n.loc$ of each of $a_1, \dots, a_n$ by paging for each i (1

    Figure 3: Near By Locations

    a doctor are near by. The other two strategies both start by first determining whether paging is required at all: maybe the predicate Near(x, y) is true or false uniformly for all combinations of locations in John.LOC.ERR and d.LOC.ERR respectively. If this is not the case then we have a choice of first paging John.LOC.ERR or paging d.LOC.ERR. These two strategies may lead to different costs.

    Notice that paging u = d.LOC.ERR first is clearly advantageous. No matter what the result of paging is, paging of v = John.LOC.ERR is not necessary. Indeed, if the doctor is in $l_7$ then no matter where John is within v, the Near predicate is true. If the doctor is in $l_3$ or in $l_4$ then no matter where John is, the constraint is false. Therefore, the overall expected cost is 3 in this case. This is represented as the first tree in Figure 4, where the nodes labeled by POS and NEG respectively correspond to situations which satisfy or falsify the query constraint. The first of the trees, which corresponds to paging first the location of a doctor, has only one level since no further paging is necessary.

    If we page v first then (without selective paging) we still have in each case to page u. This is why the second tree has two levels. Therefore the expected cost is 6 in this case, shown as the second tree in Figure 4 (if we could page selectively it would be 4, since we could just page $l_7$ and on the basis of the doctor's presence or absence there conclude the truth value of the constraint). If John.LOC.ERR (the partition for v) also included $l_8$, then we could no longer ignore v after paging u.

    [Figure 4: Classification Tree. Two trees whose leaves are labeled POS and NEG.]

    In general, we are interested in an (expected) minimal communication cost strategy to evaluate the query. By a strategy we mean the sequence in which we will page partitions $x_k.LOC.ERR$ (if we page them at all). Such a strategy, in general, can be represented by a tree with nodes corresponding to variables $x_i$, $i = 1, \dots, m$, and two special terminal nodes called POS (positive) and NEG (negative). POS and NEG will indicate that we can already determine that the constraint is true or false respectively, as indicated in our example. The edges are labeled by conditions $(x_i.loc = c_i)$ where $c_i$ belongs to $x_i.LOC.ERR$. In the example, we simply labeled the edges by $c_i$, skipping $x_i$ as clear from the context of the picture.

    Each node N of the tree is identified with path(N), which is a conjunction of all conditions along the path from the root to the node. A path $(x_{i_k}.loc = c_{i_k})$ for $k = 1, \dots, m$ terminates with a POS node if the conjunction of all $(x_{i_k}.loc = c_{i_k})$ along the path implies the constraint $C(x_1.loc, \dots, x_n.loc)$. The same path terminates with NEG if the conjunction of all conditions along the path implies that the constraint $C(x_1.loc, \dots, x_n.loc)$ is false. For instance, in the first tree of our example, all edges labeled by $(d.loc = l_3)$, $(d.loc = l_4)$ and $(d.loc = l_7)$ lead to terminal nodes: the first two to NEG, the last to POS. Indeed, $d.loc = l_3$ logically implies that the constraint Near(d.loc, John.loc) is false no matter what John.loc is (within John.LOC.ERR).

    The cost associated with a node is equal to the number of outgoing edges which do not terminate in a NEG node (intuitively this is the cost of paging, and we do not have to page the locations which we can determine lead to a failure).

    The cost of a path is equal to the sum of the costs of the nodes along the path. Each path is equally likely (hence we assume that any combination of locations from $x_1.LOC.ERR, \dots, x_n.LOC.ERR$ is equally likely).

    The expected cost of the tree is the sum of the costs of all paths weighted by the associated probabilities. The expected cost corresponds to the expected number of paging messages which will have to be sent in order to determine if $x_1.loc, \dots, x_n.loc$ satisfy the query.

    How hard is it to find a tree corresponding to the optimal strategy?

    In [3] we demonstrate that the problem is NP-complete, by showing a strong (and somewhat surprising) connection between this problem and the classification problem in machine learning. A greedy heuristic based on an ID3-like [21] algorithm to construct a tree is demonstrated there.
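The effect of paging order on expected cost in the doctor example can be reproduced with a short calculation. The sketch assumes non-selective paging (paging a partition costs one message per cell), equally likely locations, and a Near predicate that holds exactly when the doctor is in $l_7$. The function names and the cell names used for John's partition are made up for the illustration; this is not the ID3-like heuristic of [3], only an evaluation of two fixed orderings.

```python
from itertools import product


def decided(pred, us, vs):
    """Return the truth value of pred if it is the same for every (u, v), else None."""
    values = {pred(u, v) for u, v in product(us, vs)}
    return values.pop() if len(values) == 1 else None


def expected_cost(pred, us, vs, page_u_first=True):
    """Expected number of paging messages for one ordering, without selective paging."""
    if decided(pred, us, vs) is not None:
        return 0.0                       # constraint already determined: no paging
    first, second = (us, vs) if page_u_first else (vs, us)
    total = 0
    for cell in first:                   # every outcome of the first page is equally likely
        cost = len(first)                # one message per cell of the paged partition
        if page_u_first:
            outcome = decided(pred, [cell], vs)
        else:
            outcome = decided(pred, us, [cell])
        if outcome is None:              # still undetermined: page the other partition too
            cost += len(second)
        total += cost
    return total / len(first)


def near(d, j):
    """Near(d, j) in the example: true exactly when the doctor is in l7."""
    return d == "l7"


doctor_err = ["l3", "l4", "l7"]          # u = d.LOC.ERR, as in the example
john_err = ["a", "b", "c"]               # v = John.LOC.ERR; cell names are hypothetical

print(expected_cost(near, doctor_err, john_err, page_u_first=True))   # 3.0
print(expected_cost(near, doctor_err, john_err, page_u_first=False))  # 6.0
```

Paging the doctor's partition first settles the constraint in every case, while paging John's partition first never does, which gives the costs 3 and 6 quoted in the text.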

    5 New Challenges

    Having shown our approaches to the important problems of update and query in mobile environments, we now proceed to a summary of other challenges and open problems which the mobile environment of personal communication systems brings into the database field.

    • Incorporating Data Acquisition into Query Processing

      We have provided here a model for incorporating data acquisition into the query answering process.

      Interesting open problems involve the impact of different types of constraints on the selection process for the best classification tree. It should be much easier to find such a tree if the constraint is a conjunction of unary (binary) predicates. Also, there is a need for a more refined cost model which would also take into account the total time necessary to broadcast all paging messages.

      Further, we have ignored the possibility that queried objects may reside in the same area and we may be able to locate a number of them within a single broadcast (we have assumed that finding out about each object requires a separate broadcast).

    • New Types of Queries

      A highly mobile and distributed environment creates new scenarios for query processing:

      - Queries depending on the location of the person asking the query.
        "Where is the nearest restaurant?" "Give me the best directions to the hospital including the current traffic reports." Such queries will involve geographic and real-time data. This will lead us to the question of how to create yellow pages dynamically.

      - Queries depending on the direction of movement.
        Knowing our position on a map will not be sufficient; we may need to know the direction of movement to answer a query about the best way to reach the destination.

      - Aggregate Queries
        Such queries could measure global traffic in the area to help in dynamic allocation of resources such as frequency allocation in a particular cell. For example, the area surrounding a football stadium after a game should be assigned more frequencies since the expected activity may be higher there. Aggregate queries will help in establishing such critical zones. Such queries have to take into consideration the road maps in the areas considered, especially when detection of high traffic patterns is of interest. This requires careful integration with geographic data, street and building maps, etc. One may want to detect areas with no policemen or with too many taxi cabs (for a taxi dispatcher). This may require complicated matching of patterns of locations of mobile agents in the presence of transient data.

    • Querying Transient Data
      Information changes so fast in our application that it may change during query evaluation. This creates an interesting research problem on the very basic, semantic level. What is the meaning of the answer to a query? How should it be computed and how should it be augmented? How can prediction be incorporated into query processing?
      Frequency is another important resource: for example, we cannot issue an arbitrary number of paging actions. Usually there is a limited number of paging channels, and if all of them are busy we have to wait. This blocking effect has to be treated as part of the general problem of dynamic resource allocation in the cellular network; some of these problems are currently being addressed in [13].
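    As a rough illustration of two of the query types listed above, the following Python sketch answers a location-dependent query (nearest place to the asker) and an aggregate query (cells whose current population marks them as critical zones). The planar coordinates, the user-to-cell mapping, the threshold, and the function names are assumptions made for the example; they are not part of the model described in this paper, and the sketch ignores the transient nature of the data.

```python
from collections import Counter
from math import hypot


def nearest(places, position):
    """Location-dependent query: name of the place closest to `position`.

    `places` is a list of (name, (x, y)) pairs; coordinates are hypothetical.
    """
    def distance(place):
        (x, y) = place[1]
        return hypot(x - position[0], y - position[1])

    return min(places, key=distance)[0]


def critical_zones(user_cells, threshold):
    """Aggregate query: cells holding more than `threshold` users.

    `user_cells` maps each user to the cell where the user was last seen.
    """
    counts = Counter(user_cells.values())
    return sorted(cell for cell, n in counts.items() if n > threshold)


if __name__ == "__main__":
    places = [("Diner", (0.0, 5.0)), ("Cafe", (2.0, 1.0))]
    print(nearest(places, (1.0, 1.0)))

    user_cells = {"u1": "stadium", "u2": "stadium", "u3": "stadium", "u4": "mall"}
    print(critical_zones(user_cells, threshold=2))
```

    A cell reported by critical_zones would be a candidate for additional frequencies; how the counts are kept current under frequent moves is exactly the open problem raised above.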

    6 Related Work
    Cellular technology is rapidly gaining public acceptance. Wireless computing and access to information in a wireless network will soon follow.


    The research work described in this paper is, to our knowledge, one of the few efforts addressing issues of information access in a mobile distributed environment.
    Current efforts in providing support for wireless computing have focused on adapting current cellular technology to hand-held computers or wireless terminals. Networking software designed for internet use is not applicable to the wireless network problem of dynamically changing addresses; seamless handoffs from one network to another require new solutions. Designing network software that is transparent to movement across networks, as well as the design of hardware and user interfaces for hand-held computers, is the focus of research at Columbia [8]. Solutions to maintaining a file system for mobile users (where local disks and excessive amounts of RAM are not feasible), and allowing remote paging from file servers in a mobile environment, are presented in [11, 12].
    Automobile navigation systems also provide limited information about surrounding areas by indicating destinations on a display map. This helps drivers to navigate in unknown areas [5]. Continuous monitoring of moving objects for air-defense coverage was the goal of the SAGE project [18]. Continuous monitoring of moving objects (platforms) by means of active objects [7] has been discussed in [6]. Here, the cost of polling for the evaluation of conditions is compared to that of monitoring by means of active objects or triggers.

    Cellular networking technology for access to future wireless networks (third generation and beyond) is being investigated at WINLAB at Rutgers [13, 14]. Research is focused on network protocols, cellular packet switching, network arrangements, and spectrum reuse. Further, mobility requires that users and service providers be located. The issue of a distributed location service for tracking mobile users was first addressed in [1, 2]. Here, the directory server supports find and move operations. A hierarchical directory structure that minimizes the cost of a sequence of find and move operations is presented. Further, on a move, only nearby directories are updated, so that only approximate information about a user's location is maintained. For a find operation, nearby users will be able to find the exact location, but users far away will have to follow a chain of pointers to find the exact location.
    Thus, most of the current research work has concentrated on cellular networking and architectures for hand-held terminals in mobile networks. To our knowledge, the research work presented here is the first attempt to address the issues of querying information in a wireless and mobile environment (in a much wider context than automobile navigation systems).
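    The find and move operations mentioned above can be illustrated with a deliberately simplified Python sketch. It is not the hierarchical directory of [1, 2]: it keeps a single flat chain of forwarding pointers, so that a move touches only the directory entry at the old location and a find follows pointers from the last registered location. The class and method names are hypothetical.

```python
class PointerDirectory:
    """Flat forwarding-pointer directory (a simplification, see lead-in)."""

    def __init__(self):
        self.home = {}     # user -> location recorded at registration (may be stale)
        self.forward = {}  # (user, location) -> location the user moved to next

    def register(self, user, location):
        self.home[user] = location

    def move(self, user, old, new):
        # Only the entry at the old location is updated on a move.
        self.forward[(user, old)] = new
        # The new location is now current, so it must not point elsewhere.
        self.forward.pop((user, new), None)

    def find(self, user):
        """Return (current location, number of pointers followed)."""
        location, hops = self.home[user], 0
        while (user, location) in self.forward:
            location = self.forward[(user, location)]
            hops += 1
        return location, hops


if __name__ == "__main__":
    d = PointerDirectory()
    d.register("alice", "A")
    d.move("alice", "A", "B")
    d.move("alice", "B", "C")
    print(d.find("alice"))
```

    The number of hops returned by find is the price paid for cheap moves; the hierarchical structure in [1, 2] is designed to bound this cost, which the flat chain above does not attempt.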

    7 Conclusions
    The vision of universal wireless personal communication offers new challenges to the database field. In this paper we have identified a number of such new research problems arising due to mobility. We provided preliminary solutions to update control and query processing problems; we have also described a number of additional research issues.
    This research effort can be viewed as an integral component of a broader vision: universal and worldwide access to information independent of its form, representation, location and mobility of users. Work on heterogeneous database systems addresses the first two issues; our focus is on the other two issues, location and mobility. All these features have to be fully addressed in order to achieve the final goal of total transparency of access regardless of form, representation, location and mobility.

    Acknowledgments
    We are grateful to Irv Traiger for the extensive comments and help in improving the presentation of the paper. We would like to thank David Goodman for letting us know the state of the art in cellular technology and Greg Pollini for explaining the models used in cellular networks. Thanks also go to Phil Bohannon, Asher Mahboob, Miles Murdocca, and Wojtek Szpankowski for various comments on the paper.

    References
    [1] Baruch Awerbuch and David Peleg, "Concurrent online tracking of mobile users," Proc. ACM SIGCOMM Symposium on Communication, Architectures and Protocols, October 1991.
    [2] Baruch Awerbuch and David Peleg, "Sparse Partitions," Proceedings 31st IEEE Symposium on FOCS, October 1990, St. Louis.
    [3] T. Imielinski and B. R. Badrinath, "Querying mobile data," In preparation.
    [4] T. Imielinski and B. R. Badrinath, "Adaptive tracking of mobile objects," In preparation.
    [5] James L. Buxton, Stanley K. Honey, Wadym E. Suchowerskyj, and Alfred Templehof, "The Travelpilot: A Second Generation Automotive Navigation System," IEEE Transactions on Vehicular Technology, Vol. 40, No. 1, February 1991.
    [6] Sharma Chakravarthy and Susan Nesson, "Making an Object-Oriented DBMS Active: Design, Implementation, and Evaluation of a Prototype," International Conference on Extending Database Technology, Venice, Italy, March 1990. (Also in Lecture Notes in Computer Science, Vol. 416, pp. 393-406.)
    [7] Umesh Dayal et al., "The HiPAC Project: Combining active databases and timing constraints," Proceedings of SIGMOD Record, March 1988.
    [8] Daniel Duchamp, Steven K., Gerald Q. Maguire, "Software technology for wireless mobile computing," IEEE Network Magazine, November 1991.
    [9] Garey and Johnson, "A guide to NP complete problems," W. H. Freeman, 1979.
    [10] Hyafil, L., and Rivest, R. L., "Constructing optimal binary decision trees is NP-complete," Inform. Process. Lett. 5, No. 1, 1976, pp. 15-17.
    [11] Bill N. Schilit and Dan Duchamp, "Adaptive remote paging for mobile computers," Columbia University, Technical Report CUCS-004-91, February 1991.
    [12] Carl D. Tait and Dan Duchamp, "Service interface and replica management algorithm for mobile file system clients," Proceedings of the Parallel and Distributed Information Systems Conference, December 1991.
    [13] David J. Goodman, "Trends in Cellular and Cordless Communications," IEEE Communications Magazine, June 1991.
    [14] David J. Goodman, "Cellular Packet Communications," IEEE Transactions on Communications, Vol. 38, No. 8, August 1990.
    [15] N. Krishnakumar and Arthur Bernstein, "Bounded Ignorance in distributed systems," In Proc. ACM PODS, Denver, CO, May 1991, pp. 63-74.
    [16] Kathleen S. Meier-Hellstern, Eduardo Alonso, "The Use of SS7 and GSM to support high density personal communications," Technical Report WINLAB-TR-24, Rutgers University, December 1991.
    [17] Readings in Distributed AI, Edited by Alan Bond and Les Gasser, Morgan Kaufmann, 1988.
    [18] John F. Jacobs, "SAGE Overview," Annals of the History of Computing, Vol. 5, No. 4, October 1983.
    [19] William Lee, "Mobile cellular Telecommunication systems," McGraw-Hill, 1989.
    [20] C. N. Lo, R. S. Wolff and R. C. Bernhardt, "An estimate of network database transaction volume to support universal personal communication services," Submitted to the 1st International Conference on Universal Personal Communications.
    [21] J. R. Quinlan, "Induction of decision trees," Machine Learning, 1986, pp. 81-106.
    [22] John Mylopoulos, Alex Borgida, Mathias Jarke, and Manolis Koubarakis, "Telos: Representing knowledge about information systems," ACM TOIS, Vol. 8, No. 4, Oct. 1990, pp. 325-360.
    [23] M. Stonebraker and L. Rowe, "The Design of POSTGRES," Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1986.
    [24] Gio Wiederhold, "Mediators in the architecture for future information systems," Unpublished Manuscript.
    [25] Ouri Wolfson and Sushil Jajodia, "Distributed Algorithms for Dynamic replicated data," To appear in ACM PODS, June 1992.

    A Simulation studies
    Preliminary simulation studies have been conducted to study the effects of using user profiles, in conjunction with the idea of partitioning, on the cost of tracking users by the various algorithms described in Section 3.2. The objective of the simulation study is to evaluate various design choices, not to provide an in-depth performance evaluation.

    A.1 Simulation model
    In the simulation model we assume that our queries are simple calls to locate a user. Updates are determined by mobility. The ratio of queries to updates, termed the call to mobility ratio, is a critical parameter in our studies.
    Further, the important aspect of the performance model is the use of profiles in evaluating various algorithms for locating a user. Since user profiles vary considerably, it is necessary to consider typical profiles and study how each strategy employed to locate a user performs for these profiles.
    There is a fixed number of base stations among which users move. Moves and calls are generated according to random variables. The times between successive calls and between successive moves are exponentially distributed random variables. The call/mobility ratio is the ratio of the rate (reciprocal of the mean) of calls to the rate of moves. This ratio is an experimental parameter. When a move is generated, the user either stays in the same base station or moves to the next base station, which is determined from the user profile. This profile is represented as a transition probability matrix, which gives the probability of a user moving from base station i to base station j. From the transition probability matrix, it is possible to determine the expected number of times a user moves from base station i to j within a specified time interval. Thus, within a time interval, it is possible to determine the weights of the edges, i.e., the number of crossings per time interval. Examples of user profiles, with transition probabilities shown as labels on the edges, are shown in Figure 5. For example, in user profile I (partitions are P1 = {1}, P2 = {2,3,4,5}, P3 = {6}, P4 = {7,8,9,10}), the user moves randomly within partitions P2 and P4. Profile II (partitions are P1 = {1,2}, P2 = {3,4}, P3 = {5,6}, P4 = {7,8}, P5 = {9,10}) represents the case where a user relocates often between two base stations and then moves to another base station and repeats this behavior.
    When a user crosses a partition boundary, the current location is updated, incurring an update cost. The cost of the user informing the base station or location server is higher than that of a base station or location server contacting the user. This is because when a user contacts a base station, there is contention among other users and the base station also needs to authenticate users. Authentication requires messages to be sent between the user and the base station. Hence, more messages and time are needed for a user to contact the base station than the other way around. For purposes of the simulation study, we have chosen the cost of an update to be four times that of a call (when the current location is known). This is because of the resource contention due to blocking when communicating from the mobile user to the location server (via the base station). The cost of the call simply reflects the communication cost from the location server to the mobile user. If the location of the user were unknown, then the cost would also have to include the cost of search.
    Further, a table is maintained in the location server to detect the partition to which a base station belongs. This table gives the partition number for each base station. When a call is generated, the cost of the call, if the current location is not known, will depend upon the strategy employed to locate the user. Thus, for each event, the cost due to either an update or a locate call is accumulated to obtain the total cost incurred by various strategies.

    Figure 5: User Profiles (Profile I and Profile II)

    A.2 Simulation results
    In this section, we present performance results for strategies under the partition scheme. The performance metric employed in comparing various algorithms is the total cost (update cost + location cost) incurred as a function of the call/mobility ratio. The experiment is repeated for different user profiles. The user profiles chosen are shown in Figure 5.
    Figure 6 shows the cost per event (calls + moves) (average of 5 runs, for 100000 events) as a function of the call to mobility ratio for various user profiles. Each curve represents the performance of an algorithm. The various algorithms are: Broadcast, List, and Pointer.
    Under high call/mobility ratios, the cost of an event for broadcast converges to the number of base stations in the partition; and to the average length of the pointer chain and the average number of base stations in the partition for the pointer based scheme and the list based scheme, respectively. The list based scheme performs better than both the broadcast and the pointer based schemes for Profile I over a wide range of call/mobility ratios. The pointer based scheme performs better than the list based scheme only for a high call/mobility ratio in Profile II. The pointer based scheme incurs a penalty for informing the base stations of the new address for each move. The lower the mobility, the lower the cost of pointers. Thus under high call/mobility ratios, the pointer based scheme incurs a lower cost than under low call/mobility ratios.
    The improvement in performance (decrease in cost) as a result of partitioning is very significant compared to that of broadcasting without partitioning. For example, with the above user profiles and the chosen partitioning, the improvement in performance for moderate to high call/mobility ratios with the pointer and list based schemes is two to three fold over schemes that just employ broadcasting. Further, broadcasting without partitioning (i.e., broadcasting to all base stations under a location server) would perform even worse. The pointer based scheme is suited to users exhibiting random movements within each partition and when the call/mobility ratio is moderate to high (Profile I). Under these circumstances, if a list based scheme is used, then a call will be expensive as one has to try all equally likely locations. However, the list based scheme is suited to profiles exhibiting a more deterministic movement within small partitions and when the call/mobility ratio is low to moderate (Profile II). At low call/mobility ratios (high mobility), the list based scheme does not incur any cost due to updates, and when a user is often found in one or two most likely locations, the cost of the call is also cheaper. Hence the list based scheme outperforms the pointer and the broadcasting schemes under this condition.
    Simulations demonstrate that profile information can drastically reduce the number of location updates.
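    As a rough illustration of the simulation model of A.1, the following Python sketch generates calls and moves with exponentially distributed inter-event times, moves the user according to a transition probability matrix, charges an update (four times the cost of a call, as chosen above) whenever a partition boundary is crossed, and charges a call by broadcasting within the user's partition. It is a minimal sketch under stated assumptions, not the simulator used for the reported results: only the broadcast strategy is modelled, the unit page cost, the 0-based station indices, the seed, and all names are hypothetical, and the list and pointer strategies are omitted.

```python
import random

UPDATE_COST = 4.0  # from the text: an update costs four times a call
PAGE_COST = 1.0    # assumed unit cost of paging one base station


def simulate(transition, partition_of, ratio, events=10000, seed=0):
    """Average cost per event under partition-wide broadcast.

    transition[i][j]: probability of moving from base station i to j
    partition_of[i]:  partition number of base station i
    ratio:            call rate divided by move rate (must be > 0)
    """
    rng = random.Random(seed)
    sizes = {}
    for p in partition_of:
        sizes[p] = sizes.get(p, 0) + 1

    stations = range(len(transition))
    station, total = 0, 0.0
    for _ in range(events):
        # Exponential inter-event times; memorylessness allows redrawing
        # both timers after every event. The move rate is normalised to 1.
        next_call = rng.expovariate(ratio)
        next_move = rng.expovariate(1.0)
        if next_call < next_move:
            # Call: page every base station in the user's partition.
            total += PAGE_COST * sizes[partition_of[station]]
        else:
            new = rng.choices(stations, weights=transition[station])[0]
            if partition_of[new] != partition_of[station]:
                total += UPDATE_COST  # partition crossing triggers an update
            station = new
    return total / events


if __name__ == "__main__":
    # Two base stations in one partition; the user moves at random.
    profile = [[0.5, 0.5], [0.5, 0.5]]
    for r in (0.1, 1.0, 10.0):
        print(r, simulate(profile, [0, 0], ratio=r))
```

    With a single partition no updates are ever charged, so the cost per event is driven entirely by the fraction of events that are calls; splitting the same stations into separate partitions trades cheaper calls for update costs, which is the trade-off the experiments explore.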
    In the future, we plan to perform more extensive simulations and also extend our cost model to include the costs of storing profile information. We also plan to simulate a dynamic updating protocol, where the decision whether to update the location or not is based on the recent history of the call to mobility ratio. This is analogous to dynamic replication schemes [25], where the number of replicated copies of data depends upon the read-write pattern on the data item.
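    One possible shape for such a dynamic updating rule is sketched below in Python. It is only an assumption about how the decision could be made: the sliding window, the threshold, the direction of the comparison (update when calls are frequent relative to moves), and all names are hypothetical and are not specified in this paper.

```python
from collections import deque


class DynamicUpdater:
    """Decide whether to send a location update from recent event history."""

    def __init__(self, window=20, threshold=1.0):
        self.recent = deque(maxlen=window)  # most recent "call"/"move" events
        self.threshold = threshold          # assumed cut-off on the ratio

    def record(self, event):
        if event not in ("call", "move"):
            raise ValueError("event must be 'call' or 'move'")
        self.recent.append(event)

    def ratio(self):
        """Call to mobility ratio over the window."""
        calls = sum(1 for e in self.recent if e == "call")
        moves = len(self.recent) - calls
        if moves == 0:
            return float("inf") if calls else 0.0
        return calls / moves

    def should_update(self):
        # Assumption: updating pays off when calls dominate moves, since
        # each call to a known location is cheap.
        return self.ratio() >= self.threshold


if __name__ == "__main__":
    u = DynamicUpdater(window=4, threshold=1.0)
    for e in ("call", "call", "call", "move"):
        u.record(e)
    print(u.ratio(), u.should_update())
```

    Because the window is bounded, a user who starts moving frequently will quickly fall below the threshold and stop updating, in the same spirit as the replication analogy above.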
