[ieee 2007 ieee international conference on research, innovation and vision for the future - hanoi,...



Page 1: [IEEE 2007 IEEE International Conference on Research, Innovation and Vision for the Future - Hanoi, Vietnam (2007.03.5-2007.03.9)] 2007 IEEE International Conference on Research, Innovation

A Gaussian Mixture Model for Mobile Location Prediction

Nguyen Thanh An, Tu Minh Phuong
Posts and Telecommunications Institute of Technology, Vietnam

ancntt2002@yahoo.com.au, phuongtm@fpt.com.vn

Abstract. Location prediction is essential for efficient location management in mobile networks. In this paper, we propose a novel method for predicting the current location of a mobile user and describe how the method can be used to facilitate the paging process. Based on the observation that most mobile users have mobility patterns that they follow in general, the proposed method discovers common mobility patterns from a collection of user movement logs. To do this, the method models cell-residence times as generated from a mixture of Gaussian distributions and uses the expectation maximization (EM) algorithm to learn the model parameters. Mobility patterns, each characterized by a common trajectory and a cell-residence time model, are then used for making predictions. Simulation studies show that the proposed method has better prediction performance than two other prediction methods.

Keywords: [...] mobile network.

I. INTRODUCTION

In a personal communication service (PCS) network, each mobile terminal is free to move within the whole service area. When a call arrives for a terminal, the system searches for the terminal by sending signals to the cells where the terminal is probably located. This process is known as paging. A naive paging strategy is to search the entire network. Since the number of cells can be very large, every terminal periodically informs the network about its location in order to reduce the cost of paging. This process is known as location update. For the network to always be aware of the current location of a terminal, a location update should be done every time the mobile terminal crosses a cell's boundary. However, for large networks with small cell sizes, such a scheme results in a significant increase of signaling traffic for updating alone, which can reach up to 70% of additional traffic by some estimates [4].

As a result, to minimize paging and updating traffic, a compromise between pure paging and pure updating is usually taken, e.g. by combining a partial update scheme with an efficient paging strategy. The mobile terminal does not inform the network every time it crosses the boundary of a cell. Rather, the terminal updates its location at constant time intervals or when it crosses the boundary of a location area (LA), which usually consists of several cells. For the latter location update scheme, when a call arrives, the exact location of the terminal is queried by paging all the cells belonging to the last updated LA. To further reduce the cost of paging, researchers have attempted to develop more efficient paging techniques [2,3,6,7]. Usually, the information about the last updated location is used to derive the most probable current location. More sophisticated methods rely on movement history or directional bias to make paging decisions.

In previous work on location management, user mobility is often modeled as a stochastic random walk, on the basis of which location management techniques are designed and evaluated. In practice, however, mobile users do not move randomly at all. Instead, almost every user has a movement pattern that he follows in general. For example, a typical user leaves home in the morning, follows a certain route to work and gets back home in the evening. The existence of such movement patterns has motivated studies on using them to predict the exact location and thus reduce the number of paged cells.

Along this direction, Yavas et al. [13] used a sequence mining algorithm to find user movement patterns (UMPs) from a set of training movement trajectories and predicted the next cell the user is moving into by matching the sequence of the last cells visited to the stored UMPs. By using sequence alignment techniques, the method allows inexact matching of movement patterns and thus reduces the effect of fluctuations [...]. The method does not consider how long the user stays within a cell; instead, it requires the terminal to inform the network whenever it is going to make an inter-cell move. Another method that also uses data mining techniques to discover the movement behavior of each user was described by Wu et al. [15]. The authors assumed that the time points of [...] and used this assumption to estimate the optimal paging area for any given time point.

Artificial neural networks (ANNs) have been used for location prediction by several research groups. In [4], for each mobile host, one neural network is trained on data of its past movements and is maintained by the system to make predictions about the probable areas where the host may reside. The major limitation of this method is that maintaining one neural network for each mobile host is infeasible when the number of users is large. To overcome this, Majumdar and Das [9] proposed a combination of a self-organizing feature map (SOFM) and ANNs.

This work was supported by the Ministry of Science and Technology of Vietnam under a grant for fundamental research.

1-4244-0695-1/07/$25.00 ©2007 IEEE.


SOFM takes input data about the movement histories of mobile users and clusters them into groups with similar movement behaviors. The movement pattern of each group is then modeled by a single neural network. This reduces the overall number of neural networks maintained in the system.

In this paper, we propose using Gaussian mixture models for learning user movement profiles and making location predictions. Following the standard distance-based update scheme, the mobile host sends its recent locations along with the update message. Using data on the movement histories of a large number of subscribers, the model is trained off-line and the trained model is stored for future prediction. Whenever a call arrives for a mobile host, the most probable locations are predicted by the trained model from the recently updated locations of the host. The prediction is then used to guide the paging process until the mobile host is found.

The rest of the paper is structured as follows. In section 2, we present the location management settings and describe the input data. The location prediction method is detailed in section 3. We present the experimental design and results in section 4 and conclude the paper in section 5.

II. BACKGROUND

We assume settings that are similar to the ones described in [9]. The location management strategy consists of a predetermined location update scheme. On call arrival, the system predicts the current location of the terminal based on the information received during the last updates, and paging is done accordingly.

Following [9], we choose the distance-based location update scheme, in which the terminal updates whenever it has moved a certain threshold distance from the last update. This scheme can be implemented without GPS as follows [14]. Whenever a mobile terminal updates its location, the network sends it the list L of all cell-ids whose distance to the current cell is d, where d is the distance threshold for update (fig. 1). By doing this, the network forms a circular LA of radius d with the present cell at the center. Whenever the terminal crosses the boundary of a cell, it matches the cell-id against the entries of L and updates if a match occurs. In this work, we assume cells of hexagonal shape, but this update scheme is easily generalized to any cellular architecture.

Figure 1. Location area of distance based update with distance threshold 3

With the update scheme above, the terminal can be anywhere inside a circle of radius d when a call arrives. Based on the observation that most mobile users have their own mobility patterns, it is possible to predict the current location by learning models that summarize the patterns from past events. Following [13], we collect the movement trajectories of a mobile user in the form $Tr = \langle (c_1, t_1), (c_2, t_2), \ldots, (c_k, t_k) \rangle$, where $c_i$ ($i = 1, \ldots, k$) is the ID of the cell which the user enters at time $t_i$. After the movement history of a user has been collected for a predefined time interval in the above format, these trajectories are partitioned into subsequences by the following procedure. If the time interval between two consecutive entries $(c_i, t_i)$, $(c_{i+1}, t_{i+1})$ of a trajectory $Tr$ exceeds a threshold $t^*$, i.e. if $t_{i+1} - t_i > t^*$, then the trajectory is assumed to end at $c_i$, and a new trajectory is started at $c_{i+1}$. Thus the first subsequence $\langle (c_1, t_1), (c_2, t_2), \ldots, (c_i, t_i) \rangle$ is stored and the rest of $Tr$ is processed further. Note that in [13], the time points $t_1, t_2, \ldots, t_k$ were used only to partition the original trajectories and were not stored after this step. In our work, these times are stored together with the cell-ids and are used to describe the cell-residence time patterns of the user.

The trajectories obtained by the above procedure are called user actual trajectories (UATs). For ease of explanation, each UAT $\langle (c_1, t_1), (c_2, t_2), \ldots, (c_n, t_n) \rangle$ of length n is said to have two components, which are named differently: the first component contains only the cell-ids $\langle c_1, c_2, \ldots, c_n \rangle$ and is called the user actual path (UAP); the second component is the sequence of time points $\langle t_1, t_2, \ldots, t_n \rangle$ and is called the user actual time sequence (UATS). The UATs serve as input data, from which the models describing user mobility patterns are learnt.

The learning algorithm consists of two steps. During the first step, the algorithm finds all the paths of a predefined length that are frequently followed by the mobile users. Such paths are called user location patterns (ULPs) and do not contain the time component of the users' mobility patterns. In the second step, for each ULP, the algorithm constructs a model that incorporates the cell-residence time characteristics of the users. Here cell-residence times are modeled as a mixture of normal distributions, the parameters of which are learnt from the input data using the expectation maximization (EM) algorithm. Together with the ULP, the mixture model completely characterizes user mobility patterns.

After the models are learnt, they can be used by the system to predict the current location. For a prediction to be made, the following information is presented as input to the system: 1) the last m cell-ids the terminal visited; 2) the time points at which the terminal entered these cells; and 3) the time elapsed since the last update or call termination. Based on these features, the system calculates the most probable cells in which the terminal can be located, along with their probabilities. The cells are then paged in descending order of their probabilities until the terminal is found.

III. LOCATION PREDICTION WITH GAUSSIAN MIXTURE MODELS

Our algorithm for learning user mobility models first finds frequently followed paths in the form of ULPs and then learns the models of cell-residence time for each ULP from the input data.
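As a concrete illustration of the input preparation described in Section II, the following minimal Python sketch partitions a raw trajectory into UATs using the gap threshold t* and splits each UAT into its UAP and UATS. The function names, the sample trajectory and the threshold value are assumptions made for illustration only; they do not come from the paper.

```python
from typing import List, Tuple

Entry = Tuple[int, float]  # (cell_id, time at which the user entered the cell)


def partition_trajectory(tr: List[Entry], t_star: float) -> List[List[Entry]]:
    """Cut a trajectory wherever two consecutive entry times differ by more than t_star."""
    uats: List[List[Entry]] = []
    current: List[Entry] = []
    for entry in tr:
        # A gap larger than t_star closes the current UAT and starts a new one.
        if current and entry[1] - current[-1][1] > t_star:
            uats.append(current)
            current = []
        current.append(entry)
    if current:
        uats.append(current)
    return uats


def split_uat(uat: List[Entry]) -> Tuple[List[int], List[float]]:
    """Return the two components of a UAT: (UAP, UATS)."""
    uap = [cell for cell, _ in uat]
    uats = [t for _, t in uat]
    return uap, uats


# Hypothetical trajectory (times in minutes) with one long pause before cell 6.
tr = [(4, 0.0), (7, 12.0), (8, 20.0), (6, 500.0), (10, 515.0)]
parts = partition_trajectory(tr, t_star=60.0)
first_uap, first_uats = split_uat(parts[0])
```

With these sample values the trajectory is cut once, between cells 8 and 6, because the 480-minute gap exceeds the 60-minute threshold.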



This section describes each of these steps and the prediction phase.

A. Finding frequent user location paths

As mentioned in the previous section, a ULP is defined as a sequence of neighboring cells and has a predefined length, determined by the parameter m. Each ULP should be contained in a large number of UAPs so that it corresponds to a frequent pattern in user movement behavior. Here we define a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ to be contained in a sequence $B = \langle b_1, b_2, \ldots, b_l \rangle$ if A is a subsequence of B, that is, if there exists an integer i such that $a_j = b_{i+j}$ for all $1 \le j \le n$. The support of a ULP is then defined as the number of UAPs in which the ULP is contained.

The problem of finding ULPs is a special case of the sequential pattern mining problem [1]. However, in conventional sequential pattern mining, the lengths of the patterns to be discovered are not known in advance. In this paper, the requirement that the ULP length be predefined has been introduced to simplify the implementation of the next step of the algorithm. With this requirement, finding all ULPs becomes trivial. Let $\theta$ denote a pre-specified support threshold. For every subsequence s of length m+1 of each UAP, we count the number of times s occurs in the UAP set, which is the support of s by definition. If the counted support exceeds $\theta$, then s is recorded as a new ULP. Here m is a parameter of the method.

The result of this step is a set of ULPs, each of length m+1. Together with each ULP we also store all the user actual time subsequences (UATSs) corresponding to this ULP.

B. Gaussian mixture model learning

Although ULPs provide important information about the mobility behaviors of the users, they cannot be used alone to make accurate predictions without considering other aspects of user movement such as velocity or cell-residence time. To illustrate this, consider the following example. Assume the recent movement history of a user is given by the sequence of cells $\langle c_4, c_7, c_8 \rangle$, which matches the prefix of the ULP $\langle c_4, c_7, c_8, c_6, c_{10} \rangle$. A naive use of the ULP would predict the current location to be $c_6$. In practice, depending on the time t elapsed since the last update (when the user entered $c_8$), the user can still be in $c_8$ if t is small, or he can be in $c_{10}$ if t is large and the user's velocity is high. To make more precise predictions, the time component of user mobility patterns should be modeled carefully, as described next.

Consider a ULP $\langle c_1, c_2, \ldots, c_{m+1} \rangle$ and its corresponding set $T = \{T_1, T_2, \ldots, T_n\}$ of UATSs, where $T_i = \langle t^i_1, t^i_2, \ldots, t^i_{m+1} \rangle$ for all $i = 1, \ldots, n$; n is the support of the ULP. The cell-residence times $r^i_j$ for cell $c_j$ ($j = 1, \ldots, m$) are calculated as $r^i_j = t^i_{j+1} - t^i_j$. Let $r^i$ denote the vector of the m cell-residence times for the i-th UATS.

We assume that there are K types of mobile users sharing the given ULP. The users of each type have similar mobility behaviors and thus have similar cell-residence times in general. For example, users that travel from home to work by car tend to move faster than those using bicycles and therefore have shorter cell-residence times. In practice, there is a small number of user types, depending on how transportation is organized in the city. For each type of users, since they follow the common pattern in general, it is natural to model cell-residence times as a Gaussian distribution (figure 2). Let the Gaussian distribution for user type k at cell $c_j$ have mean $\mu_{kj}$ and standard deviation $\sigma_{kj}$. Then

$$P(r^i \mid k, \mu, \sigma) = \prod_{j=1}^{m} P(r^i_j \mid k, \mu, \sigma) = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_{kj}} \exp\left(-\frac{(r^i_j - \mu_{kj})^2}{2\sigma_{kj}^2}\right) \qquad (1)$$

where $\mu$ and $\sigma$ are the parameter vectors for the Gaussians. The first equality follows from the assumption that the cell-residence times at any two cells are independent of each other.

With the assumptions above, the cell-residence time vectors $r^i$ can be seen as generated independently by the following process. For each i, a user type k from the set of K types is picked randomly with probability $p_k$. Then, $r^i$ is drawn from a Gaussian whose mean and deviation depend on the user type k as mentioned above. Thus we have:

$$P(r \mid \mu, \sigma) = \prod_{i=1}^{n} \sum_{k=1}^{K} p_k\, P(r^i \mid k, \mu, \sigma) = \prod_{i=1}^{n} \sum_{k=1}^{K} p_k \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_{kj}} \exp\left(-\frac{(r^i_j - \mu_{kj})^2}{2\sigma_{kj}^2}\right) \qquad (2)$$

where r is the set of all n vectors $r^i$.
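The residence-time vectors $r^i$ that enter equation (2) come out of the ULP mining step of Section III.A. The sketch below is one minimal way this step could be written in Python; it is not the authors' implementation. It assumes that trajectories are given as (UAP, UATS) pairs, that support is counted per occurrence of a window of m+1 consecutive cells, and that a window is kept only when its support strictly exceeds the threshold; the name `mine_ulps` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Trajectory = Tuple[Sequence[int], Sequence[float]]  # (UAP, UATS)


def mine_ulps(trajectories: List[Trajectory], m: int,
              theta: int) -> Dict[Tuple[int, ...], List[List[float]]]:
    """Return {ULP: list of residence-time vectors r^i} for frequent windows of m+1 cells."""
    windows: Dict[Tuple[int, ...], List[List[float]]] = defaultdict(list)
    for path, times in trajectories:
        # Slide a window of m+1 consecutive cells over the path.
        for s in range(len(path) - m):
            ulp = tuple(path[s:s + m + 1])
            t = times[s:s + m + 1]
            # r_j = t_{j+1} - t_j for the first m cells of the window.
            windows[ulp].append([t[j + 1] - t[j] for j in range(m)])
    # Keep only windows whose support (number of occurrences) exceeds theta.
    return {ulp: rs for ulp, rs in windows.items() if len(rs) > theta}


# Hypothetical data: two users follow 4 -> 7 -> 8, one follows 7 -> 8 -> 6.
trajectories = [
    ([4, 7, 8], [0.0, 10.0, 25.0]),
    ([4, 7, 8], [0.0, 12.0, 20.0]),
    ([7, 8, 6], [0.0, 5.0, 9.0]),
]
ulps = mine_ulps(trajectories, m=2, theta=1)
```

Here only the path (4, 7, 8) survives, with the residence-time vectors [10, 15] and [12, 8]; these are the vectors a mixture model for that ULP would be fitted to.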

Figure 2. A ULP with two user types. Cell-residence time distributions are shown above the cell sequence.
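Equation (2) is a product of many small densities, so a direct evaluation can underflow for realistic n. A minimal sketch of how its logarithm could be computed in Python is given below; working in log space with the log-sum-exp trick is a choice made here, not something stated in the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_gauss(x: float, mu: float, sigma: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_component(r_i: Sequence[float], mu_k: Sequence[float],
                  sigma_k: Sequence[float]) -> float:
    """log P(r^i | k, mu, sigma) from equation (1): sum over the m independent cells."""
    return sum(log_gauss(x, m, s) for x, m, s in zip(r_i, mu_k, sigma_k))


def log_likelihood(r, p, mu, sigma) -> float:
    """log P(r | mu, sigma) from equation (2)."""
    total = 0.0
    for r_i in r:
        terms = [math.log(p[k]) + log_component(r_i, mu[k], sigma[k])
                 for k in range(len(p))]
        top = max(terms)
        # log-sum-exp: log(sum_k exp(terms[k])) without underflow
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Single user type, single cell, one observation exactly at the mean.
ll_single = log_likelihood([[0.0]], p=[1.0], mu=[[0.0]], sigma=[[1.0]])
```

For the single standard Gaussian in the example the result is -0.5 * log(2π), and splitting that component into two identical halves with weights 0.5 leaves the value unchanged, which gives a quick sanity check.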



Learning the model. Equation (2) gives the model from which the data are generated. What we need now is to learn the model parameters from data. To do this, a standard approach is to estimate the model parameters $\mu$ and $\sigma$ that maximize the likelihood of the observed data, that is, the parameters that maximize $P(r \mid \mu, \sigma)$.

If we knew from which user type k each vector $r^i$ was generated, it would be easy to calculate the maximum likelihood parameters by computing the appropriate sufficient statistics. Since we do not have such information, we assume that the user types are given by hidden variables that are not observed, and instead use the expectation maximization (EM) algorithm [5] to learn the parameters.

The EM algorithm is guaranteed to find a local maximum of the likelihood function. Starting from an initial guess of the parameters, EM repeats two steps, the Expectation (E) step and the Maximization (M) step, until convergence. Using a derivation similar to that given in [10], we derived the following EM algorithm for learning the parameters of the model given in equation (2):

EM algorithm:

1. Initialize parameters.

2. E step: For all $i = 1, \ldots, n$ and $k = 1, \ldots, K$, compute the posterior probabilities

$$P_{i,k} = \frac{p_k\, \varphi(r^i \mid k, \mu_k, \sigma_k)}{\sum_{k'=1}^{K} p_{k'}\, \varphi(r^i \mid k', \mu_{k'}, \sigma_{k'})}$$

where

$$\varphi(r^i \mid k, \mu_k, \sigma_k) = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_{kj}} \exp\left(-\frac{(r^i_j - \mu_{kj})^2}{2\sigma_{kj}^2}\right)$$

3. M step: Update the parameters with the $P_{i,k}$ computed in the E step:

$$p_k = \frac{1}{n} \sum_{i=1}^{n} P_{i,k}$$

$$\mu_{kj} = \frac{\sum_{i=1}^{n} P_{i,k}\, r^i_j}{\sum_{i=1}^{n} P_{i,k}}$$

$$\sigma_{kj}^2 = \frac{\sum_{i=1}^{n} P_{i,k}\, (r^i_j - \mu_{kj})^2}{\sum_{i=1}^{n} P_{i,k}}$$

4. Repeat steps 2 and 3 until convergence.

The results of these steps are estimated values of $p_k$, $\mu_{kj}$ and $\sigma_{kj}$ for every user type k of each ULP. We call each user type k learnt above, together with its ULP U, a mobility pattern, denoted by $M_{Uk}$. The set of mobility patterns is stored and used for location prediction as described next.

C. Location prediction

Having estimated the mobility patterns $M_{Uk}$ above, we now describe how to make predictions. Let the recent movement history of a mobile terminal be given by the following parameters: 1) the sequence of the last l cell-ids it visited, $L = \langle a_1, a_2, \ldots, a_l \rangle$, where $1 \le l \le m$ and m+1 is the length of the ULPs (if l > m then only the last m cells are used); 2) the sequence of times at which it entered the l cells, $T = \langle t_1, t_2, \ldots, t_l \rangle$; 3) the current time $t_0$. To make a prediction, the algorithm determines to which mobility pattern L most probably belongs and predicts the current cell based on this pattern. The prediction procedure can be summarized as follows.

First, the algorithm finds the ULPs that contain L in their prefixes. We call these ULPs matching ULPs. Assume a matching ULP has the form $U = \langle c_1, c_2, \ldots, c_{m+1} \rangle$ and L matches its subsequence starting at cell $c_i$. Without loss of generality, we assume that i = 1, that is, $c_1 = a_1, c_2 = a_2, \ldots, c_l = a_l$. For example, for the path $L = \langle 2, 4, 5 \rangle$, the ULP $\langle 2, 4, 5, 7 \rangle$ is one of its matching ULPs.

Second, let U have K user types. For all $k = 1, \ldots, K$ the algorithm calculates the probability that L belongs to type k of U, which is in essence the probability that L has mobility pattern $M_{Uk}$, as follows:

$$P(L \leftarrow M_{Uk}) = p_k \prod_{j=1}^{l-1} \frac{1}{\sqrt{2\pi}\,\sigma_{kj}} \exp\left(-\frac{(r_j - \mu_{kj})^2}{2\sigma_{kj}^2}\right) \qquad (3)$$

where $p_k$, $\mu_{kj}$ and $\sigma_{kj}$ are the model parameters calculated in the previous step, and $r_j$ is the cell-residence time for cell $a_j$, calculated as $r_j = t_{j+1} - t_j$.

Third, for each mobility pattern $M_{Uk}$ above, the algorithm predicts the current location of the terminal by comparing the time elapsed since the last update, $r_0 = t_0 - t_l$, with the mean $\mu_{kl}$. Specifically, if $r_0 < \mu_{kl} + \sigma_{kl}$ then the terminal is predicted to be still located in cell $c_l$ (the last updated cell); otherwise, the predicted cell is $c_{l+1}$. Recall that $\mu_{kl}$ is the mean cell-residence time for cell $c_l$; it is natural to conclude that the terminal has moved to the next cell $c_{l+1}$ if the time that has passed since it entered the cell is much longer than the mean.

The algorithm stores the predicted cells along with the values $P(L \leftarrow M_{Uk})$ for all mobility patterns of the matching ULPs and sorts them in descending order of $P(L \leftarrow M_{Uk})$.

Last, we define a parameter N, which is the maximum number of predictions to make. The first N cells with the highest $P(L \leftarrow M_{Uk})$ are then selected for paging in descending order of probability. In our experiments we used only the first prediction, that is, N = 1.

IV. EXPERIMENTAL RESULTS

In this section, we discuss the experimental assessment of the proposed method (we call it GMM).
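Looking back at the EM procedure of Section III.B, the E and M steps for the residence-time vectors of one ULP can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions that are not taken from the paper: the initial means are picked at evenly spaced positions after sorting the vectors by total residence time, the initial deviations are the global per-cell deviations, a fixed number of iterations replaces a convergence test, and deviations are floored at `min_sigma` to avoid collapse.

```python
import math
import random


def component_density(r_i, mu_k, sigma_k):
    """phi(r^i | k): product of m independent one-dimensional Gaussians."""
    d = 1.0
    for x, m, s in zip(r_i, mu_k, sigma_k):
        d *= math.exp(-(x - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
    return d


def em(r, K, iters=50, min_sigma=1e-3):
    """Estimate (p, mu, sigma) of a K-type mixture from n residence-time vectors."""
    n, m = len(r), len(r[0])
    # Step 1 (assumed initialization): spread initial means over the sorted data.
    order = sorted(r, key=sum)
    mu = [list(order[(2 * k + 1) * n // (2 * K)]) for k in range(K)]
    mean = [sum(x[j] for x in r) / n for j in range(m)]
    sd = [max(min_sigma, math.sqrt(sum((x[j] - mean[j]) ** 2 for x in r) / n))
          for j in range(m)]
    sigma = [list(sd) for _ in range(K)]
    p = [1.0 / K] * K
    for _ in range(iters):
        # E step: posterior P_{i,k} of user type k for each vector r^i.
        post = []
        for r_i in r:
            w = [p[k] * component_density(r_i, mu[k], sigma[k]) for k in range(K)]
            z = sum(w)
            post.append([wk / z for wk in w] if z > 0 else [1.0 / K] * K)
        # M step: re-estimate p_k, mu_kj and sigma_kj from the posteriors.
        for k in range(K):
            nk = sum(post[i][k] for i in range(n))
            if nk == 0:
                continue
            p[k] = nk / n
            for j in range(m):
                mu[k][j] = sum(post[i][k] * r[i][j] for i in range(n)) / nk
                var = sum(post[i][k] * (r[i][j] - mu[k][j]) ** 2 for i in range(n)) / nk
                sigma[k][j] = max(min_sigma, math.sqrt(var))
    return p, mu, sigma


# Synthetic check: two user types ("fast" and "slow") on a ULP with m = 2 cells.
rng = random.Random(1)
fast = [[rng.gauss(10, 1), rng.gauss(20, 1)] for _ in range(50)]
slow = [[rng.gauss(60, 1), rng.gauss(80, 1)] for _ in range(50)]
p, mu, sigma = em(fast + slow, K=2)
```

On this well-separated synthetic data the two estimated means settle close to (10, 20) and (60, 80), with mixing weights near 0.5 each. Real residence times would overlap far more, so a proper convergence test and several restarts would be advisable.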

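The prediction procedure of Section III.C (prefix matching, the score of equation (3), and the stay-or-move rule) can likewise be sketched in a few lines of Python. The layout of `models` as a dictionary from ULP tuples to lists of (p_k, mu_k, sigma_k) triples is an assumption made for this example, as are the sample parameter values. The sketch only handles the case where the recent path matches a ULP from its first cell (i = 1 in the text).

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mobility pattern M_Uk: (p_k, mu_k, sigma_k) for user type k of a ULP.
Pattern = Tuple[float, Sequence[float], Sequence[float]]


def predict(cells: Sequence[int], times: Sequence[float], t0: float,
            models: Dict[Tuple[int, ...], List[Pattern]],
            n_best: int = 1) -> List[Tuple[int, float]]:
    """Return up to n_best (cell, score) pairs; an empty list means "no prediction"."""
    l = len(cells)
    r = [times[j + 1] - times[j] for j in range(l - 1)]  # completed residence times
    r0 = t0 - times[-1]                                   # time spent since the last entry
    candidates = []
    for ulp, patterns in models.items():
        # A matching ULP must start with the recent path and offer a next cell.
        if len(ulp) <= l or tuple(ulp[:l]) != tuple(cells):
            continue
        for p_k, mu, sigma in patterns:
            score = p_k  # equation (3)
            for j in range(l - 1):
                score *= (math.exp(-(r[j] - mu[j]) ** 2 / (2 * sigma[j] ** 2))
                          / (math.sqrt(2 * math.pi) * sigma[j]))
            # Stay in the last cell if r0 < mu_kl + sigma_kl, otherwise move on.
            stay = r0 < mu[l - 1] + sigma[l - 1]
            candidates.append((ulp[l - 1] if stay else ulp[l], score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:n_best]


# Hypothetical ULP <2, 4, 5, 7> with a fast and a slow user type.
models = {(2, 4, 5, 7): [(0.6, [10, 10, 10], [2, 2, 2]),
                         (0.4, [30, 30, 30], [5, 5, 5])]}
soon = predict([2, 4, 5], [0.0, 11.0, 20.0], t0=25.0, models=models)
later = predict([2, 4, 5], [0.0, 11.0, 20.0], t0=40.0, models=models)
```

In both calls the fast user type has the higher score because the observed residence times (11 and 9) lie near its means. Five time units after entering cell 5 the terminal is predicted to still be there; twenty time units after, it is predicted to have moved on to cell 7.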


In practice, the model is to be trained on real movement histories of different mobile users. Since no such dataset was available, we used a statistical mobility model to generate user movement profiles and evaluated the model on simulated data.

A. Simulation design

Our simulation is based on the model presented in [7]. In this model, the network consists of 225 (15x15) hexagonal cells. To generate user actual paths (UAPs), a number of user location patterns (ULPs) were generated first. The length of a ULP is chosen uniformly from {4,5,6,7,8}. Each ULP is taken as a random walk over the network. For each ULP, we defined K user movement types, which differ from each other in their cell-residence times. For user type k, the cell-residence time at cell $c_i$ is characterized by choosing a mean time uniformly from 5 minutes to 100 minutes and a standard deviation from 1 to 6. In our experiments we used the same number of user types, K = 4, for all generated ULPs.

Next, each UAP is generated by randomly picking a ULP. In addition to UAPs that were generated this way and thus follow a ULP, we also formed UAPs which are outliers, i.e. those which do not follow any ULP. Such UAPs are formed as random walks over the mobile network. The ratio of the number of outlier UAPs to the number of all generated UAPs is denoted by o. For each UAP generated from a ULP, we next determined the cell-residence times by a two-step procedure: 1) a user type k is randomly picked from the K predefined types; and 2) the cell-residence times are determined by sampling the normal distribution with the mean and standard deviation for user type k of the current ULP. Once the cell-residence times have been calculated, it is trivial to form the user actual time sequence (UATS) by choosing a starting time.

B. Training and evaluation

In all, 10000 UAPs were generated, from which 9000 UAPs were randomly chosen to form the training set and the remaining 1000 UAPs were left as the test set. For each UAP in the test set, a prefix of length > 2 is chosen to form the input for the predictor. The times elapsed since the last update were chosen so that in 20% of cases the user stays in the last updated cell and in 80% of cases the user has moved to the next cell.

During the first stage of the algorithm, a ULP is recorded only if its support exceeds a support threshold $\theta$. Since the number of user types K is not known in advance, the EM algorithm was run with different values of K and the value that gave the maximum likelihood score was chosen. In general, it can be problematic to choose K by this trial procedure if the domain of K is large. Fortunately, for the location prediction problem, there is usually a small number of user types, which can be guessed by analyzing the transportation situation of the city.

Every time the system is requested to make a prediction, it can return:

- A correct prediction.
- An incorrect prediction.
- No prediction because no matching pattern is found.

Therefore, to assess the proposed method, we used two performance measures: precision and recall.

$$\text{precision} = \frac{\#\text{ of correct predictions}}{\#\text{ of predictions made}}, \qquad \text{recall} = \frac{\#\text{ of correct predictions}}{\#\text{ of requests}}$$

Following [13], we compared our method with two other algorithms. The first method predicts location by using a Transition Matrix (TM) [11]. In this method, a cell-to-cell transition matrix is calculated from the past inter-cell movements of mobile users. The elements of this matrix contain the probabilities with which a user would move to other cells from the current location. The prediction is made by choosing the most probable cell from the matrix as the predicted cell.

The second method is Ignorant Prediction [13]. This method randomly picks one of the neighboring cells as the next cell to move into, regardless of any information about movement history. In fact, this method makes no prediction based on a priori knowledge of user movements and is used as a baseline to observe the performance of the system in the absence of location prediction. Since in our experiments the user can be in the next cell or in the last updated cell, the two methods were modified so that they can choose to predict the last updated cell with probability 1/7.

D. Results

First we fixed the outlier percentage o = 30% and compared the proposed method GMM with TM and Ignorant. For GMM, the following parameters were used for learning: ULP length m = 3, support threshold $\theta$ = 300. Figure 3 shows precision values plotted against recall values for the three methods. Note that because TM and Ignorant never return "No prediction" and only one prediction is made for each request, the precision and recall of each of these methods are the same. The results show that GMM outperforms the two other methods by a large margin.

Figure 3. Recall and precision of the three prediction methods

Next, we studied the impact of the ULP length m used during training and prediction.



as for the previous experiment and changed m. The recall and precision for m = 2 and m = 3 are given in Table 1. The results show a small increase in recall and a significant increase in precision when m increases from 2 to 3.

TABLE 1. RECALL AND PRECISION FOR DIFFERENT ULP LENGTHS

             m = 3    m = 2
Precision    0.79     0.70
Recall       0.61     0.59

Another parameter that has an impact on prediction accuracy is the support threshold θ. In the next experiment, we measured recall and precision while varying θ. Parameter m was fixed at 2.

The precision and recall for different support thresholds are given in Table 2. Both precision and recall increase as θ goes from 100 to 300. After θ reaches 300, no significant changes in precision and recall are observed, except for a small fluctuation of recall at θ = 400.

TABLE 2. RECALL AND PRECISION FOR DIFFERENT SUPPORT THRESHOLDS

             Precision    Recall
θ = 100      0.67         0.52
θ = 200      0.73         0.57
θ = 300      0.79         0.61
θ = 400      0.79         0.65
θ = 500      0.79         0.61
θ = 600      0.79         0.60

V. CONCLUSION

In this paper, a new method for mobility prediction has been proposed. The method uses Gaussian mixture models to characterize regularities in cell-residence times of mobile users. In combination with frequently followed trajectories, these models form mobility patterns, which are used to predict the location of a mobile user at a given time point. The main feature of this method is that it explicitly models the time component of user movements and uses the models to find clusters of similar user profiles. We believe this is an advantage of our method over the methods that ignore time characteristics of user movements.

On simulated data, the proposed method shows promising results and outperforms two other prediction methods. Important factors that influence the prediction accuracy are the length of the mobility pattern and the support threshold. With small support, the training data may be too sparse for the EM algorithm, which leads to overfitting, a problem that is well known in the machine learning community. Also, the success of the method depends heavily on the choice of training data, and thus the method is yet to be studied extensively on real data.

REFERENCES

[1] R. Agrawal, R. Srikant, "Mining sequential patterns", Proceedings of the IEEE Conference on Data Engineering (ICDE'95), 1995, pp. 3-14.
[2] I.F. Akyildiz and J.S.M. Ho, "Movement-based location update and selective paging for PCS networks", IEEE/ACM Transactions on Networking 4(4) (1996) 629-638.
[3] A. Bhattacharya, S.K. Das, "LeZi Update: an information-theoretic approach to track mobile users in PCS networks", ACM Wireless Networks 8 (2-3) (2002) 121-135.
[4] G. Chakrabarty, "Efficient location management by movement prediction of mobile host", Proc. Int. Workshop on Distributed Computing IWDC 2002, Lecture Notes in CS, Vol. 2571 (Springer 2002) pp. 142-153.
[5] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39(1), pp. 1-38, 1977.
[6] T. Liu, P. Bahl and I. Chlamtac, "Mobility modeling, location tracking, and trajectory prediction in wireless ATM networks", IEEE Transactions on Selected Areas of Commn. 16 (1998) 922-936.
[7] D. Katsaros, A. Nanopoulos, M. Karakaya, G. Yavas, O. Ulusoy, Y. Manolopoulos, "Clustering mobile trajectories for resource allocation in mobile environments", in: Intelligent Data Analysis Conference (IDA'2003), Lecture Notes in Computer Science, vol. 2810, Springer-Verlag, 2003.
[8] G.L. Lyberopoulos, J.G. Markoulidakis, D.V. Polymeros, D.F. Tsirkas and E.D. Sykas, "Intelligent paging strategies for third generation mobile telecommunication systems", IEEE Transactions on Vehicular Technology 44(3) (1995) 543-553.
[9] K. Majumdar and N. Das, "Mobile user tracking using a hybrid neural network", Wireless Networks 11 (2005), pp. 275-284.
[10] T. Mitchell, Machine Learning, McGraw-Hill, 1997.
[11] S. Rajagopal, R.B. Srinivasan, R.B. Narayan, X.B.C. Petit, "GPS-based predictive resource allocation in cellular networks", Proceedings of the IEEE International Conference on Networks (IEEE ICON'02), 2002, pp. 229-234.
[12] C. Rose, "Minimizing the average cost of paging and registration: A timer-based method", Wireless Networks 2(2) (1996) 109-116.
[13] G. Yavas, D. Katsaros, O. Ulusoy and Y. Manolopoulos, "A data mining approach for location prediction in mobile environments", Data & Knowledge Engineering 54 (2005), pp. 121-146.
[14] V.W.S. Wong and V.C.M. Leung, "An adaptive distance-based location update algorithm for next-generation PCS networks", IEEE Transactions on Vehicular Technology 19(10) (2001) 1942-1952.
[15] H.-K. Wu, M.-H. Jin, J.-T. Horng, C.-Y. Ke, "Personal paging area design based on mobile's moving behaviors", Proceedings of the IEEE Conference on Computer and Communications (IEEE INFOCOM'01), 2001, pp. 21-30.
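As a supplementary note on the evaluation used in the experiments, the three possible outcomes of a prediction request (correct, incorrect, no prediction) can be turned into precision and recall as in the minimal sketch below. The definitions used here are an assumption, since the formulas are not restated in this part of the paper: precision is taken as correct predictions over predictions actually made, and recall as correct predictions over all requests. The function name `precision_recall` and the outcome labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical outcome labels for one prediction request.
VALID_OUTCOMES = {"correct", "incorrect", "none"}  # "none" = no matching pattern


def precision_recall(outcomes):
    """Return (precision, recall) for a sequence of outcome labels.

    Assumed definitions (not taken from the paper's text):
      precision = correct / (correct + incorrect)
      recall    = correct / (correct + incorrect + none)
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")

    correct = counts["correct"]
    answered = correct + counts["incorrect"]   # requests with a prediction
    total = answered + counts["none"]          # all requests

    precision = correct / answered if answered else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# A predictor that sometimes abstains: precision exceeds recall.
abstaining = ["correct"] * 6 + ["incorrect"] * 2 + ["none"] * 2
# A predictor that always answers: precision equals recall.
always_answers = ["correct"] * 6 + ["incorrect"] * 4
```

Under these assumed definitions, a method that always returns a prediction has identical precision and recall, while a method that abstains when no pattern matches can trade recall for precision.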
