

Mining Interesting Locations and Travel Sequences from GPS Trajectories

IDB & IDS Lab. Seminar

Summer 2009

강 민 석 [email protected]

July 23rd, 2009

Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma

WWW 2009

Center for E-Business Technology, Seoul National University, Seoul, Korea

Microsoft Research Asia

Intelligent Database Systems Lab.

Copyright 2009 by CEBT

Abstract

Mining Interesting Locations and Travel Sequences from GPS Trajectories

GPS log: a record of users’ outdoor movements captured with GPS

By mining multiple users’ location histories, discover interesting locations and travel sequences in a given region

Problem

How to model multiple users’ location history from GPS log

How to infer the interest level of a location

Location interest depends not only on the number of visits, but also on users’ travel experiences.

How to detect classical sequences in a given region


timestamp        latitude           longitude
07-01 12:30:00   N 33º 30’ 19.5”    E 126º 29’ 35.3”
07-01 12:30:30   N 33º 30’ 19.4”    E 126º 29’ 35.2”
07-01 12:31:00   N 33º 30’ 19.2”    E 126º 29’ 35.3”
07-01 12:31:30   N 33º 30’ 19.1”    E 126º 29’ 35.3”
07-01 12:32:00   N 33º 30’ 19.1”    E 126º 29’ 35.4”

Contents

Introduction

Modeling Location History

Location Interest Inference

Experiments

Related Work

Conclusions


Introduction

GPS log

Recently, many users have started recording their outdoor movements with GPS.

Travel experience sharing, life logging, sports activity

GPS devices are changing the way people interact with the Web by using locations as contexts.


Introduction

GPS log

Let’s look at my GPS trajectories!

(Some photos removed for privacy.)

Introduction

Architecture

The system comprises three parts:

Location history modeling, location interest & sequence mining, recommendation

[Figure: system architecture diagram. Stage labels: Location History Modeling, Location Interest and Sequence Mining, Recommendation. Component labels: GPS Logs, Modeling Location History, Tree-Based Hierarchical Graph, HITS-Based Inference Model, User Travel Experience, Location Interest, Mining Travel Sequences, Experienced Users, Interesting Locations, Travel Sequences, Location Recommender]

Contents

Introduction

Modeling Location History

GPS Trajectory & Stay Point

Location History

Tree-Based Hierarchical Graph (TBHG)

Location Interest Inference

Experiments

Related Work

Conclusions


Modeling Location History

GPS Trajectory

GPS point: a (timestamp, latitude, longitude) record

GPS log: a collection of GPS points

GPS trajectory: the GPS points of a log connected in time order

Stay Point

A geographic region where a user stayed for longer than a certain time interval

Time threshold T: the user stays longer than T (e.g. 20 min)

Distance threshold D: the distance between two points of the stay is less than D (e.g. 200 m)


timestamp        latitude           longitude
07-01 12:30:00   N 33º 30’ 19.5”    E 126º 29’ 35.3”
07-01 12:30:30   N 33º 30’ 19.4”    E 126º 29’ 35.2”
07-01 12:31:00   N 33º 30’ 19.2”    E 126º 29’ 35.3”
07-01 12:31:30   N 33º 30’ 19.1”    E 126º 29’ 35.3”
07-01 12:32:00   N 33º 30’ 19.1”    E 126º 29’ 35.4”
07-01 12:32:30   N 33º 30’ 19.1”    E 126º 29’ 35.4”
07-01 12:33:00   N 33º 30’ 19.2”    E 126º 29’ 35.4”
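The stay point definition above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes points are already converted to decimal degrees, measures distance from the first point of each candidate stay, and reports the mean position of the window as the stay point. The names `haversine_m` and `detect_stay_points` are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def detect_stay_points(points, dist_m=200.0, min_stay=timedelta(minutes=20)):
    """points: list of (timestamp, lat, lon) tuples sorted by time.

    Returns a list of (mean_lat, mean_lon, arrival, leaving) tuples.
    The defaults mirror the example thresholds D = 200 m and T = 20 min.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while each next point stays within dist_m of point i.
        while j < n and haversine_m(points[i][1], points[i][2],
                                    points[j][1], points[j][2]) <= dist_m:
            j += 1
        arrival, leaving = points[i][0], points[j - 1][0]
        if leaving - arrival >= min_stay:
            window = points[i:j]
            lat = sum(p[1] for p in window) / len(window)
            lon = sum(p[2] for p in window) / len(window)
            stays.append((lat, lon, arrival, leaving))
            i = j        # resume after the detected stay
        else:
            i += 1       # no stay here, move the anchor forward
    return stays
```

With a 30-second sampling interval like the table above, a user has to remain within 200 m for at least 41 consecutive samples before a stay point is reported.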


Modeling Location History

Location History

Represented as a sequence of stay points with their corresponding arrival and leaving times

[Figure: a location history drawn through stay points S1 to S10; labeled places: Home, Supermarket, Company, Restaurant]

Modeling Location History

Model multiple users’ location histories

The location histories of different people are inconsistent and incomparable

Stay points of different individuals are not identical

The scale of a location must also be considered

[Figure: stay points S1 to S10 grouped into clusters C1 to C4; other labels: A, B, Home, Supermarket, Company, Restaurant]

Modeling Location History

Tree-Based Hierarchy

Build a tree using a hierarchical clustering algorithm

Density-based clustering algorithm: OPTICS (Ordering Points To Identify the Clustering Structure)

Hierarchically cluster stay points into geospatial regions

Different levels of the tree denote different geospatial granularities
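To make the idea of a cluster tree concrete, here is a small sketch. It does not use OPTICS: as a simpler stand-in it applies single-linkage clustering with a shrinking distance threshold per level, so each level refines the one above it. The planar (x, y) coordinates in metres, the thresholds, and the names `threshold_clusters` and `build_hierarchy` are all assumptions made for illustration.

```python
import math


def threshold_clusters(points, indices, eps):
    """Single-linkage clustering: points within eps of each other share a cluster."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    idx = list(indices)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def build_hierarchy(points, eps_per_level):
    """Return a nested tree whose nodes are {'members': [...], 'children': [...]}.

    eps_per_level lists one distance threshold per level, largest first,
    so deeper levels describe finer geospatial granularity.
    """
    def split(members, level):
        node = {"members": members, "children": []}
        if level < len(eps_per_level):
            for sub in threshold_clusters(points, members, eps_per_level[level]):
                node["children"].append(split(sub, level + 1))
        return node

    return split(list(range(len(points))), 0)
```

For example, `build_hierarchy(stay_points_xy, [500, 50])` would yield a root, a coarse level of regions about 500 m apart, and a finer level inside each region.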


Modeling Location History

Tree-Based Hierarchical Graph (TBHG)

1. Formulate a tree-based hierarchy

Hierarchically cluster the stay points

2. Build a graph on each level

A link between two clusters is generated when consecutive stay points of a location history fall in those two clusters
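Step 2 can be sketched as follows for a single level of the hierarchy. This is an illustration under assumptions: each user's history is a list of stay point ids in visiting order, `cluster_of` maps every stay point id to its cluster on that level, and edges are counted per transition. The function name `build_level_graph` is hypothetical.

```python
from collections import Counter


def build_level_graph(histories, cluster_of):
    """Build the directed cluster graph for one level of the hierarchy.

    histories:  {user: [stay_point_id, ...]} in visiting order
    cluster_of: {stay_point_id: cluster_id} for this level
    Returns a Counter mapping (from_cluster, to_cluster) to the number of transitions.
    """
    edges = Counter()
    for stays in histories.values():
        clusters = [cluster_of[s] for s in stays]
        for a, b in zip(clusters, clusters[1:]):
            if a != b:  # consecutive stay points fall in two different clusters
                edges[(a, b)] += 1
    return edges
```

Running the same function with the `cluster_of` mapping of each level gives one graph per level, which together form the hierarchical graph.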


Modeling Location History

Tree-Based Hierarchical Graph (TBHG)

On each geospatial scale, a location history can be represented by a sequence of stay-point clusters with the transition time between consecutive clusters

[Figure: the location history of stay points S1 to S10 mapped onto clusters C1 to C4 at different levels; other labels: A, B, Home, Supermarket, Company, Restaurant]

Contents

Introduction

Modeling Location History

Location Interest Inference

HITS-Based Inference Model

Mining Classical Travel Sequences

Experiments

Related Work

Conclusions


Location Interest Inference

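HITS (Hypertext Induced Topic Search)

A search-query-dependent ranking algorithm for Web information retrieval

Produces two rankings

Hub: a web page with many out-links

Authority: a web page with many in-links

Hubs and authorities have a mutual reinforcement relationship: a good hub points to many good authorities, and a good authority is pointed to by many good hubs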


Location Interest Inference

HITS-Based Inference Model

Regard a user’s visit to a location as an implicit directed link from the user to that location

Hub and Authority

Hub: a user who has accessed many places → the user’s travel experience

Authority: a location that has been visited by many users → location interest

Mutual reinforcement relationship

Users’ travel experiences (hub scores) and the interest of locations (authority scores) reinforce each other


Location Interest Inference

Data Selection Strategy

Motivation

A user’s travel experience is region-related.

A geospatial region must be specified before conducting HITS-based inference

Strategy

Calculate scores using the regions specified by the locations’ ascendant clusters

A user or location can have multiple hub or authority scores, depending on the region scale


Location Interest Inference

Inference

Build an adjacency matrix between users and locations

It captures the mutual reinforcement relationship between user travel experience and location interest

Iterative process for generating the final results

Calculate authority and hub scores using the power iteration method
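The power iteration can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the adjacency matrix entry for a user and a location is that user's visit count, runs a fixed number of iterations, and applies L2 normalisation after each round. The name `hits` and the input layout are hypothetical.

```python
import math


def hits(visits, iterations=50):
    """Run HITS on a user-location visit matrix.

    visits: {user: {location: visit_count}}
    Returns (hub, authority): user travel experience and location interest.
    """
    users = list(visits)
    locations = sorted({loc for v in visits.values() for loc in v})
    hub = {u: 1.0 for u in users}
    auth = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # Authority update: a location is interesting if experienced users visit it.
        auth = {loc: sum(visits[u].get(loc, 0) * hub[u] for u in users)
                for loc in locations}
        # Hub update: a user is experienced if they visit interesting locations.
        hub = {u: sum(c * auth[loc] for loc, c in visits[u].items()) for u in users}
        # Normalise both vectors so the scores stay bounded.
        na = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        nh = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {loc: x / na for loc, x in auth.items()}
        hub = {u: x / nh for u, x in hub.items()}
    return hub, auth
```

To follow the data selection strategy, `visits` would be restricted to the locations inside one region before calling `hits`, giving one set of scores per region.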


Mining Classical Travel Sequences

Calculate a score for each location sequence

The travel experiences of the users taking this sequence

Hub scores of the users

The interests of the locations contained in the sequence

Authority scores of the locations in this sequence

What is the classical score of sequence A→C→D?

5 users have taken A→C. We know each user’s hub score.

[Figure: TBHG]

We know location C’s authority score.

Mining Classical Travel Sequences

Calculate a score for each location sequence

The travel experiences of the users taking this sequence

Hub scores of the users

The interests of the locations contained in the sequence

Authority scores of the locations in this sequence

Authority scores are weighted by the probability of taking the sequence

What is the classical score of sequence A→C→D?

[Formula shown as an image on the slide; annotated terms: authority score of location A, hub scores of the users, probability of moving out from A to this sequence]
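One way to combine the three annotated terms is sketched below. The slide's formula itself is not recoverable from this transcript, so this is a reading under assumptions: each transition adds, per user who took it, the source authority weighted by the outgoing probability, the destination authority weighted by the incoming probability, and that user's hub score, and a longer sequence is scored as the sum over its transitions. The names `edge_score` and `sequence_score` and the input layout are hypothetical.

```python
def edge_score(edge, transitions, authority, hub):
    """Score one transition such as ("A", "C").

    transitions: {(src, dst): [users who took that transition]}
    authority:   {location: authority score}
    hub:         {user: hub score}
    """
    src, dst = edge
    users = transitions.get(edge, [])
    if not users:
        return 0.0
    out_total = sum(len(u) for (s, _), u in transitions.items() if s == src)
    in_total = sum(len(u) for (_, d), u in transitions.items() if d == dst)
    out_prob = len(users) / out_total  # probability of leaving src along this edge
    in_prob = len(users) / in_total    # probability of reaching dst along this edge
    return sum(authority[src] * out_prob + authority[dst] * in_prob + hub[u]
               for u in users)


def sequence_score(sequence, transitions, authority, hub):
    """Score a sequence such as ["A", "C", "D"] as the sum of its transition scores."""
    return sum(edge_score(e, transitions, authority, hub)
               for e in zip(sequence, sequence[1:]))
```

Under this reading, a sequence scores highly when many experienced users take it and when it is a likely way to move between interesting locations.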

Contents

Introduction

Modeling Location History

Location Interest Inference

Experiments

Related Work

Conclusions


Experimental Settings

GPS Data

GPS devices were used to collect the data

Users

107 users recorded their outdoor movements

Users were paid according to the distance covered by their GPS logs

Data

Mostly in China, some in the USA, Korea and Japan

1 year (from May 2007 to Oct. 2008)

5 million GPS points (166,372 km)

Parameter

Stay Point

Extracted 10,354 stay points

Clustering

159 clusters (4th level of the TBHG)


Evaluation Approaches

Evaluation

Explore the effectiveness of location and travel recommendation through a user study

29 subjects who have been in Beijing for more than 6 years

Two Aspects of Evaluation

Presentation

The ability of the retrieved interesting locations to present a given region

Representativeness, comprehensiveness, novelty

Rank

The ranking performance of the retrieved locations based on relative interests

User desirability rating on each location and each sequence

Two criteria are employed: nDCG and MAP

Baseline

Interesting Locations

rank-by-count, rank-by-frequency

Classical Travel Sequences

rank-by-count, rank-by-interests, rank-by-experience
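For readers unfamiliar with the two ranking criteria, here is a compact sketch. It uses one common formulation, which is an assumption rather than the exact variant used in the study: DCG with gain 2^r - 1 and a log2 position discount, and average precision normalised by the number of relevant items in the ranked list. The function names are hypothetical.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain of a ranked list of graded ratings."""
    return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(ratings))


def ndcg(ratings, k=None):
    """DCG of the list as ranked, divided by the DCG of the ideal ordering."""
    k = len(ratings) if k is None else k
    ideal = dcg(sorted(ratings, reverse=True)[:k])
    return dcg(ratings[:k]) / ideal if ideal > 0 else 0.0


def average_precision(relevant_flags):
    """Mean of precision@i over the positions i that hold a relevant item."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: the average precision averaged over several ranked lists."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

Here `ratings` would be the subjects' desirability ratings in the order a method ranked the locations, and `relevant_flags` marks which ranked items a subject judged interesting.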


Experimental Results

Results

The proposed approach outperformed the baseline approaches

Investigations

Advantages of the hierarchy of the TBHG

Helps users understand the region step by step (level by level)

Can be used to specify users’ travel experiences in different regions


Contents

Introduction

Modeling Location History

Location Interest Inference

Experiments

Related Work

Mining Location History

Location Recommenders

Conclusions


Related Work

Mining Location History

Individual location history

Detect significant locations of a user

Predict user’s movement

Recognize user-specific activities at each location

Multiple users’ location history

Mining similar sequences

Predict where a driver may be going

Recognize the social pattern in daily user activity


Related Work

Location Recommenders

Recommenders based on real-time location

Mobile Tourist Guide System

Recommenders based on location history

More Personalized recommendation using location history

Recommend geographic locations like shops or restaurants

Enhance collaborative filtering solution


Contents

Introduction

Modeling Location History

Location Interest Inference

Experiments

Related Work

Conclusions


Conclusion

Mining Interesting Locations and Travel Sequences from GPS

Proposes a tree-based hierarchical graph (TBHG), which can model multiple users’ location histories

Proposes a HITS-based model to infer users’ travel experiences and the interest of a location within a region

Considers users’ travel experiences and location interests when mining travel sequences

Evaluates the methodology using a large GPS dataset

[Figure: system architecture diagram, as on the Architecture slide]

Conclusion

Implications

Help understand the correlation between users and locations

Enable location and travel recommendation

Step towards enhancing mobile Web from multiple users’ location histories

Improve location-based services by integrating social networking into mobile Web

GeoLife project

Building social networks using human location history

a location-based social-networking service on Microsoft Virtual Earth.

enables users to share life experiences and build connections among each other using human location history.


Discussion

Discussion about this paper (talked with Sungchan)

Modeling Location History

Stay point detection is simple and easy to apply

The hierarchy model is well suited to zooming a map in and out

HITS-based Location Interest Inference

Pretty reasonable: considering users’ travel experience is better than rank-by-count

But other ways of finding location interest and user travel experience could be tried

Travel Sequence

The calculation of the sequence score is too naïve

Motivation

Context-aware Service

Time + Location


References

This Slide

Some images from

GeoLife : Building social networks using human location history, Microsoft Research

Y. Zheng, Mining Individual Life Pattern Based on Location History: A Paradigm and Framework, Slide, 2009

References [5], [7], [14], [18]

GeoLife Project Paper

Yu Zheng and Xing Xie, Mining Individual Life Pattern Based on Location History, IEEE, 2009

Yu Zheng, Xing Xie, and Wei-Ying Ma, GeoLife2.0: A Location-Based Social Networking Service, IEEE, 2009

Yu Zheng, Xing Xie, and Wei-Ying Ma, Mining Interesting Locations and Travel Sequences From GPS Trajectories, ACM, 2009

Quannan Li, Yu Zheng, Xing Xie, and Wei-Ying Ma, Mining user similarity based on location history, ACM, 2008

Yu Zheng, Xing Xie, and Wei-Ying Ma, Understanding mobility based on GPS data, ACM, 2008

Yu Zheng and Xing Xie, Learning Transportation Mode from Raw GPS Data for Geographic Application on the Web, ACM, 2008


Thank you~