The SIGSPATIAL Special Volume 7 Number 3 November 2015 Newsletter of the Association for Computing Machinery Special Interest Group on Spatial Information



The SIGSPATIAL Special

The SIGSPATIAL Special is the newsletter of the Association for Computing Machinery (ACM) Special Interest Group on Spatial Information (SIGSPATIAL).

ACM SIGSPATIAL addresses issues related to the acquisition, management, and processing of spatially-related information with a focus on algorithmic, geometric, and visual considerations. The scope includes, but is not limited to, geographic information systems.

Current Elected ACM SIGSPATIAL officers are:

Chair, Mohamed Mokbel, University of Minnesota

Past Chair, Walid G. Aref, Purdue University

Vice-Chair, Shawn Newsam, University of California at Merced

Secretary, Roger Zimmermann, National University of Singapore

Treasurer, Egemen Tanin, University of Melbourne

Current Appointed ACM SIGSPATIAL officers are:

Newsletter Editor, Chi-Yin Chow (Ted), City University of Hong Kong

Webmaster, Ibrahim Sabek, University of Minnesota

For more details and membership information for ACM SIGSPATIAL as well as for accessing the newsletters please visit http://www.sigspatial.org.

The SIGSPATIAL Special serves the community by publishing short contributions such as SIGSPATIAL conferences’ highlights, calls and announcements for conferences and journals that are of interest to the community, as well as short technical notes on current topics. The newsletter has three issues every year, i.e., March, July, and November. For more detailed information regarding the newsletter or suggestions please contact the editor via email at [email protected].

Notice to contributing authors to The SIGSPATIAL Special: By submitting your article for distribution in this publication, you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

• to publish in print on condition of acceptance by the editor,

• to digitize and post your article in the electronic version of this publication,

• to include the article in the ACM Digital Library,

• to allow users to copy and distribute the article for noncommercial, educational or research purposes.

However, as a contributing author, you retain copyright to your article and ACM will make every effort to refer requests for commercial use directly to you.

Notice to the readers: Opinions expressed in articles and letters are those of the author(s) and do not necessarily express the opinions of the ACM, SIGSPATIAL or the newsletter.

The SIGSPATIAL Special (ISSN 1946-7729) Volume 7, Number 3, November 2015.


Table of Contents

Message from the Editor
Chi-Yin Chow

Section 1: Special Issue on Mobile Data Analytics

Introduction to this Special Issue: Mobile Data Analytics
Chi-Yin Chow

Dynamic Ridesharing
Bilong Shen, Yan Huang, and Ying Zhao

Geo-social Media Data Analytic for User Modeling and Location-based Services
Jie Bao, Defu Lian, Fuzheng Zhang, and Nicholas Jing Yuan

How Events Unfold: Spatiotemporal Mining in Social Media
Ting Hua, Liang Zhao, Feng Chen, and Chang-Tien Lu

Point-of-Interest Recommendations in Location-Based Social Networks
Jia-Dong Zhang and Chi-Yin Chow

Section 2: Event Reports

ACM SIGSPATIAL GeoPrivacy 2015 Workshop Report
Grant McKenzie, Krzysztof Janowicz, and Gueorgi Kossinets

ACM SIGSPATIAL EM-GIS 2015 Workshop Report
Hui Zhang, Yan Huang, and Jean-Claude Thill

ACM SIGSPATIAL MELT 2015 Workshop Report
Ying Zhang and Bodhi Priyantha

ACM SIGSPATIAL IWGS 2015 Workshop Report
Chengyang Zhang, Farnoush Banaei-Kashani, and Abdeltawab Hendawi


Message from the Editor

Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong

Email: [email protected]

In the first section, we have a special issue on a topic of interest to the SIGSPATIAL community. The topic of this issue is “Mobile Data Analytics”, which is edited by our associate editor Dr. Chi-Yin Chow (Ted). Dr. Chow is currently an Assistant Professor in the Department of Computer Science, City University of Hong Kong.

The second section consists of four event reports from:

1. The 2nd ACM SIGSPATIAL International Workshop on Privacy in Geographic Information Collection and Analysis (ACM SIGSPATIAL GeoPrivacy 2015)

2. The 1st ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management (ACM SIGSPATIAL EM-GIS 2015)

3. The 5th ACM SIGSPATIAL International Workshop on Mobile Entity Localization and Tracking in GPS-less Environments (ACM SIGSPATIAL MELT 2015)

4. The 6th ACM SIGSPATIAL International Workshop on GeoStreaming (ACM SIGSPATIAL IWGS 2015)

I would like to sincerely thank all the newsletter authors, Dr. Chow, and event organizers for their generous contributions of time and effort that made this issue possible. I hope that you will find the newsletter interesting and informative and that you will enjoy this issue.

You can download all Special issues from http://www.sigspatial.org/sigspatial-special.


Section 1: Mobile Data Analytics


Introduction to this Special Issue: Mobile Data Analytics

Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong

With the rapid advancement of mobile devices and location acquisition technologies, more and more large-scale location-based data have become available for data analytics and for realizing new location-aware services. Various large-scale location-based data, including user-generated data in location-based social networks (LBSNs) and GPS data reported from vehicles or sensors, have led to research challenges and opportunities in location-based services, intelligent transportation systems, geographic information systems, urban computing, and smart cities. Different computer science techniques, e.g., data mining, machine learning, artificial intelligence, and spatial and spatio-temporal databases, can be used or combined to address such challenges.

This special issue consists of four contributions that address different problems in the research area of mobile analytics. The first contribution is about a dynamic ridesharing framework which allows real-time dynamic matching of travel requests with available cars under certain constraints at large scale. The second contribution introduces the recent advances in location-based user preference modeling for location-based recommendations and prediction through geo-tagged social media data analytics. Ting Hua et al. contribute the third article, which focuses on the application of social media analytics to spatio-temporal event mining. In the fourth contribution, Jia-Dong Zhang and Chi-Yin Chow summarize their recent research project, which fuses social, categorical, geographical, sequential, and temporal influences for point-of-interest recommendations in LBSNs.

I hope the readers will enjoy reading this issue and find it useful in their research work.


Dynamic Ridesharing

Bilong Shen (1), Yan Huang (2), Ying Zhao (3)

(1) Computer Science, Tsinghua University, China

(2) Computer Science and Engineering, University of North Texas, U.S.A.

(3) Computer Science, Tsinghua University, China

Abstract

Ridesharing, which offers empty seats in a car to other passengers, is an efficient mode of transportation. It improves seat utilization and reduces the number of cars in use. Ridesharing has the potential to alleviate congestion, pollution, high travel cost, and energy consumption. The development of the Internet, smartphones, and GPS allows dynamic matching of travel requests with available cars through real-time travel planning systems. However, matching requests and cars under certain constraints at large scale remains challenging. In this paper, we formally define the problem of dynamic ridesharing and introduce the filter-and-refine solution framework, under which we summarize existing state-of-the-art work. Finally, we point out possible research directions and problems that remain to be solved.

1 Introduction

With the development of urban and metropolitan areas, the number of private cars is growing tremendously. Private cars not only bring convenience, but also worsen traffic congestion, pollution, and energy consumption. On the one hand the number of vehicles is growing, but on the other hand the rate of empty seats in moving vehicles has not improved. The average occupancy rate of private cars in the United States is only 1.6 persons per vehicle mile [10]. Ridesharing is a mode of transportation in which people with similar itineraries and time schedules utilize spare seats in a vehicle and share the travel cost [9]. Numerous studies show that ridesharing is a triple-win solution [4, 9]. The first win is for the participants (e.g., drivers and riders reduce their cost by splitting gas, tolls, and parking fees, and traveling becomes more convenient). The second win is for the environment (e.g., fewer emissions, less noise, less fuel consumption). The third win is for society (e.g., alleviating traffic jams, integrating idle resources). Because of these benefits, more and more people are attracted to ridesharing.

Currently, many mobile phone applications provide ridesharing services, e.g., Avego, Mitfahrgelegenheit, Zimride, Carpooling, Blablacar, and Didi. Mobile transportation platforms such as Uber also provide ridesharing travel options. Participants, society, and the environment can benefit more and more from ridesharing. Ridesharing can be either static or dynamic [9, 10]. Most ridesharing systems operating today are static: they arrange the driver and passengers before a trip starts, and no matching can be made after the trip starts. Dynamic ridesharing is a service that matches real-time trip requests with running vehicles; it is more convenient and provides more flexibility to passengers and drivers. Because of the potential advantages of real-time dynamic ridesharing, much research effort has been launched recently.

Real-time ridesharing at urban scale brings benefits and challenges at the same time. The core technical challenge is the complexity of the matching process. Static ridesharing route planning algorithms are not suitable for real-time matching. The first challenge is that the vehicles are moving fast on the road network. The second challenge is that the route planning must satisfy not only the constraints of the new request but also those of the requests already confirmed. Various research work has been done, e.g., minimizing the vehicle’s traveling distance [2, 17, 8, 1, 15, 7, 3], maximizing the rate of matched requests [1, 15, 11, 3], and minimizing the system response time [17, 15, 11, 20, 7, 3, 12, 16, 22, 2]. Many studies have been conducted on real data sets. In this paper we organize existing work on real-time ridesharing under a filter-and-refine framework.

We recognize that computation is one of the technical problems to be solved in order to enable wide adoption of ridesharing. Other factors such as inter-personal interaction, safety, social discomfort, and pricing are also important. Reputation systems and pricing mechanisms for dynamic real-time ridesharing have been studied by researchers. Ridesharing is a social activity, and trust is very important to enable such an activity. Research in the trust area is still in its early stage. Researchers have tried to use call description records to quantify the potential of ridesharing [5], and others have investigated the reputation system in a ridesharing system [21]. Different pricing mechanisms have also been studied with different goals, including fairness, deficit control, and promoting the rate of matching [14, 23], and privacy issues have also been considered [18].

2 Dynamic Ridesharing System

2.1 Overview

In a dynamic ridesharing service system, as shown in Figure 1, a set of vehicles runs over a road network. Requests, each consisting of two points, an origin and a destination, are received in real time. Each request also specifies two constraints: a waiting time, defining the maximal time the rider can wait after making the request, and a service constraint, defining the acceptable extra detour time beyond the shortest possible trip duration. After a request is submitted to the service server, the server processes it immediately. The server matches the request with an appropriate vehicle based on the road network and the travel constraints of the request, and plans an efficient schedule. When matching and scheduling are done, the constraints of both the new request and the requests already assigned to the vehicle must be satisfied.

Figure 1: Dynamic ridesharing system framework

[Labels in the figure: Driver, Empty Seat, Current Route, Current Position, Passenger, Origin, Destination, Constraint, Request, Match Service, Match Server, Vehicle Filter, Constraint Match, Route Plan, Road Network, Traffic Data, History Data]

2.2 Basic Definition

A road network G = ⟨V, E, W⟩ consists of a vertex set V and an edge set E. Each vertex in V represents a road intersection. Each edge (u, v) ∈ E (u, v ∈ V) is associated with a weight W(u, v), which represents the cost of traveling from u to v. The cost can be either time or distance, which can be converted from one to the other. The minimal cost from x to y (x, y ∈ V) on the road network is denoted by dist(x, y).
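As an illustration of dist(x, y), the following minimal sketch computes the minimal cost with Dijkstra's algorithm. The article does not prescribe a shortest-path method, so the algorithm choice, the adjacency-dictionary representation, and the toy network `G` are assumptions made only for this example.

```python
import heapq

def dist(graph, x, y):
    """Minimal cost from x to y; graph maps u -> {v: W(u, v)} (assumed layout)."""
    best = {x: 0.0}
    heap = [(0.0, x)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == y:
            return cost
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path to u was already found
        for v, w in graph.get(u, {}).items():
            new_cost = cost + w
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return float("inf")  # y is unreachable from x

# Hypothetical toy network: edge weights are travel times in minutes.
G = {"a": {"b": 4, "c": 1}, "c": {"b": 2}, "b": {"d": 5}, "d": {}}
print(dist(G, "a", "d"))
```

Here the direct edge a→b costs 4, but the route through c costs 3, so the minimal cost from a to d goes through c.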

Definition 1: Trip Request. Over a road network, a trip request tr = ⟨o, d, pn, t_ow, t_dw, ϵ⟩ is defined by an origin o ∈ V, a destination d ∈ V, a number of passengers pn, a pick-up time window t_ow = ⟨t_o.s, t_o.e⟩ (t_o.s and t_o.e represent the start and the end of the period in which the passenger can be picked up), a drop-off time window t_dw = ⟨t_d.s, t_d.e⟩ (t_d.s and t_d.e represent the start and the end of the period in which the passenger can be dropped off at the destination), and a service constraint ϵ (the extra detour acceptable in a trip, bounding the overall time from o to d by (1 + ϵ) dist(o, d)).

In a particular road network G, the set of all trip requests at the current time is denoted by TR.

Definition 2: Car State. A cruising car is represented by car = ⟨id, t, loc, TR_rec, schedule, n_cap, n_emp⟩, where id is a unique ID, t is a timestamp, loc is the current location of the car, TR_rec is the set of accepted trips, schedule is the route schedule of the accepted trips, n_cap is the total number of seats of the car, and n_emp is the number of available seats at the current time. A schedule is represented by schedule = ⟨v_0, v_1, ..., v_k⟩ (v_i ∈ V, (v_i, v_{i+1}) ∈ E), and the cost of the schedule is

$\sum_{i=0}^{k-1} W(v_i, v_{i+1})$.

In a particular road network, the set of all cars at the current time is represented by C = ⟨car⟩.

Definition 3: Dynamic Ridesharing. Given a set of cars C on the road network G at a particular time and the set of real-time trip requests TR, the goal is to match each tr ∈ TR to a car ∈ C according to a particular optimization goal under the dynamic ridesharing constraints.

In the subsections below, the constraints and the optimization goals are discussed.
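The tuples in Definitions 1 and 2 map naturally onto record types. The sketch below is one possible encoding, not the authors' implementation: the field names follow the definitions, while the use of Python dataclasses, string vertex names, the edge-weight dictionary `W`, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TripRequest:
    o: str                     # origin vertex
    d: str                     # destination vertex
    pn: int                    # number of passengers
    t_ow: Tuple[float, float]  # pick-up window (t_o.s, t_o.e)
    t_dw: Tuple[float, float]  # drop-off window (t_d.s, t_d.e)
    eps: float                 # service constraint (acceptable extra detour)

@dataclass
class Car:
    id: int
    t: float                   # timestamp
    loc: str                   # current location
    n_cap: int                 # total number of seats
    n_emp: int                 # seats available at the current time
    tr_rec: List[TripRequest] = field(default_factory=list)  # accepted trips
    schedule: List[str] = field(default_factory=list)        # v_0, ..., v_k

def schedule_cost(schedule: List[str], W: Dict[Tuple[str, str], float]) -> float:
    """Sum of W(v_i, v_{i+1}) over consecutive vertices of the schedule."""
    return sum(W[(u, v)] for u, v in zip(schedule, schedule[1:]))

# Hypothetical values, for illustration only.
W = {("a", "c"): 1.0, ("c", "b"): 2.0, ("b", "d"): 5.0}
tr = TripRequest(o="c", d="d", pn=2, t_ow=(0.0, 5.0), t_dw=(5.0, 15.0), eps=0.2)
car = Car(id=7, t=0.0, loc="a", n_cap=4, n_emp=4, tr_rec=[tr],
          schedule=["a", "c", "b", "d"])
print(schedule_cost(car.schedule, W))
```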

2.3 Constraints

Different ridesharing systems have different constraints, such as waiting time and detour, as well as safety, pricing concerns, and social comfort. In this paper we focus on the service constraints. The following are the major constraints in dynamic ridesharing.

• Available Seats Constraint. The number of riders of a trip tr_i is not allowed to exceed the number of available seats of the car:

$tr_i.pn \le car.n_{emp}$ (1)

• Time Constraint. As mentioned above, after a trip is assigned to a car, the time windows of all trips tr, including the newly received trip and the trips already assigned to the car, must be satisfied. Let tr.t_vo represent the time at which the car can pick up the rider at tr.o, and tr.t_vd the time at which the car can drop off the rider at tr.d. Then

$\forall tr_i \in car.TR_{rec}:\ tr_i.t_{o.s} \le tr_i.t_{vo} \le tr_i.t_{o.e},\quad tr_i.t_{d.s} \le tr_i.t_{vd} \le tr_i.t_{d.e}$ (2)

• Detour Constraint. Equally important is the detour constraint. A trip can be matched to a car only if the detour rates of all trips tr, including the newly received trip and the trips already assigned to the car, are acceptable. The cost of a trip tr_i from tr_i.o to tr_i.d without ridesharing is dist(tr_i.o, tr_i.d). During ridesharing a route may include a detour. Let dist_r(tr_i.o, tr_i.d) represent the total distance of the trip on a possibly detoured ridesharing route between the origin tr_i.o at time tr_i.t_vo and the destination tr_i.d at time tr_i.t_vd. Then the detour rate is δ_i = dist_r(tr_i.o, tr_i.d) / dist(tr_i.o, tr_i.d) and, following the service constraint of Definition 1,

$\forall tr_i \in car.TR_{rec}:\ \delta_i \le 1 + tr_i.\epsilon$ (3)
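The three constraints above can be checked together once the planned pick-up and drop-off times of a candidate schedule are known. The following is a minimal sketch under stated assumptions: trips are plain dictionaries whose keys (`t_os`, `t_vo`, `dist_r`, and so on) are hypothetical names chosen for this example, and the detour check uses the (1 + ϵ) bound given in Definition 1.

```python
def feasible(new_pn, n_emp, trips):
    """Check the seat, time window, and detour constraints for a candidate plan.

    trips: the new trip plus the trips already assigned to the car, each a dict
    with keys t_os, t_oe, t_ds, t_de (time windows), t_vo, t_vd (planned
    pick-up and drop-off times), dist_r (cost on the shared route), dist
    (shortest cost) and eps (service constraint). All key names are assumed.
    """
    if new_pn > n_emp:                                    # seat constraint
        return False
    for tr in trips:
        if not (tr["t_os"] <= tr["t_vo"] <= tr["t_oe"]):  # pick-up window
            return False
        if not (tr["t_ds"] <= tr["t_vd"] <= tr["t_de"]):  # drop-off window
            return False
        delta = tr["dist_r"] / tr["dist"]                 # detour rate
        if delta > 1 + tr["eps"]:                         # detour constraint
            return False
    return True

# Hypothetical trip: 10 minutes direct, 11 minutes on the shared route.
trip = {"t_os": 0, "t_oe": 5, "t_ds": 8, "t_de": 20,
        "t_vo": 3, "t_vd": 14, "dist_r": 11.0, "dist": 10.0, "eps": 0.2}
print(feasible(new_pn=1, n_emp=2, trips=[trip]))
```

A plan is rejected as soon as any single trip violates any constraint, which is why the check has to cover the trips already on board as well as the new one.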


Table 1: Constraints and Optimization Goals

Research | Method | Author | Driver Cost / Matching Rate / Time Window / Detour Cost

SIGMOD13 [20]; VLDB14 [12] | Noah | Yan Huang et al. | √ √ √

EUR J OPER RES14 [16] | People and parcels sharing taxis | Baoxiang Li et al. | √ √ √

ICTAI14 [2] | Minimising the Driving Distanc | Vincent Armant and | √ √

ICDE12 [17] | T-Share | Shuo Ma et al. | √ √

ITSC12 [8] | Distributed Taxi-Sharing System | Pedro M. d’Orey et al. | √ √

TRANSPORT RES B-METH11 [1] | A simulation study in metro Atlanta | Niels A.H. Agatz et al. | √ √ √

IJCAI11 [15] | Parallel Auctions | Alexander Kleiner et al. | √ √

MATES09 [22] | SMIZE | Xin Xing et al. | √

EDBT08 [11] | Highly scalable trip grouping | Gyozo Gidofalvi et al. | √

EUR J OPER RES06 [7] | A two-phase insertion technique | Luca Coslovich et al. | √ √

PARALLEL COMPUT04 [3] | Parallel Tabu search heuristics | Andrea Attanasio et al. | √ √ √

2.4 Optimization Goals

Different optimization goals have been studied in recent years. From the driver's side, the optimization goal may be minimizing the driving distance. For the riders, the goal could be maximizing the chance of finding a vehicle within the given service constraints. Table 1 summarizes studies with different constraints and optimization goals.

3 Optimization Method

The main challenge of dynamic ridesharing is to deal with a large number of trip requests and cars in real time. The most effective way to deal with this problem is to use the Filter and Refine framework. In this section, we introduce the Filter and Refine framework and discuss different studies under it.

3.1 The Challenges

In a dynamic ridesharing system, assigning a rider request to a car is not a simple pair-matching problem. When the ridesharing system gets a new request, the cars moving on the road network may already have been assigned some trips. To find the car to assign tr to, we need to consider not only tr.o and tr.d, but also the trips already in TR_rec. The combined route must be replanned on the road network to satisfy all the constraints of the new trip and the trips in TR_rec. As can be seen in Figure 2, when a request is submitted to the ridesharing system, a car is cruising at position P and TR_rec = ⟨tr1, tr2, tr3⟩. Trip tr2 has finished. The rider of tr1 is already in the car but has not arrived at tr1.d, the rider of tr3 has not been picked up, and the schedule at this time is schedule = ⟨tr1.d, tr3.o, tr3.d⟩. To determine whether the constraints of tr4 can be satisfied by the car, we need to add tr4 and reorder the new schedule = ⟨tr1.d, tr3.o, tr3.d, tr4.o, tr4.d⟩ to find the valid schedules that meet all the time window constraints and detour constraints of tr1, tr3, and tr4. This problem is a form of the Traveling Salesman Problem and has been proved to be NP-hard [19, 17, 12]. The large number of trip requests and the enormous number of running cars in megacities challenge the scalability of matching algorithms. For example, in Beijing, there were more than 5,591,000 vehicles finishing around 9,090,000 passenger trips per day in 2013. In addition, a request not matched to a car may be resubmitted one or more times and generate more requests.

Figure 2: Possible schedules with a new request

Route searching algorithms such as branch-and-bound [13] and integer programming [6] are designed for offline computation. Their computation time is measured in minutes or hours, while real-time dynamic ridesharing requires millisecond response times.

3.2 The Filter and Refine Framework

Obviously, it is very expensive to go through all of the cars to match a trip request. The solution to the real-time problem is the Filter and Refine framework. The core idea of this framework is to filter and conquer: filter the vehicles and trip requests and then conquer the problem at a small scale.

Step 1, Filter: given G = ⟨V, E, W⟩, the trip request set TR, and the set C of cars on the road network, remove from C and TR the elements that have no possibility of matching. An alternative is to cluster the requests into groups and process each group as a batch. After the filtering, C_filtered (|C_filtered| ≤ |C|) and TR_filtered (|TR_filtered| ≤ |TR|) are prepared for the next step.

Step 2, Refine: specific algorithms are applied to obtain the matching pairs under constraints such as time and detour.
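The two steps can be sketched as a small driver function. This is only an outline of the control flow, assuming a single request at a time: `may_match` (the cheap filter) and `best_schedule` (the expensive refine step) are hypothetical callbacks, and the one-dimensional toy data below stands in for a road network.

```python
def filter_and_refine(request, cars, may_match, best_schedule):
    """Step 1 drops cars that cannot match; Step 2 runs the costly scheduler
    only on the survivors. Returns (car, schedule, cost) or None."""
    candidates = [c for c in cars if may_match(request, c)]   # Step 1: filter
    best = None
    for car in candidates:                                    # Step 2: refine
        result = best_schedule(request, car)   # (schedule, cost) or None
        if result is not None and (best is None or result[1] < best[2]):
            best = (car, result[0], result[1])
    return best

# Hypothetical toy setting: positions are points on a line.
cars = [{"id": 1, "pos": 2, "n_emp": 1},
        {"id": 2, "pos": 9, "n_emp": 3},
        {"id": 3, "pos": 4, "n_emp": 0}]
request = {"o": 3, "d": 8, "pn": 1, "max_wait": 5}

def may_match(req, car):
    # Cheap test: enough seats and close enough to reach the origin in time.
    return car["n_emp"] >= req["pn"] and abs(car["pos"] - req["o"]) <= req["max_wait"]

def best_schedule(req, car):
    # Stand-in for the real scheduler: drive to the origin, then to the destination.
    cost = abs(car["pos"] - req["o"]) + abs(req["d"] - req["o"])
    return [car["pos"], req["o"], req["d"]], cost

match = filter_and_refine(request, cars, may_match, best_schedule)
print(match)
```

In this toy run, car 2 is filtered out for being too far away and car 3 for having no free seat, so the scheduler is invoked only for car 1.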

3.3 The Filter Method

The target of the filter step is to reduce the scale of the problem. When a request is received, the ridesharing system matches the request with the cars cruising on the network. The challenge of this step is to filter the moving objects on the road network using a quick and efficient mechanism. Although spatial index methods such as the R-tree, R*-tree, and TPR-tree have been proposed, they are not designed for the ridesharing problem, so using these structures directly may incur a high update cost on the nodes. Some research has been done to speed up the filtering process. For example, in [7], the system calculates the feasible neighborhoods of the current route between the stops so that when a new request comes, the candidate stops can be retrieved quickly from the stops in the neighborhoods. Depending on the method, for a route with n possible stops, the pre-calculation complexity is O(n^4) or O(n^3) and the complexity of the intersection shrinks to O(n). The limitation of this method is that the pre-calculation is not suitable for large-scale ridesharing problems in real time.

A searching algorithm that uses a spatio-temporal index to quickly filter the candidate cars that may satisfy a trip request has been proposed [17]. This method first partitions the road network using a grid, then uses anchor nodes to denote the grid cells, as shown in Figure 3(A). Each cell maintains a temporally-ordered grid cell list sorted in ascending order of travel time, a spatially-ordered grid cell list sorted by travel distance, and a car list recording the cars scheduled to enter the cell, as shown in Figure 3(C). When a new request is submitted, the system searches the grid cells from near to far to filter the candidate cars, as shown in Figure 3(D).
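The near-to-far grid search can be illustrated with a much simpler structure than the one described above. The sketch below is an assumption-laden simplification: it uses planar coordinates and square rings of cells instead of road-network travel times, keeps only a car list per cell, and omits the temporally and spatially ordered cell lists. The class name `GridIndex` and its methods are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over the plane; each cell keeps the ids of the cars in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (col, row) -> set of car ids
        self.where = {}                 # car id -> (col, row)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, car_id, x, y):
        """Record a new position; cell lists change only on a cell crossing."""
        new = self._cell(x, y)
        old = self.where.get(car_id)
        if old == new:
            return
        if old is not None:
            self.cells[old].discard(car_id)
        self.cells[new].add(car_id)
        self.where[car_id] = new

    def candidates(self, x, y, max_rings):
        """Car ids in the cells around (x, y), visited ring by ring, near to far."""
        cx, cy = self._cell(x, y)
        found = []
        for r in range(max_rings + 1):
            for col in range(cx - r, cx + r + 1):
                for row in range(cy - r, cy + r + 1):
                    if max(abs(col - cx), abs(row - cy)) == r:  # ring r only
                        found.extend(sorted(self.cells.get((col, row), ())))
        return found

index = GridIndex(cell_size=1.0)
index.update(1, 0.5, 0.5)
index.update(2, 2.5, 0.5)
index.update(3, 5.5, 5.5)
print(index.candidates(0.2, 0.2, max_rings=2))
```

Because a position update touches the cell lists only when a car crosses a cell boundary, the update cost stays low for fast-moving vehicles, which is the motivation the text gives for avoiding tree-based spatial indexes.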

Combining similar trip requests into clusters is also an efficient way to reduce the problem size. A Data Stream Management System with different space-partitioning policies is used for trip grouping using a parallel implementation [11, 3].

3.4 The Refine Method

The scale of the solution space can be effectively reduced by the filter step. In the Refine step, a trip tr ∈ TR_filtered and a car ∈ C_filtered are matched. As mentioned above, this step needs to reschedule the route to include the new trip together with the set of trips TR_rec already accepted by the car, while satisfying all the constraints. This is an NP-hard problem. As mentioned above, algorithms such as branch-and-bound [13] and integer programming [6] are not suitable for the real-time dynamic ridesharing problem, so new methods need to be developed to solve the ridesharing routing problem in this step.

Figure 3: Spatio-temporal index [17]. Panels: (A) Grid-partitioned map; (B) Grid distance matrix; (C) Spatio-temporal index of taxis; (D) Overview of the dual-side taxi searching algorithm.

Some simple methods are used to reduce the computational complexity of the system. For example, one can convert the rescheduling problem into an insertion problem. For a trip with k different points, the complexity is reduced from O(k!) to O(k^2). However, the insertion method does not try to achieve optimality.
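A minimal sketch of the insertion idea: the order of the existing stops is kept fixed, and every pair of positions for the new origin and destination is tried, giving (k + 1)(k + 2)/2 candidates for a schedule with k stops. The function name `insert_trip`, the `cost` and `is_valid` callbacks, and the toy stops on a line are assumptions for illustration.

```python
def insert_trip(schedule, origin, dest, cost, is_valid=lambda s: True):
    """Try every position pair for the new origin and destination while keeping
    the order of the existing stops. Returns the cheapest valid schedule and
    its cost, or (None, inf) if no insertion is valid."""
    best, best_cost = None, float("inf")
    for i in range(len(schedule) + 1):
        with_o = schedule[:i] + [origin] + schedule[i:]
        for j in range(i + 1, len(with_o) + 1):    # destination after origin
            cand = with_o[:j] + [dest] + with_o[j:]
            if is_valid(cand):
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
    return best, best_cost

# Hypothetical toy setting: stops are positions on a line, the car starts at 0.
def line_cost(stops):
    return sum(abs(b - a) for a, b in zip([0] + stops, stops))

print(insert_trip([10, 20], origin=12, dest=18, cost=line_cost))
```

In practice `is_valid` would apply the seat, time window, and detour checks to each candidate. Since the existing stops are never reordered, the result can be worse than the optimum found by a full search.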

Figure 4: Kinetic Tree for Trip Schedules [12]. Panels: (A) Trip request; (B) Route that needs to be rescheduled; (C) Insertion with Kinetic Tree.

Traditional routing algorithms such as branch-and-bound reschedule the unfinished origins and destinations together with the new request from scratch. Thus the computation from previous scheduling is not reused, and the repeated computation dramatically increases the rescheduling time. To solve this problem, a Kinetic Tree structure that maintains the calculations performed so far and uses them effectively for new requests has been proposed [12]. As can be seen from Figure 4(C), the kinetic tree structure stores all the valid trip schedules as a tree. When the car is running on the road network, the visited points and unvisited branches are dropped from the tree. The root of the tree tracks the current location loc of the car. The rest of the tree represents all valid schedules in a compact tree structure. When a new request comes in, the system does not need to reschedule from scratch; instead it tries to insert the new trip into the Kinetic Tree structure. To do so, it extends every valid and active schedule in the prefix tree into a new valid schedule that includes tr_i. It deals with the origin point tr_i.o first and then the destination tr_i.d. The prefix tree is scanned to determine on which edges tr_i.o can be inserted while still maintaining the service constraints, and then it tries to insert tr_i.d after the position of tr_i.o in a similar fashion.

As shown in Figure 4(A), the car accepts a request tr1 and picks up the rider at position tr1.o. The new request tr2 is submitted to the ridesharing system when the car is at position P. At this time the car should reschedule ⟨tr1.d, tr2.o, tr2.d⟩. If we use the branch-and-bound method, the graph needs to be searched to reschedule the whole route, which cannot utilize the computation performed before, and the complexity of rescheduling is O(n!). In contrast, the Kinetic Tree efficiently uses the tree structure to store the computation results. To accommodate the new trip tr2, we first determine whether inserting tr2.o is valid at position (1) or (2), as shown in Figure 4(C), so that the constraints of available seats, time window, and detour are satisfied. Then tr2.d is inserted at a position after tr2.o. After the insertion, the new route is as shown in Figure 4(C)-(3). The main problem with the basic tree algorithm is the exponential explosion of the size of the tree when there are multiple clustered origin and destination points. For example, if 8 origin points occur in spatio-temporal proximity, any permutation of the pickups may result in a valid schedule, which yields 8! = 40,320 possibilities. A hotspot clustering algorithm is proposed to deal with this problem. Once a point is combined with any hotspot, its insertion into the other edges is stopped. Experiments on a large Shanghai taxi dataset have been performed, showing that the kinetic tree algorithms outperform other algorithms significantly.
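To make the tree of valid schedules and its growth concrete, the sketch below builds a plain prefix tree of all orderings in which each rider is picked up before being dropped off. It is not the kinetic tree of [12]: it rebuilds the tree from scratch, has no incremental insertion, and has no hotspot clustering. The function names, the `(trip, "o" | "d")` stop encoding, and the `is_valid` callback are assumptions for illustration.

```python
import math

def schedule_tree(pickups, on_board, is_valid=lambda path: True, path=()):
    """Prefix tree (nested dicts) of all valid stop orderings.

    pickups:  frozenset of trips whose rider still has to be picked up
    on_board: frozenset of trips whose rider is in the car
    Returns {} for a complete schedule and None for a dead end.
    """
    if not pickups and not on_board:
        return {}
    options = [((tr, "o"), pickups - {tr}, on_board | {tr}) for tr in sorted(pickups)]
    options += [((tr, "d"), pickups, on_board - {tr}) for tr in sorted(on_board)]
    tree = {}
    for stop, p, b in options:
        nxt = path + (stop,)
        if is_valid(nxt):                 # e.g. seat, time window, detour checks
            sub = schedule_tree(p, b, is_valid, nxt)
            if sub is not None:
                tree[stop] = sub
    return tree or None

def count_schedules(tree):
    """Number of complete schedules stored in the tree (its leaves)."""
    if tree is None:
        return 0
    return 1 if not tree else sum(count_schedules(t) for t in tree.values())

# Situation of Figure 4: tr1 is on board, tr2 still has to be picked up.
tree = schedule_tree(frozenset({"tr2"}), frozenset({"tr1"}))
print(count_schedules(tree))
print(math.factorial(8))   # orderings of 8 clustered pickups, as in the text
```

With no constraints applied, the tr1/tr2 situation yields three candidate schedules, and the count grows factorially as more trips are pending, which is the explosion that hotspot clustering is meant to contain.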

4 Conclusion and Future Directions

Dynamic ridesharing brings both opportunities and challenges. In this article, we summarize some real-time ridesharing algorithms under the Filter and Refine framework and introduce the different methods that can be used in each step of the framework. We believe the pervasiveness of location-enabled mobile devices will make large-scale ridesharing a reality in the near future. However, many research questions remain. On the scheduling side, the impact of traffic and the associated uncertainty has not been studied well in dynamic ridesharing and is a challenging research problem. Another important research question still in its early stage is trust and privacy. Tapping into social network analysis may help with these problems. With the advent of driverless vehicles, ridesharing will involve more research issues, such as pre-routing empty cars to maximize ridesharing by learning from historical trips.

Acknowledgments

References

[1] N. A. Agatz, A. L. Erera, M. W. Savelsbergh, and X. Wang. Dynamic ride-sharing: A simulation study in metro atlanta. Transportation Research Part B: Methodological, 45(9):1450–1464, 2011.

[2] V. Armant and K. N. Brown. Minimizing the driving distance in ride sharing systems. In Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on, pages 568–575. IEEE, 2014.

[3] A. Attanasio, J.-F. Cordeau, G. Ghiani, and G. Laporte. Parallel tabu search heuristics for the dynamic multi-vehicle dial-a-ride problem. Parallel Computing, 30(3):377–387, 2004.

[4] N. D. Chan and S. A. Shaheen. Ridesharing in north america: Past, present, and future. Transport Reviews, 32(1):93–112, 2012.

[5] B. Cici, A. Markopoulou, E. Frías-Martínez, and N. Laoutaris. Quantifying the potential of ride-sharing using call description records. In Proceedings of the 14th Workshop on Mobile Computing Systems and Applications, page 17. ACM, 2013.

[6] J.-F. Cordeau. A branch-and-cut algorithm for the dial-a-ride problem. Operations Research, 54(3):573–586, 2006.

[7] L. Coslovich, R. Pesenti, and W. Ukovich. A two-phase insertion technique of unexpected customers for a dynamic dial-a-ride problem. European Journal of Operational Research, 175(3):1605–1615, 2006.

[8] P. M. d’Orey, R. Fernandes, and M. Ferreira. Empirical evaluation of a dynamic and distributed taxi-sharing system. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pages 140–146. IEEE, 2012.

[9] M. Furuhata, M. Dessouky, F. Ordóñez, M.-E. Brunet, X. Wang, and S. Koenig. Ridesharing: The state-of-the-art and future directions. Transportation Research Part B: Methodological, 57:28–46, 2013.


[10] K. Ghoseiri, A. E. Haghani, M. Hamedi, and M.-A. U. T. Center. Real-time rideshare matching problem. Mid-Atlantic Universities Transportation Center, 2011.

[11] G. Gidofalvi, T. B. Pedersen, T. Risch, and E. Zeitler. Highly scalable trip grouping for large-scale collective transportation systems. In Proceedings of the 11th international conference on Extending database technology: Advances in database technology, pages 678–689. ACM, 2008.

[12] Y. Huang, F. Bastani, R. Jin, and X. S. Wang. Large scale real-time ridesharing with service guarantee on road networks. Proceedings of the VLDB Endowment, 7(14):2017–2028, 2014.

[13] B. Kalantari, A. V. Hill, and S. R. Arora. An algorithm for the traveling salesman problem with pickup and delivery customers. European Journal of Operational Research, 22(3):377–386, 1985.

[14] E. Kamar and E. Horvitz. Collaboration and shared plans in the open world: Studies of ridesharing. In IJCAI, volume 9, page 187, 2009.

[15] A. Kleiner, B. Nebel, and V. Ziparo. A mechanism for dynamic ride sharing based on parallel auctions. 2011.

[16] B. Li, D. Krushinsky, H. A. Reijers, and T. Van Woensel. The share-a-ride problem: People and parcels sharing taxis. European Journal of Operational Research, 238(1):31–40, 2014.

[17] S. Ma, Y. Zheng, and O. Wolfson. T-share: A large-scale dynamic taxi ridesharing service. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pages 410–421. IEEE, 2013.

[18] K. Radke, M. Brereton, S. Mirisaee, S. Ghelawat, C. Boyd, and J. G. Nieto. Tensions in developing a secure collective information practice-the case of agile ridesharing. In Human-Computer Interaction–INTERACT 2011, pages 524–532. Springer, 2011.

[19] M. W. Savelsbergh. Local search in routing problems with time windows. Annals of Operations research, 4(1):285–305, 1985.

[20] C. Tian, Y. Huang, Z. Liu, F. Bastani, and R. Jin. Noah: a dynamic ridesharing system. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 985–988. ACM, 2013.

[21] J. Witkowski, S. Seuken, and D. C. Parkes. Incentive-compatible escrow mechanisms. In AAAI, 2011.

[22] X. Xing, T. Warden, T. Nicolai, and O. Herzog. Smize: a spontaneous ride-sharing system for individual urban transit. In Multiagent System Technologies, pages 165–176. Springer, 2009.

[23] D. Zhao, D. Zhang, E. H. Gerding, Y. Sakurai, and M. Yokoo. Incentives in ridesharing with deficit control. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, pages 1021–1028. International Foundation for Autonomous Agents and Multiagent Systems, 2014.


Geo-social Media Data Analytic for User Modeling and Location-based Services

Jie Bao1, Defu Lian2, Fuzheng Zhang1, Nicholas Jing Yuan1

1Microsoft Research, China
2Big Data Research Center, University of Electronic Science and Technology of China
[email protected], [email protected], {fuzzhang,nicholas.yuan}@microsoft.com

Abstract

More and more geo-tagged social media data is generated nowadays, from geo-tagged tweets and geo-tagged photos to check-ins. Analyzing this flourishing data makes it possible to discover users’ daily mobility patterns, profiles, and preferences. Based on the analyzed results, new types of location-based services emerge. In this article, we first introduce recent advances in location-based user preference modeling, which include: 1) inferring users’ demographics, 2) identifying users’ novelty-seeking characteristics, and 3) discovering users’ shopping impulsiveness. After that, we present a comprehensive summary of the state of the art in location-based services that take advantage of geo-social media, including: 1) location-based recommendation and 2) location-based prediction.

1 Introduction

With the advances in GPS-embedded devices, like smart phones and tablets, and the popularity of online social networking services, like Facebook, Flickr, and Foursquare, billions of geo-tagged social media records are generated from geo-tagged tweets, photos, and check-ins on location-based social networking services (i.e., LBSN). A user’s geo-tagged social media not only records the locations the user has been to, but also reflects her habits and preferences. Geo-tagged social media data serves as a bridge between users’ online and offline lives, and is drawing significant attention in many different commercial areas, such as user profiling [27], partner marketing [2], and recommendations [4].

Figure 1: An Overview of LBSN (user graph, user-location graph, and location graph, linking users, locations, and trajectories through user correlation, location correlation, and location-tagged user-generated content)

Figure 1 gives an overview of a typical location-based social network, in which the addition of locations creates new relations and correlations. Geo-social media data analysis focuses on mining the relationships among the following three types of graphs:


• Location-location graph. In the location-location graph, a node is a location and a directed edge represents the relation between two locations. This relation can be interpreted in many ways. For example, it can indicate the physical distance between the locations or the similarity between them. Two locations can also be connected by user activities.

• User-location graph. In the user-location graph, there are two types of nodes, users and locations. An edge starting from a user and ending at a location can represent the user’s travel history.

• User-user graph. In the user-user graph (shown in the top-right of Figure 1), a node is a user and an edge between two nodes represents a relation between the users, such as: a) physical distance, b) friendship, or c) another relation derived from the users’ location histories.
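As a rough illustration of the three graphs listed above, the sketch below derives them from a list of check-ins. The input layout `(user, location, timestamp)`, the function name `build_graphs`, and the specific edge definitions (visit counts, consecutive visits by the same user, and number of shared locations) are assumptions chosen for brevity; the text allows many other edge semantics.

```python
from collections import defaultdict
from itertools import combinations


def build_graphs(checkins):
    """Build edge-weight dictionaries for the three graphs.

    checkins: iterable of (user, location, timestamp) tuples (assumed layout).
    """
    user_loc = defaultdict(int)   # (user, location) -> number of visits
    loc_loc = defaultdict(int)    # (from_location, to_location) -> transitions
    by_user = defaultdict(list)

    for user, loc, ts in checkins:
        user_loc[(user, loc)] += 1
        by_user[user].append((ts, loc))

    # Location-location edges: consecutive check-ins of the same user.
    for visits in by_user.values():
        visits.sort()
        for (_, a), (_, b) in zip(visits, visits[1:]):
            if a != b:
                loc_loc[(a, b)] += 1

    # User-user edges: number of locations both users have visited.
    visited = {u: {loc for _, loc in v} for u, v in by_user.items()}
    user_user = {}
    for u, v in combinations(sorted(visited), 2):
        shared = len(visited[u] & visited[v])
        if shared:
            user_user[(u, v)] = shared

    return dict(user_loc), dict(loc_loc), user_user


checkins = [("u1", "A", 1), ("u1", "B", 2), ("u2", "B", 1), ("u2", "C", 3)]
user_loc, loc_loc, user_user = build_graphs(checkins)
print(user_loc)
print(loc_loc)
print(user_user)
```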

With these heterogeneous graphs, the semantic meanings of the locations, and their temporal order, many analyses can be performed on geo-social media. In this article, we briefly discuss some recent techniques that analyze users’ geo-tagged media data to build accurate profiles and provide personalized services.

2 Location-based User Modeling

Users’ profile/model information is of great value for providing personalized services. Geo-tagged social media is also a very informative source for determining a user’s demographics and some psychological characteristics.

2.1 Location to User Profile

User profiling is crucial to many online services. Several recent studies suggest that demographic attributes are predictable from different online behavioral data. In [27], we investigate the predictive power of location check-ins for inferring users’ demographics and propose a simple yet general location to profile (L2P) framework. More specifically, we extract rich semantics of users’ check-ins in terms of spatiality, temporality, and location knowledge, where the location knowledge is enriched with semantics mined from heterogeneous domains including both online customer review sites and social networks.

Spatiality. To capture the spatial distributions of users’ check-ins, we segment a city into disjoint regions. Each check-in of a user is assigned to the region in which it occurs. Instead of using a uniform segmentation (grids) for a city, we adopt a morphological segmentation of urban spaces [23], where the regions are segmented using high-level roads in a road network. Since transportation in urban areas is usually restricted by road networks, such segmentation preserves the semantics of users’ movements and the topology of road networks. Figure 2(a) and Figure 2(b) visualize the segmentation results of Beijing and Shanghai, where the segmented regions are indicated with different colors.

Figure 2: Region segmentation for spatiality. (a) Beijing; (b) Shanghai.

Temporality. Human mobility is imbued with ample temporal patterns at different granularities, e.g., day of the week and time of day. For example, office staff commute from home to their company every weekday morning. It is common to see a retired person shopping in the supermarket on a weekday afternoon and a taxi driver working at midnight during holidays. Recent studies have also found that human mobility follows a high degree of regularity [9]. Here, we split a week into two parts: weekdays and weekends. For both of them, we split a day into hourly time bins. Thus, we have a total of 24 × 2 time bins for the expression of temporal patterns. Similar to spatiality, we discretize the timestamps and assign the corresponding time bin to each check-in of a user.

Location Knowledge. Human mobility strongly correlates with the functionality of locations, which motivates people to travel between different places. Apt instances are: students go to school because they acquire knowledge there; businessmen go to a city’s central business district because they conduct commercial affairs there; people go to restaurant districts because they have lunch or dinner there. A check-in is typically associated with a Point of Interest (POI), which belongs to a certain category, e.g., teaching building or shopping mall. Furthermore, the semantics of a location sometimes contain far more information than just the category, e.g., the atmosphere and price range of a restaurant, or the quality of a college. These semantics can be enriched with “human knowledge”, which is revealed by customer review sites (such as Yelp and Dianping) and online social networks (when users mention these locations).

Figure 3: Dynamic choice novelty of individuals w.r.t. an example of DCN matrix. (a) Dynamic choice novelty with regard to users; (b) DCN matrix.
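The temporal discretization described above (24 hourly bins for weekdays plus 24 for weekends) can be sketched as follows. The bin numbering (0 to 23 for weekdays, 24 to 47 for weekends) and the function names are assumptions made for illustration; the source only states that there are 24 × 2 bins.

```python
from datetime import datetime


def time_bin(ts):
    """Map a timestamp to one of 48 bins: 0-23 weekday hours, 24-47 weekend hours."""
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
    return (24 if is_weekend else 0) + ts.hour


def temporal_histogram(timestamps):
    """Count a user's check-ins per time bin."""
    hist = [0] * 48
    for ts in timestamps:
        hist[time_bin(ts)] += 1
    return hist


checkins = [
    datetime(2015, 11, 2, 9, 30),   # a Monday morning
    datetime(2015, 11, 1, 23, 5),   # a Sunday night
]
print([time_bin(ts) for ts in checkins])
```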

2.2 Location to User Characteristic

2.2.1 Novelty-Seeking

Novelty-seeking is a very important user characteristic, which can be used in recommendations and promotions. Previous research [19] measures the novelty-seeking trait with survey-based approaches, which are vulnerable to memory error. Based on the geo-tagged data on social media, we can explore an individual’s novelty-seeking trait in a completely data-driven way. Such an approach can analyze data at a much larger scale than questionnaire-based methods. For example, if we observe that a person prefers to explore new places on Foursquare, we would probably conclude that she is a novelty-lover.

In [24], we use a user’s check-in sequence on social media to infer her novelty-seeking trait. We present a matrix, termed dynamic choice novelty (DCN), to calculate each place’s degree of novelty at each time, where each row measures the partial order of the places for a particular check-in. As shown in Figure 3(a), given that user1 faces three candidate places at her last check-in, the vector in the last row of Figure 3(b) indicates which place is more novel for her at that moment. At each time in a user’s check-in sequence, we define her novelty-seeking level. For example, in Figure 3(a), if user1 chooses place o3 at the last check-in, it is more likely that she has a high novelty-seeking propensity at that moment and wants to explore something new. Then, we define the novelty-seeking trait as a real number, represented as the mean of a multinomial distribution, where each element refers to the probability of having a specific novelty-seeking level. The larger the novelty-seeking trait, the greater the novelty-seeking propensity the individual possesses, and vice versa.

Figure 4: Check-in/social post density distribution of Qing-Feng Steamed Dumpling Shop. (Axes: date, from 12/1/2013 to 1/6/2014, against check-in pdf and social network post pdf; annotation: the Chinese president Xi Jinping visited Qing-Feng Steamed Dumpling Shop on 12/28/2013.)
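To make the last definition concrete, the sketch below computes a trait value as the mean of a distribution over novelty-seeking levels. It assumes levels are numbered 1 to K and that the probabilities are simply the empirical frequencies of the levels observed in a check-in sequence; the estimation procedure in [24] may differ, and the function name is hypothetical.

```python
from collections import Counter


def novelty_seeking_trait(observed_levels, num_levels):
    """Return (level probabilities, trait) from a sequence of observed levels.

    observed_levels: novelty-seeking level (1..num_levels) at each check-in.
    The trait is the expected level under the estimated distribution.
    """
    if not observed_levels:
        raise ValueError("at least one observed level is required")
    counts = Counter(observed_levels)
    total = len(observed_levels)
    levels = range(1, num_levels + 1)
    probs = [counts.get(level, 0) / total for level in levels]
    trait = sum(level * p for level, p in zip(levels, probs))
    return probs, trait


probs, trait = novelty_seeking_trait([1, 3, 3, 2], num_levels=3)
print(probs, trait)
```

A user whose check-ins mostly fall in the highest level gets a trait close to K, and one who mostly revisits familiar places gets a trait close to 1.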

2.2.2 Shopping Impulsivity

Consumer impulsivity is another important user feature that is very useful for consumer behavior studies and recommendation. It is reported that more than 70% of all supermarket buying decisions are unplanned or impulse purchases [11]. Over decades, researchers have developed numerous scales to measure consumer impulsivity, e.g., [18], using interviews and surveys. However, such approaches are also vulnerable to memory error and are hard to apply to large populations.

In [25], by connecting visible posts on social media with consumers’ physical consumption behavior reflected by geo-tagged activity, we propose a new method to explore consumer impulsivity in a completely data-driven way. For example, Figure 4 shows that after the Chinese president visited a shop, there was a surge in both the social media exposure of this shop and the check-in frequency at this shop. The strong association between social media exposure and check-in frequency implies that some consumers act impulsively when triggered by stimuli from social media.

The intuition is that the check-ins of a consumer contain abundant information about her in-store consumption in daily life, e.g., the POI indicates the geo-location and shop category where she consumes, while the timestamp reveals the chronological order. Note that some check-ins are not related to consumption activity. Thus, we only focus on check-ins related to consumption activity, e.g., check-ins at restaurants, shops, or entertainment venues. On the other hand, the wealth of information embodied in posts published on social media might provide an incentive for consumption behavior, as illustrated in Figure 4. Thus, we consider the posts published by a user’s friends as the source of stimuli for impulsive physical consumption behavior.

Given the stimulus strength of different shops, we apply a graphical model to calculate a user’s consumer impulsivity. The basic principle of the model is that, when a consumer is at a higher impulsivity level, she is more likely to be primed for accepting a shop with a larger stimulus intensity, and her usual shop preference is more likely to be neglected; when she is at a lower impulsivity level, she is more likely to choose a shop according to her usual preference.
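The principle stated above can be illustrated with a toy mixture, which is not the graphical model of [25]: the probability of choosing a shop is blended between a stimulus-driven distribution and the user’s usual preference, with the impulsivity level as the mixing weight. The function name, the dictionary inputs, and the linear blend are all assumptions made for illustration.

```python
def choice_probabilities(preference, stimulus, impulsivity):
    """Blend usual preference and stimulus intensity by impulsivity in [0, 1].

    preference, stimulus: dicts mapping shop -> non-negative weight (assumed inputs).
    """
    if not 0.0 <= impulsivity <= 1.0:
        raise ValueError("impulsivity must lie in [0, 1]")
    p_total = sum(preference.values())
    s_total = sum(stimulus.values())
    return {
        shop: impulsivity * stimulus[shop] / s_total
        + (1.0 - impulsivity) * preference[shop] / p_total
        for shop in preference
    }


preference = {"noodle_bar": 3.0, "dumpling_shop": 1.0}  # usual visiting habits
stimulus = {"noodle_bar": 1.0, "dumpling_shop": 3.0}    # recent social media exposure
for level in (0.0, 0.5, 1.0):
    print(level, choice_probabilities(preference, stimulus, level))
```

At impulsivity 0 the choice follows the usual preference, and at impulsivity 1 it follows the stimulus alone, which mirrors the two regimes described in the text.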

3 Location-based Services

Building on the analysis of users’ massive location histories, many new types of location-based services have emerged.


Figure 5: Main Idea Illustrations of Popular Location-based Recommendation Techniques. (a) CF-based Method; (b) Link-Analysis based Method (users as hub nodes, locations as authority nodes, iterative inference).

3.1 Location-based Recommendation

Location-based recommendations emerge from two lines of services: 1) location-based services and 2) recommendation services. Traditional location-based services answer spatial queries. However, in many cases, the results with the closest spatial distances do not satisfy a user’s preferences. On the other side, conventional recommendation services are very successful in providing suggestions for generic items, like books, movies, and products. With the rapid development of GPS-embedded phones, more users are looking for recommendations related to their locations, e.g., suggestions for travel, news, and activities.

The earliest attempts at location-based recommendation, e.g., [17, 20], use content-based filtering techniques to suggest venues to a user by matching similar terms between the user and venue profiles. Subsequent work, e.g., [26], incorporates crowd wisdom to improve the quality of the recommendations by mining user trajectory patterns. More recent research, e.g., [3], further extends the previous work and makes the recommendations more personalized.

Most recently, with the popularity of geo-tagged media data, location-based recommendations are no longer limited to recommending stand-alone venues to the user. Many different types of applications have been proposed and developed to recommend various items using users’ current and historical locations, including: 1) travel routes, 2) users, 3) activities, and 4) social media. A more detailed survey can be found in [4].

There are mainly three types of methodologies used by location-based recommendations, based on: 1) content, 2) link analysis, or 3) collaborative filtering.

Content-based Recommendations, such as [17], match user preferences (e.g., income, gender, and race) with features extracted from locations, such as tags, price ranges, and categories, to make recommendations. In this approach, the prediction score is calculated based on similarity measures of the matching contents, as well as the spatial distances.
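A minimal sketch of such a score is given below, assuming that user and venue profiles are sets of tags, that content similarity is the Jaccard index, and that spatial distance enters as an exponential decay. The function name, the decay form, and the `scale` parameter are illustrative assumptions rather than the method of [17].

```python
from math import dist, exp


def content_score(user_tags, venue_tags, user_xy, venue_xy, scale=2.0):
    """Jaccard similarity of tag sets, discounted by distance to the venue."""
    union = user_tags | venue_tags
    similarity = len(user_tags & venue_tags) / len(union) if union else 0.0
    return similarity * exp(-dist(user_xy, venue_xy) / scale)


user_tags = {"noodles", "cheap", "late-night"}
venues = {
    "near_noodle_bar": ({"noodles", "cheap"}, (0.5, 0.0)),
    "far_noodle_bar": ({"noodles", "cheap"}, (6.0, 0.0)),
    "near_opera": ({"opera", "formal"}, (0.5, 0.0)),
}
ranked = sorted(
    venues,
    key=lambda v: content_score(user_tags, venues[v][0], (0.0, 0.0), venues[v][1]),
    reverse=True,
)
print(ranked)
```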

Link Analysis-based Recommendations, e.g., PageRank [16] and Hypertext Induced Topic Search (HITS) [10], extract high-quality nodes and node closeness by analyzing the network structure. In the location-based recommendation scenario, [21] extends a random walk-based link analysis algorithm to provide location recommendations. [26] extends the HITS algorithm, which iterates a mutual reinforcement between user experience and venue popularity, to recommend experienced users and interesting locations, as demonstrated in Figure 5(b).
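The mutual reinforcement idea can be sketched with the basic HITS iteration on a user-location visit graph, with users as hubs and locations as authorities as in Figure 5(b). This is plain HITS on unweighted visit pairs, not the extended model of [26]; the function name and the input format are assumptions.

```python
from math import sqrt


def hits(visits, iterations=50):
    """Basic HITS on (user, location) pairs: users are hubs, locations authorities."""
    edges = set(visits)
    hub = {u: 1.0 for u, _ in edges}
    auth = {l: 1.0 for _, l in edges}

    for _ in range(iterations):
        # A location is interesting if experienced users visit it.
        auth = {l: 0.0 for l in auth}
        for u, l in edges:
            auth[l] += hub[u]
        norm = sqrt(sum(v * v for v in auth.values()))
        auth = {l: v / norm for l, v in auth.items()}

        # A user is experienced if she visits interesting locations.
        hub = {u: 0.0 for u in hub}
        for u, l in edges:
            hub[u] += auth[l]
        norm = sqrt(sum(v * v for v in hub.values()))
        hub = {u: v / norm for u, v in hub.items()}

    return hub, auth


hub, auth = hits([("u1", "A"), ("u1", "B"), ("u2", "A")])
print(sorted(auth, key=auth.get, reverse=True))  # locations by authority score
print(sorted(hub, key=hub.get, reverse=True))    # users by hub score
```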

Collaborative Filtering-based Recommendations. Collaborative filtering (CF) is one of the most widely used models in conventional recommendation systems. The intuition behind extending the CF model to location-based recommendation is that a user is more likely to visit a location if it is preferred by similar users. As shown in Figure 5(a), the CF approach used by location-based recommendation systems consists of three processes: 1) candidate selection, 2) similarity inference, and 3) recommendation score prediction.
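The three processes can be sketched as a small user-based CF routine. The visit-count dictionaries, the cosine similarity, the similarity-weighted average used as the score, and the function names are assumptions chosen to keep the example short; deployed systems typically add spatial constraints to the candidate selection step.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse visit-count vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(visits, user, top_k=3):
    """visits: dict user -> dict location -> visit count (assumed layout)."""
    mine = visits[user]
    others = {u: locs for u, locs in visits.items() if u != user}

    # 1) Candidate selection: locations visited by others but not by the user.
    candidates = {loc for locs in others.values() for loc in locs} - set(mine)

    # 2) Similarity inference between the target user and every other user.
    sims = {u: cosine(mine, locs) for u, locs in others.items()}

    # 3) Score prediction: similarity-weighted average of the others' visit counts.
    scores = {}
    for loc in candidates:
        raters = [u for u in others if loc in others[u]]
        weight = sum(sims[u] for u in raters)
        if weight > 0:
            scores[loc] = sum(sims[u] * others[u][loc] for u in raters) / weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


visits = {
    "u": {"A": 2, "B": 1},
    "v": {"A": 2, "B": 1, "C": 3},
    "w": {"D": 5},
}
print(recommend(visits, "u"))
```

In this toy data, location D is never recommended to user u because the only user who visited it shares no history with u.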


3.2 Location-based Prediction

Location prediction infers the places where a user will be in the near future, as illustrated in Figure 6. This service plays an important part in urban planning, traffic forecasting, advertising, and recommendations.

Figure 6: A typical scenario for next check-in location prediction

The traditional approach applies Markov models directly, which estimate the transition probability between locations, e.g., [22, 1]. However, when applied to mobility trajectories from location-based social networks, such models suffer from over-fitting because of data sparsity and missing values. Thus, a variety of improved models have been suggested and validated, such as Markov models with smoothing techniques [12] and hierarchical Bayesian language models [8]. The basic idea of these models is to recursively integrate a high-order Markov model with a low-order one [5], so that lower-order Markov models play a more important role when data is insufficient. As a higher-order Markov model bounds location predictability only with a limited amount of mobility data, non-parametric methods have been used for estimating the limit of predictability [14]. These methods first apply non-parametric entropy measures, i.e., LZ estimators, to quantify the extent of repetition in mobility patterns, and then, based on Fano’s inequality [7], transform the estimated entropy into the limit of predictability. The predictability can reach up to 93% on cell tower traces but only 40% on check-in traces from location-based social networks.
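The idea of letting a lower-order model take over when data is scarce can be sketched with linear interpolation between a first-order Markov model and location frequencies, one of the smoothing families surveyed in [5]. The fixed weight `lam`, the function names, and the toy sequence are assumptions; the cited models use more refined, recursively defined weights.

```python
from collections import Counter, defaultdict


def train(sequences):
    """Count location frequencies and first-order transitions."""
    unigram = Counter()
    bigram = defaultdict(Counter)
    for seq in sequences:
        unigram.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            bigram[prev][nxt] += 1
    return unigram, bigram


def next_location_probs(unigram, bigram, prev, lam=0.7):
    """Interpolate transition probabilities with overall location frequencies.

    If `prev` has no observed transitions, fall back to frequencies alone.
    """
    total = sum(unigram.values())
    follow = bigram.get(prev, Counter())
    n_follow = sum(follow.values())
    weight = lam if n_follow else 0.0
    return {
        loc: weight * (follow[loc] / n_follow if n_follow else 0.0)
        + (1.0 - weight) * unigram[loc] / total
        for loc in unigram
    }


unigram, bigram = train([["home", "work", "home", "gym"]])
print(next_location_probs(unigram, bigram, "work"))
print(next_location_probs(unigram, bigram, "gym"))  # no transitions seen from gym
```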

To incorporate the geographical information of locations, much work focuses on geographical modeling of human mobility data. For example, the periodic mobility model (PMM), based on a two-component Gaussian mixture model, has been proposed to take into account temporally and spatially periodic behavior [6]. Since two-component mixture models are often impractical, two-dimensional kernel density estimation has been proposed for this goal [12]. In order to take into account the implicit-feedback characteristics of mobility data [13], an optimization-based density estimation has been suggested for geographical modeling and achieves superior prediction performance.
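For reference, a two-dimensional kernel density estimate over a user’s visited coordinates can be sketched as follows. It assumes an isotropic Gaussian kernel with a fixed, hand-picked bandwidth and planar coordinates; the bandwidth selection and any weighting used in [12] are not reproduced here, and the function name is hypothetical.

```python
from math import exp, pi


def kde_density(point, samples, bandwidth):
    """Gaussian kernel density estimate at `point` from 2-D `samples`."""
    x, y = point
    h2 = bandwidth * bandwidth
    total = sum(
        exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * h2)) for sx, sy in samples
    )
    return total / (len(samples) * 2.0 * pi * h2)


# Hypothetical check-in coordinates clustered around two places.
history = [(0.0, 0.0), (0.1, -0.1), (0.2, 0.1), (5.0, 5.0)]
for candidate in [(0.1, 0.0), (5.0, 5.0), (10.0, 10.0)]:
    print(candidate, kde_density(candidate, history, bandwidth=0.5))
```

Candidate locations near the dense cluster receive a higher density than those near the single distant check-in, and far-away candidates receive a density close to zero.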

To further address the case where a user comes to an unfamiliar area, other information in social media has been leveraged, e.g., 1) social relationships [8, 6], which are found to explain about 10% to 30% of all human movement; and 2) similar users, as studied in [12]. Typically, collaborative knowledge comes not only from online friends, but also from users sharing similar mobility patterns, social backgrounds, interests, or social statuses.

We have already observed the advantage of collaborative prediction models, but their two perspectives perform best in distinct situations. For example, collaborative filtering models usually work well when users deviate from their routines. This poses another challenge for collaborative prediction models: determining in which cases social conformity should take effect. This is difficult because these collaborative models do not take into account context information, such as time, activity, and companions. Therefore, an exploration prediction algorithm has been studied in [12, 15] and applied to improve collaborative prediction algorithms.


4 Conclusion

In this article, we discussed many scenarios for analyzing users’ geo-social media data, from discovering users’ profiles and characteristics to personalized location-based services, like recommendation and prediction. We showed that there is great potential in mining and analyzing geo-social data. We also believe that, as more geo-tagged social media and other context information, like weather and mood, is generated, more interesting and useful scenarios can be served.

References

[1] Daniel Ashbrook and Thad Starner. Using gps to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing, 7(5):275–286, 2003.

[2] J Bao, A Deshpande, S McFaddin, and Chandra Narayanaswami. Partner-marketing using geo-social media data for smarter commerce. IBM Journal of Research and Development, 58(5/6):6–1, 2014.

[3] Jie Bao, Yu Zheng, and Mohamed Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In ACM SIGSPATIAL, 2012.

[4] Jie Bao, Yu Zheng, David Wilkie, and Mohamed Mokbel. Recommendations in location-based social networks: a survey. GeoInformatica, 19(3):525–565, 2015.

[5] Stanley F Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. In Proceedings of ACL’96, pages 310–318. ACL, 1996.

[6] E. Cho, S.A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of KDD’11, pages 1082–1090, 2011.

[7] R.M. Fano. Transmission of information: a statistical theory of communications. M.I.T. Press, 1961.

[8] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social networks. In Proceedings of ICWSM’12, 2012.

[9] Marta C Gonzalez, Cesar A Hidalgo, and Albert-Laszlo Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, 2008.

[10] Jon M Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.

[11] John G Knight, David K Holdsworth, and Damien W Mather. Country-of-origin and choice of food imports: an in-depth study of european distribution channel gatekeepers. Journal of International Business Studies, 38(1):107–125, 2007.

[12] Defu Lian, Xing Xie, Vincent W. Zheng, Nicholas Jing Yuan, Fuzheng Zhang, and Enhong Chen. Cepr: A collaborative exploration and periodically returning model for location prediction. ACM Trans. Intell. Syst. Technol., 6(1):8:1–8:27, April 2015.

[13] Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen, and Yong Rui. GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation. In Proceedings of KDD’14, pages 831–840. ACM, 2014.

[14] Defu Lian, Yin Zhu, Xing Xie, and Enhong Chen. Analyzing location predictability on location-based social networks. In Proceedings of PAKDD’14, 2014.


[15] James McInerney, Sebastian Stein, Alex Rogers, and Nicholas R Jennings. Breaking the habit: Measuring and predicting departures from routine in individual human mobility. Pervasive and Mobile Computing, 2013.

[16] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report, 1999.

[17] Moon-Hee Park, Jin-Hyuk Hong, and Sung-Bae Cho. Location-based recommendation system using bayesian users preference model in mobile devices. In Ubiquitous Intelligence and Computing, pages 1130–1139. Springer, 2007.

[18] Radhika Puri. Measuring and modifying consumer impulsiveness: A cost-benefit accessibility framework. Journal of Consumer Psychology, 5(2):87–113, 1996.

[19] Puthankurissi S Raju. Optimum stimulation level: its relationship to personality, demographics, and exploratory behavior. Journal of Consumer Research, pages 272–282, 1980.

[20] Lakshmish Ramaswamy, P Deepak, Ramana Polavarapu, Kutila Gunasekera, Dinesh Garg, Karthik Visweswariah, and Shivkumar Kalyanaraman. Caesar: A context-aware, social recommender system for low-end mobile devices. In MDM, pages 338–347. IEEE, 2009.

[21] Rudy Raymond, Takamitsu Sugiura, and Kota Tsubouchi. Location recommendation based on location history and spatio-temporal correlations for an on-demand bus system. In ACM SIGSPATIAL. ACM, 2011.

[22] L. Song, D. Kotz, R. Jain, and X. He. Evaluating location predictors with extensive wi-fi mobility data. In Proceedings of INFOCOM’04, volume 2, pages 1414–1424. IEEE, 2004.

[23] Jing Yuan, Yu Zheng, and Xing Xie. Discovering regions of different functions in a city using human mobility and pois. In SIGKDD, pages 186–194. ACM, 2012.

[24] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, and Xing Xie. Mining novelty-seeking trait across heterogeneous domains. In WWW, pages 373–384. ACM, 2014.

[25] Fuzheng Zhang, Nicholas Jing Yuan, Kai Zheng, Defu Lian, Xing Xie, and Yong Rui. Mining consumer impulsivity from offline and online behavior. In UbiComp, pages 1281–1292. ACM, 2015.

[26] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. Mining interesting locations and travel sequences from gps trajectories. In WWW, pages 791–800. ACM, 2009.

[27] Yuan Zhong, Nicholas Jing Yuan, Wen Zhong, Fuzheng Zhang, and Xing Xie. You are where you go: Inferring demographic attributes from location check-ins. In WSDM, pages 295–304. ACM, 2015.


How Events Unfold: Spatiotemporal Mining in Social Media

Ting Hua1,∗, Liang Zhao1, Feng Chen2, Chang-Tien Lu1, Naren Ramakrishnan1

1 Department of Computer Science, Virginia Tech
2 Department of Computer Science, University at Albany-SUNY

∗ E-mail: [email protected]

Abstract

There has been significant recent interest in the application of social media analytics for spatiotemporal event mining. However, no structured survey exists to capture developments in this space. This paper seeks to fill this void by reviewing recent research trends. Three branches of research are summarized here, corresponding respectively to modeling the past, present, and future: information tracking and backward analysis, spatiotemporal event detection, and spatiotemporal event forecasting. Each branch is illustrated with examples, challenges, and accomplishments.

1 Introduction

With rapid developments in modern geo-tagged communication, especially social media, spatial computing is now engendering a revolution in the modeling and understanding of human behavior. The rise of “big data” (e.g., via channels like Twitter, Facebook, Youtube) has given a new window into studying events across the globe. It has become possible to aggregate public data to capture triggers underlying events, detect on-going trends, and forecast future happenings. Concomitantly there has been a rapid development of new computational methods for spatiotemporal mining of social media datasets.

This paper structures recent research into three directions:

• The Past, i.e., information tracking and backward analysis. Social media data has grown enormously over the years (in 2013, more than 400 million tweets were estimated to have been posted by millions of users; see footnote 1). Such voluminous dynamics provide an interesting opportunity to track information on targeted topics and analyze the triggers underlying them. It is also possible to capture diversity in information flows: for instance, a user can obtain information from a friend’s tweets in his/her social network or obtain the necessary information externally (e.g., TV or news media). Studies of the interaction across multiple data sources provide richer contextual information about information flows.

• The Present, i.e., spatiotemporal event detection. It is believed that news breaks earlier in social media than in traditional media [18]. The aim of event detection is to identify ongoing events from social media data before their reporting in mainstream news outlets. With real-time data streams as input, event detection models can output spatiotemporal summaries for on-going events, including information about event occurrence time, the locations involved, and a textual event summary.

• The Future, i.e., spatiotemporal event forecasting. Twitter data has been shown to be powerful for forecasting [20], since some tweets contain context indicating future events. For example, the tweet “TRAFFIC ALERT: Rt. 20 closed due to a wreck” provides evidence of a future hazard along roadways. In addition, compared to traditional documents, social media is endowed with multiple features, such as time stamps, geo-tags, and an underlying network. Utilizing these features and indicative information makes it possible to forecast spatiotemporal events before their occurrence.

1 https://blog.twitter.com/2013/celebrating-twitter7


2 The Past: Information Tracking and Backward Analysis

This section introduces how to capture developments involving an event, evaluate its trustworthiness [25, 27], and identify the underlying event triggers [9]. Figure 1 illustrates the evolution of a civil unrest event in Mexico in January 2013. News media initially reported that (human) bodies were found in a suburban area on Jan 3 and Jan 4, but this event received little or no attention in social media. Next, the government captured some dogs as suspects. This event was first discussed in the news on Jan 7. A hashtag specifically denoting the event, “YoSoyCan26”, was created on Jan 8 and spread rapidly among Twitter users. Tweets using this hashtag predominantly called for the release of the captured dogs. On the following day, news media began reporting on the chatter underlying the trending Twitter topic “YoSoyCan26”. This online trending topic triggered a real-world protest event on Jan 9. As can be seen from the development of the “dog protests”, Twitter users can be influenced by news reports and, conversely, news media can be influenced by information from Twitter. Topics on Twitter cover almost all themes, from daily life happenings to breaking news. How to harness the large volume and fast dynamics of social media to track events and analyze triggers is a key question of interest.

[Figure 1: time series of Twitter and News volume from 2-Jan to 14-Jan. Event annotations: “Bodies of a woman and a baby are found in Iztapalapa”; “Two more bodies found in Iztapalapa”; “SSPDF had captured a pack of dogs”; “Demanding release of dog: #yosoycan26”; “Liberating Iztapalapa dogs protest”. Sample tweets: Andrés Gómez Pliego (@draagp, 7 Jan): “Help release the dogs of Iztapalapa just passing by #yosoycan26”; La Silla Rota (@lasillarota, 8 Jan): “#YoSoyCan26 ‘Iztapalapa Dogs are innocent’, recorded tweeters”; AnimaNaturalisMéxico (@AnimaNat_Mexico, 9 Jan): “We demand justice for the dogs identified as causing four deaths! #YoSoyCan26”.]

Figure 1: Evolution of “dog protest” in Mexico.

In contrast to traditional plain text, tweets often contain features such as hashtags, replies, and friendships, and harnessing such features can be crucial to dealing with the massive dynamics of social media data. There has been significant work on general theme tracking [2, 23], where themes are typically represented as mixtures of latent topics. Research on targeted theme tracking mostly adopts a supervised classification framework for theme extraction [16, 17]. Lin et al. use predefined keywords as query terms, but this approach requires significant human effort and can introduce bias [14, 15]. Backward analysis for identifying triggers and key players of interest is a relatively new research direction, and builds upon developments in time series modeling [12] and temporal topic mining [21].

2.1 Tracking themes in Twitter

Zhao et al. [25] suggest a framework that can dynamically track themes of a targeted domain. This framework jointly models textual content and the social network. Specifically, it first builds a heterogeneous graph that includes multiple types of theme entities such as tweets, terms, and users. The connections between nodes are computed through co-occurrence, authorship, and replying relationships, and the theme snapshots are arranged on time-ordered sub-collections. The underlying parameters are estimated by minimizing the Kullback-Leibler divergence between inferred themes and ground truth data. This methodology is effective as well as efficient. By using heterogeneous relationships from Twitter, the method makes full use of Twitter's textual and structural features. The model also enjoys linear scalability, which is ensured by conditional independence relationships among entities.
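As a reminder of the quantity being minimized, the following is a minimal sketch of the Kullback-Leibler divergence between two discrete distributions over the same set of terms. It is not the estimation procedure of [25]: the direction of the divergence (ground truth first, inferred theme second), the toy distributions, and the function name `kl_divergence` are assumptions made for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an
    illustrative smoothing choice) so the logarithm stays finite.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical term distributions: ground truth vs. an inferred theme.
ground_truth = [0.5, 0.3, 0.2]
inferred = [0.4, 0.4, 0.2]
divergence = kl_divergence(ground_truth, inferred)
```

The divergence is zero only when the two distributions coincide, which is why it can serve as a fitting objective.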

Evaluating the trustworthiness of themes is as important as tracking them. Most previous work focused on evaluating the credibility of general tweets [22, 4, 6], while Zhao et al. made the first attempt at evaluating topic-focused tweets [27]. Topic-focused tweets are first identified by a text classifier trained with Twitter labels, and then the trustworthiness of users and of their posts is updated by an iterative propagation algorithm. Specifically, trustworthiness is evaluated through multiple aspects, such as the trustworthiness of Twitter texts, authorship, and the underlying social graph.

2.2 Identifying Triggers

Hua et al. [9] present a study that analyzes the root causes of civil unrest through tweets. Tweets related to specific protests provide insights into the root causes, i.e., who the organizers are, and how online expression reflects or contributes to such events. In addition, the causes of social events can also be viewed through the analysis of interactions between social media and traditional news streams, which supports a variety of applications, including understanding the underlying factors that drive the evolution of data sources, tracking the triggers behind events, and discovering emerging trends.

Hua et al. also recently proposed a hierarchical Bayesian model that jointly models news and social media topics and their interactions [10]. This model considers news data and tweet data in an asymmetrical frame. Such a structure can significantly improve modeling performance for short texts (tweets) without loss of accuracy on long documents (news). In addition, the output of this model enables a variety of applications for understanding the complex interaction between news and social media data, such as checking the topical coverage of different data sources, capturing the influences from topic to topic, and identifying key documents and key players. Some interesting conclusions can be drawn from their experimental results. For instance, news topics are generally more influential than Twitter topics, and a topic that occurs first in one data source but grows popular in another might be a source of triggers.

3 The Present: Spatiotemporal Event Detection

This section introduces semi-supervised and unsupervised methods that can be used for the automatic detection of targeted events. First, social media is known to be a more responsive medium than traditional news outlets [18]. Early detection with social media data is therefore of great practical use. Second, tweets not only contain plain text, but also include spatiotemporal information such as geo-tags and time stamps. Figure 2 shows an example of spatiotemporal event detection on July 14, 2012 in Mexico. Red nodes in the figure represent event-relevant tweets (a protest against president Peña Nieto) that were published on that day and can be positioned by longitude and latitude according to their geo-tags. These tweets are mainly distributed within two clusters, corresponding to two metropolitan centers: Mexico City and Monterrey. The aim of early event detection in social media data is to identify these spatiotemporal clusters and summarize tweets into events. Numerous previous studies have focused on detecting events from formal documents such as news articles or emails [11, 5]. However, data in Twitter streams are heavily informal, ungrammatical, and dynamic, so traditional methods cannot be applied to the mining of such noisy data.

[Figure 2: geo-tagged, event-relevant tweets in Mexico. Sample tweets: “List the marches anti PEÑA NIETO by city, Locate your MARCHA #MegaMarcha http://t.co/hwGTl3FH”; “List of the cities for marcha on July 14 #MexicoExigeDemocracia #YoSoy132 via @RedPuenteSur”; “#MexicoExigeDemocracia http://t.co/MdG5T3z0 Twitterers help me with a RT?. See you on Saturday at 15:00”; “@epigmenioibarra good morning!!! excuse, from where the #MegaMarcha ??”]

Figure 2: Example of spatiotemporal event detection (Mexico, July, 2012).

Most existing targeted-domain event detection algorithms adopted in social media analytics are supervised methods. These methods [17, 13] usually first train a classifier to recognize tweets from the targeted domain, and then apply techniques such as Kalman filtering to detect locations from these tweets as event occurrence locations. However, building and maintaining high-quality labeled data requires extensive human effort. Moreover, although most classifiers in existing work are sensitive to the features, there is no accepted methodology for feature selection. Therefore, semi-supervised and unsupervised methods are greatly needed.

3.1 Unsupervised event detection

Zhao et al. made an attempt to detect events of user-specified interest in an unsupervised way [24], requiring no pre-given labeled data. Starting from seed words, this model acts like a search engine that can retrieve the relevant information automatically. Specifically, given a targeted domain, the algorithm expands the seed words to domain-related terms via a tweet homogeneous graph. The expansion of seed words with co-occurring terms can be viewed as a process of knowledge acquisition, while traditional supervised methods are limited in this respect. This process is repeated iteratively so that key terms are exhaustively extracted and then weighted in each iteration. A graph induced using this expanded vocabulary is then subjected to a local modularity spatial scan (LMSS) to capture both semantic similarities and geographical proximities by jointly maximizing local modularity and spatial scan statistics. This event detection framework can be the foundation for more sophisticated models, e.g., tracking the evolution of targeted themes [25].
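The iterative expansion of seed words can be illustrated with a minimal sketch. It is not the algorithm of [24]: it only ranks candidate terms by how often they co-occur with the current vocabulary, and the function name `expand_seed_words`, the `rounds` and `top_n` parameters, and the toy tweets are hypothetical.

```python
from collections import Counter

def expand_seed_words(tweets, seeds, rounds=2, top_n=2):
    """Grow a vocabulary from seed words via term co-occurrence (illustrative only)."""
    vocab = set(seeds)
    for _ in range(rounds):
        counts = Counter()
        for tweet in tweets:
            terms = set(tweet.lower().split())
            if terms & vocab:                 # the tweet mentions a known term
                counts.update(terms - vocab)  # credit the other terms it contains
        if not counts:
            break
        # Deterministic ranking: higher count first, then alphabetical order.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        vocab.update(term for term, _ in ranked[:top_n])
    return vocab

# Hypothetical toy corpus.
tweets = [
    "protest march downtown today",
    "march against reform downtown",
    "lovely weather today",
    "reform vote sparks protest",
]
expanded_once = expand_seed_words(tweets, seeds={"protest"}, rounds=1, top_n=2)
expanded_twice = expand_seed_words(tweets, seeds={"protest"}, rounds=2, top_n=2)
```

In the second round, terms that co-occur with the newly added words become reachable even though they never appear next to the original seed, which is the sense in which the vocabulary is acquired iteratively.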

3.2 Semi-supervised event detection

A semi-supervised method is another solution for automatic event detection. Hua et al. proposed a model that can learn pseudo-labels (from news) for Twitter data [8] and therefore save human labeling costs without significant loss in accuracy. Specifically, this model first transfers labels from newspapers to tweets through a novel ranking algorithm, and further expands the initial label subspace with an EM inference algorithm. The noisy nature of Twitter data is a new challenge for text classification. To address this challenge, a customized text classifier for Twitter analysis combines tweets into mini-clusters by social ties. To make maximum use of all Twitter geographical information, this model also extends spatial scan statistics with multinomial distributions, so that factors from various location-related items (e.g., user-profile locations or geo-tags) can be considered together to improve geo-coding performance.

4 The Future: Spatiotemporal Event Forecasting

The development of a spatiotemporal event usually involves several different stages. Figure 3 shows an example of an influenza outbreak in November 2014. Taking the state of Louisiana as an example, this state was “healthy” in week 45 as its influenza-like illness (ILI) activity level was minimal (green), became “lightly infected” in week 46 (low ILI activity level), and ended up “seriously infected” in week 47 (high ILI activity level). It is clear that the evolution of a spatiotemporal event is not only impacted by its current stage but also influenced by its geographical and temporal neighborhood.

[Figure 3: three panels showing ILI activity level for Week 47, Week 46, and Week 45.]

Figure 3: Influenza outbreak on Week 47 ending Nov 22, 2014 in southern region.

Unlike traditional plain text, social media data is multi-dimensional, including the spatial feature “geo-tags”, the temporal feature “publish date”, the textual feature “content”, and the influence feature “friend relationships”. How to utilize social media data for forecasting is an active research topic. However, dynamic patterns of features (keywords) and the geographic heterogeneity of social media data bring critical challenges. Some studies utilize regression or SVM models to predict the occurrence of future events. They differ in the features they use: some adopt tweet volume or sentiment scores [1, 3, 7], while others use more informative features such as semantic topics. Most of them ignore geographical information, which is in fact one of the most important features of an event. Beyond these simple solutions, here we introduce two novel methods that can forecast spatial events through social media data.


4.1 Forecasting through HMM

With tweet streams as input, the model introduced by Zhao et al. [26] can forecast spatiotemporal events (e.g., the one shown in Figure 3) involving multiple stages. The model captures the evolution of events and can therefore predict events at multiple stages. Some Twitter forecasting models focus only on the prediction of temporal patterns [19], while this work jointly models the structural contexts, geo-locations, and time in one frame. The spatial information is harnessed through assignments of geographical priors, the sequence likelihood is computed exactly through dynamic programming, and historical geographical information is used to predict the location of a new event. Because it is built to model the evolutionary development of events, it also supports understanding the relationship between the inside and outside of an event venue from tweet observations.
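The dynamic-programming computation of a sequence likelihood can be illustrated with the forward recursion of a plain discrete HMM. This is a minimal sketch and not the model of [26], which additionally uses geographical priors and richer tweet observations; the two hidden stages, the two observation symbols, and all probabilities below are made up for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM using the forward recursion.

    start[s]    : probability of beginning in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of observing symbol o in state s
    """
    n_states = len(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[p] * trans[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-stage example: state 0 = "healthy", state 1 = "infected";
# observation 0 = few flu-related tweets, observation 1 = many flu-related tweets.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, 1], start, trans, emit)
```

The recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible state sequence.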

4.2 Forecasting with multi-task learning

Supervised methods (e.g., LASSO) are of great use in identifying static features such as keywords, and unsupervised models (e.g., DQE) are suitable for handling dynamic features. Tweet streams are known to contain both sets of features. Zhao et al. proposed a novel multi-task learning framework that can concurrently address both the static and the dynamic features [28]. Specifically, given locations (e.g., cities) as input, the proposed model is able to forecast events for all locations simultaneously. One reason for its success is that the model can extract and utilize information shared among locations and therefore effectively increases the sample size for each individual location. The other advantage is that the model considers both the static features from a predefined vocabulary (made by domain experts) and the dynamic features from DQE [24] in one multi-task feature learning framework. Different strategies are used to control the common set of features and thus balance the homogeneity and diversity between static and dynamic terms.

5 Conclusion

Rapid developments in social media bring new opportunities for the spatial computing community. This paper reviews the most popular research branches of social media event mining, including theme tracking and backward analysis, ongoing event detection, and future event forecasting. Tracking and backward analysis is a relatively new research branch and will continue to attract more attention. The key issues in this area include identifying information of interest to users, correlating multiple data sources, and evaluating the influences between users, topics, and events. Compared to forecasting and backward analysis, ongoing event detection in targeted domains is relatively well developed. Unsupervised and semi-supervised methodologies that require less human effort should soon be widely used in practical applications. Prediction via social media is the most popular research direction but is far from well studied. The most challenging problems in this topic are how to model event evolution and how to extend existing forecasting models to social media data.

6 Acknowledgement

This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract D12PC00337. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US government.

References

[1] Marta Arias, Argimiro Arratia, and Ramon Xuriguera. Forecasting with Twitter data. In ACM Transactions on Intelligent Systems and Technology (TIST), volume 5, page 8. ACM, 2013.

[2] David M Blei and John D Lafferty. Dynamic topic models. In Proceedings of the 23rd international conference on Machine learning, pages 113–120. ACM, 2006.

[3] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. In Journal of Computational Science, volume 2, pages 1–8. Elsevier, 2011.


[4] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on Twitter. In Proceedings of the 20th international conference on World wide web, pages 675–684. ACM, 2011.

[5] Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Philip S Yu, and Hongjun Lu. Parameter free bursty events detection in text streams. In Proceedings of the 31st international conference on Very large data bases, pages 181–192. VLDB Endowment, 2005.

[6] Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo, and Patrick Meier. Tweetcred: Real-time credibility assessment of content on Twitter. In Social Informatics, pages 228–243. Springer, 2014.

[7] Jingrui He, Wei Shen, Phani Divakaruni, Laura Wynter, and Rick Lawrence. Improving traffic prediction with tweet semantics. In Proceedings of the Twenty-Third international joint conference on Artificial Intelligence, pages 1387–1393. AAAI Press, 2013.

[8] Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, and Naren Ramakrishnan. STED: semi-supervised targeted-interest event detection in Twitter. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1466–1469. ACM, 2013.

[9] Ting Hua, Chang-Tien Lu, Naren Ramakrishnan, Feng Chen, Jaime Arredondo, David Mares, and Kristen Summers. Analyzing civil unrest through social media. In Computer, number 12, pages 80–84. IEEE, 2013.

[10] Ting Hua, Ning Yue, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Topical analysis of interactions between news and social media. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016.

[11] Giridhar Kumaran and James Allan. Text classification and named entities for new event detection. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 297–304. ACM, 2004.

[12] Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497–506. ACM, 2009.

[13] Rui Li, Kin Hou Lei, Ravi Khadiwala, and Kevin Chen-Chuan Chang. TEDAS: A Twitter-based event detection and analysis system. In Proceedings of the 28th International Conference on, pages 1273–1276. IEEE, 2012.

[14] Cindy Xide Lin, Qiaozhu Mei, Jiawei Han, Yunliang Jiang, and Marina Danilevsky. The joint inference of topic diffusion and evolution in social communities. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 378–387. IEEE, 2011.

[15] Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, and Jiawei Han. PET: a statistical model for popular events tracking in social communities. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 929–938. ACM, 2010.

[16] Jimmy Lin, Rion Snow, and William Morgan. Smoothing techniques for adaptive online language models: topic tracking in tweet streams. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 422–429. ACM, 2011.

[17] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.

[18] Zeynep Tufekci and Christopher Wilson. Social media and the decision to participate in political protest: Observations from Tahrir Square. In Journal of Communication, volume 62, pages 363–379. Wiley Online Library, 2012.

[19] Andranik Tumasjan, Timm Oliver Sprenger, Philipp G Sandner, and Isabell M Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In ICWSM, volume 10, pages 178–185, 2010.

[20] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. Automatic crime prediction using events extracted from Twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 231–238. Springer, 2012.


[21] Xuerui Wang and Andrew McCallum. Topics over time: a non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 424–433. ACM, 2006.

[22] Wouter Weerkamp and Maarten de Rijke. Credibility-inspired ranking for blog post retrieval. In Information Retrieval, volume 15, pages 243–277. Springer, 2012.

[23] Xintian Yang, Amol Ghoting, Yiye Ruan, and Srinivasan Parthasarathy. A framework for summarizing and analyzing Twitter feeds. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 370–378. ACM, 2012.

[24] Liang Zhao, Feng Chen, Jing Dai, Ting Hua, Chang-Tien Lu, and Naren Ramakrishnan. Unsupervised spatial event detection in targeted domains with applications to civil unrest modeling. In PLOS ONE, volume 9, 2014.

[25] Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Dynamic theme tracking in Twitter. In Proceedings of the 3rd IEEE International Conference on Big Data, pages 561–570, 2015.

[26] Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Spatiotemporal event forecasting in social media. In Proceedings of the 15th SIAM International Conference on Data Mining, pages 963–971, 2015.

[27] Liang Zhao, Ting Hua, Chang-Tien Lu, and Ray Chen. A topic-focused trust model for Twitter. In Computer Communications. Elsevier, 2015.

[28] Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Multi-task learning for spatio-temporal event forecasting. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1503–1512, 2015.


Point-of-Interest Recommendations in Location-Based Social Networks

Jia-Dong Zhang, Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong

Abstract

Location-based social networks (LBSNs), e.g., Foursquare, Gowalla and Yelp, bridge the physical world with the virtual online world. LBSNs have accumulated plenty of community-contributed data, such as social links between users, check-ins of users on points-of-interest (POIs), and geographical information and categories of POIs, which reflect the preferences of users for POIs. Recommending preferred POIs to users helps people explore new places and helps businesses discover potential customers. This paper aims to recommend personalized POIs for users based on their preferences that are learned from the community-contributed data. To this end, this paper models the social, categorical, geographical, sequential, and temporal influences on the visiting preferences of users for POIs.

1 Introduction

With the rapid pervasiveness of mobile devices embedded with wireless communication and location acquisition abilities, location-based social networks (LBSNs) such as Foursquare, Gowalla, Brightkite, Yelp, and Facebook Places have become some of the most popular Internet applications and have attracted millions of users. The LBSNs bridge the physical world with the virtual online world. In an LBSN (Figure 1), users can establish social links with each other to share their experiences of visiting interesting locations, also known as points-of-interest (POIs), e.g., restaurants, stores, and museums, by performing check-ins at these POIs via their handheld devices. In LBSNs, there are plenty of community-contributed data, including social links between users, check-ins of users on POIs, and geographical information and categories of POIs. These rich data reflect human behaviors in reality and bring new opportunities to model the decision-making process of users visiting POIs. In the LBSNs, it is crucial to recommend personalized POIs to users based on their preferences learned from the community-contributed data, which helps users learn about new POIs and discover a city, and helps businesses deliver advertisements to targeted users and improve business profits.

In LBSNs, there are five major characteristics that affect the visiting preferences or check-in behaviors of users for POIs. (1) Social influence. In the real world, people interact with each other. For example, friends often go to places like movie theaters or restaurants together, or a person may travel to spots highly recommended by her friends. Thus, a person’s preference for POIs can be influenced by her close friends or a group of friends that are likely to share some common interests. Accordingly, in LBSNs, users establish social links and form communities to share their experiences of visiting POIs. (2) Categorical influence. The category of a POI reflects its usual business activities and nature. For instance, a person checking in at a restaurant indicates that she may be having a meal, and checking in at a cinema means that she is watching a movie there. In practice, people have shown different biases toward the categories of POIs: a foodie often visits restaurants to taste a variety of food, and a tourism enthusiast usually travels to tourist attractions all over the world. (3) Geographical influence. Spatial POIs are totally different from non-spatial items, e.g., books, music and movies in conventional recommender systems, because physical interactions are required for users to visit POIs. The geographical information (i.e., latitude and longitude coordinates) of POIs significantly affects users’ check-in behaviors. For instance, people tend to visit POIs close to their homes or offices, and may also be interested in exploring the places near their visited POIs. (4) Sequential influence. In reality, human movements exhibit sequential patterns. For instance, cinemas or bars are often visited after restaurants because users would like to relax after dinner, and checking in at stadiums first and then restaurants is better than the reverse because it is not healthy to exercise right after a meal. Thus, the influence of sequential patterns is also important for users’ check-in behaviors. (5) Temporal influence. Time strongly influences human activities, which differ across times of day on weekdays and weekends. For example, users often visit restaurants at noon on weekdays and bars at midnight on weekends. These weekday and weekend patterns reflect the temporal check-in preferences of users for POIs, which can be used to make time-aware POI recommendations by suggesting proper visiting times.

[Figure 1: diagram in which users, connected by social links, are linked through check-ins to points-of-interest on a map (store, bar, cinema, museum, restaurant, stadium).]

Figure 1: A location-based social network

This paper aims to exploit the social, categorical, geographical, sequential, and temporal influences to recommend personalized POIs for users. The key tasks are to estimate the preference or relevance scores of a user for her unvisited POIs and to return the POIs with the top-k highest preference scores for the user.
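The final top-k step can be sketched in a few lines, assuming the preference scores have already been estimated by some model. The function name `top_k_pois`, the dictionary layout, and the scores below are hypothetical and serve only to make the task statement concrete.

```python
import heapq

def top_k_pois(scores, visited, k):
    """Return the k unvisited POIs with the highest preference scores."""
    candidates = ((poi, s) for poi, s in scores.items() if poi not in visited)
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return [poi for poi, _ in best]

# Hypothetical preference scores of one user for four POIs.
scores = {"cafe": 0.9, "museum": 0.7, "bar": 0.4, "stadium": 0.8}
recommended = top_k_pois(scores, visited={"cafe"}, k=2)
```

Using a bounded heap avoids sorting every candidate, which matters because a user typically has far more unvisited POIs than k.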

2 Modeling Social Influence

In reality, the social links between users greatly affect the check-in behaviors of users for POIs. Existing works simply employ the social links of users to derive the similarities between users and integrate them into traditional collaborative filtering techniques. Nevertheless, traditional collaborative filtering techniques often suffer from the data sparsity problem in the user-POI check-in matrix, since users only visit a very small proportion of the POIs in an LBSN. Thus, it is better to devise a new and more sophisticated approach to exploit the social links between users for POI recommendations. In our recent study [5], we deduce the relevance score of a user and an unvisited POI by leveraging the social correlations between the user and her friends who have visited the POI. The process consists of three steps: social frequency aggregation, distribution estimation of social frequency, and social relevance score computation.

Step 1: Social frequency aggregation. Formally, given a user $u$ and an unvisited POI $l$, we aggregate the check-in frequency or rating $x_{u,l}$ of the user $u$’s friends (i.e., $u'$ with $S_{u,u'} = 1$) on the POI $l$, given by

$$ x_{u,l} = \sum_{u' \in U} S_{u,u'} \cdot R_{u',l}, \tag{1} $$

where $R_{u',l}$ is the frequency or rating of user $u'$ visiting POI $l$ and $S_{u,u'}$ indicates whether there exists a social link between users $u$ and $u'$. One can naively regard the social check-in frequency $x_{u,l}$ as the relevance score between user $u$ and POI $l$, or simply divide $x_{u,l}$ by the number of friends of $u$ as in traditional collaborative filtering techniques. In this study, we instead transform the social check-in frequency into a normalized relevance score based on the social check-in frequency distribution that is learned from the historical check-in data of all users.

[Figure 2, panels (a) Foursquare (β = 2.0799) and (b) Yelp (β = 1.6196): log-log plots of probability density against social check-in frequency.]

Figure 2: Social check-in frequency distribution in the real-world data
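Equation (1) can be sketched directly, assuming the social link matrix S and the check-in matrix R are stored as nested lists indexed by user and POI. The storage layout, the function name `social_frequency`, and the toy matrices are assumptions made for illustration.

```python
def social_frequency(u, l, S, R):
    """x_{u,l}: sum over users u' of S[u][u'] * R[u'][l], as in Equation (1)."""
    return sum(S[u][v] * R[v][l] for v in range(len(S)))

# Hypothetical data: 3 users, 2 POIs. User 0 is friends with users 1 and 2.
S = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
R = [[0, 2],   # check-in counts of user 0 at POI 0 and POI 1
     [3, 0],   # user 1
     [1, 5]]   # user 2
x_00 = social_frequency(0, 0, S, R)  # friends' check-ins at POI 0: 3 + 1
x_01 = social_frequency(0, 1, S, R)  # friends' check-ins at POI 1: 0 + 5
```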

Step 2: Distribution estimation of social frequency. In real-world data sets, the social check-in frequency random variable $x$ follows a power-law distribution, the probability density function of which is defined by

$$ f_{So}(x) = (\beta - 1)(1 + x)^{-\beta}, \quad x \ge 0, \ \beta > 1, \tag{2} $$

where $\beta$ is estimated from the check-in or rating matrix $R$ and the social link matrix $S$ by maximum likelihood estimation:

$$ \beta = 1 + |U||L| \left[ \sum_{u' \in U} \sum_{l' \in L} \ln\left(1 + \sum_{u'' \in U} S_{u',u''} \cdot R_{u'',l'}\right) \right]^{-1}, \tag{3} $$

in which $\sum_{u'' \in U} S_{u',u''} \cdot R_{u'',l'}$ is the social check-in frequency of the friends $u''$ of user $u'$ on POI $l'$.

To observe the real distribution of the social check-in frequency, we conducted an analysis on two publicly available real-world data sets with social links between users and categories of POIs: Foursquare [1] and Yelp [2]. Figure 2 shows that the social check-in frequency (i.e., the dots) in the two real-world data sets fits a certain power-law distribution very well (i.e., the line), estimated through Equations (2) and (3). Thus, modeling the social check-in frequency as a power-law distribution is reasonable and effective.
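The estimator in Equation (3) can be sketched as follows, again assuming S and R are nested lists; the function name `estimate_beta` and the toy matrices are hypothetical. The closed form follows from setting the derivative of the log-likelihood of Equation (2) to zero, which gives β = 1 + n / Σ ln(1 + x) over the n = |U||L| user-POI pairs.

```python
import math

def estimate_beta(S, R):
    """Maximum likelihood estimate of beta, following Equation (3)."""
    n_users, n_pois = len(R), len(R[0])
    log_sum = 0.0
    for u in range(n_users):
        for l in range(n_pois):
            # Social check-in frequency of user u on POI l, as in Equation (1).
            x = sum(S[u][v] * R[v][l] for v in range(n_users))
            log_sum += math.log(1.0 + x)
    if log_sum == 0.0:
        raise ValueError("no social check-ins: beta is undefined")
    return 1.0 + n_users * n_pois / log_sum

# Hypothetical data: 3 users, 2 POIs. User 0 is friends with users 1 and 2.
S = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
R = [[0, 2],
     [3, 0],
     [1, 5]]
beta = estimate_beta(S, R)
```

For this toy data the six social frequencies are 4, 5, 0, 2, 0, and 2, so the estimate is 1 + 6 / ln(5 · 6 · 3 · 3).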

Step 3: Social relevance score computation. The estimated probability density function $f_{So}$ in Equation (2) is monotonically decreasing with respect to the social check-in frequency x, but the social relevance score should be monotonically increasing with regard to the social check-in frequency because friends share more common interests on POIs. Thus, we define the social relevance score of $x_{u,l}$ in Equation (1) based on the cumulative distribution function of $f_{So}$, given by

$$F_{So}(x_{u,l}) = \int_{0}^{x_{u,l}} f_{So}(z)\,dz = 1 - (1 + x_{u,l})^{1-\beta}, \tag{4}$$

where $F_{So}$ is an increasing function of the social check-in frequency $x_{u,l}$ because $1 - \beta < 0$. Moreover, based on the cumulative distribution function $F_{So}$ in Equation (4), the social check-in frequency $x_{u,l}$ is transformed into a social relevance score that reflects the relative position of $x_{u,l}$ among all social check-in frequencies.
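As a small illustration of Equation (4), the sketch below maps raw frequencies to scores in [0, 1). The function name `power_law_cdf` is an assumption made for this example; the exponent would normally come from Equation (3).

```python
import numpy as np


def power_law_cdf(x, exponent):
    """Evaluate 1 - (1 + x)^(1 - exponent) for x >= 0 and exponent > 1."""
    if exponent <= 1:
        raise ValueError("exponent must be greater than 1")
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("frequencies must be non-negative")
    return 1.0 - (1.0 + x) ** (1.0 - exponent)


# With exponent 2 the score is x / (1 + x): 0 -> 0, 1 -> 0.5, 3 -> 0.75.
scores = power_law_cdf([0.0, 1.0, 3.0], exponent=2.0)
```

Because the mapping is strictly increasing, it preserves the ranking of POIs by social check-in frequency while compressing very large frequencies toward 1.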

3 Modeling Categorical Influence

In practice, the category of a POI strongly indicates what activities happen at the POI, and people have shown distinct biases toward the categories of POIs. Hence, we can also derive the relevance score of a user


Figure 3: Categorical popularity distribution in the real-world data. Panels: (a) Foursquare (fitted γ = 2.5231) and (b) Yelp (fitted γ = 1.5455). Both axes are logarithmic: categorical popularity (10^0 to 10^3) against probability density (10^−8 to 10^0).

to an unvisited POI through exploiting the categorical correlations between the visited POIs and the unvisited POI of the user. In addition, the popularity of a POI reflects the quality of products or services offered by the POI, e.g., a popular restaurant usually indicates that it supplies high-quality food. Therefore, it is helpful to utilize the popularity for POI recommendations. Specifically, we develop a new approach [5] to combine the category bias of a user and the popularity of a POI into a relevance score between the user and POI through three steps: weighing popularity by categorical bias, distribution estimation of categorical popularity, and categorical relevance score computation.

Step 1: Weighing popularity by categorical bias. At first, we take the bias of a user u to a certain category c as $B_{u,c}$, i.e., the frequency of user u visiting the POIs that belong to category c. Then, the bias $B_{u,c}$ is used to weigh the popularity or overall rating of an unvisited POI l in category c, i.e., $P_{c,l}$. Correspondingly, we obtain the categorical popularity $y_{u,l}$ for user u on POI l as follows:

$$y_{u,l} = \sum_{c \in C} B_{u,c} \cdot P_{c,l}, \tag{5}$$

where a larger value of $y_{u,l}$ indicates that the category of POI l better matches the bias of user u and that POI l is more popular with the general public. One may naively consider the categorical popularity $y_{u,l}$ as the relevance score between user u and POI l or simply normalize the categorical bias $B_{u,c}$ in advance. Nevertheless, in this research the categorical popularity of a user to an unvisited POI is mapped into a relevance score based on the distribution of the categorical popularity that is learned from the historical check-in data.

Step 2: Distribution estimation of categorical popularity. We build the distribution of the categorical popularity with the same process used for the social check-in frequency. Formally, we assume that the categorical popularity random variable y has the probability density function

$$f_{Ca}(y) = (\gamma - 1)(1 + y)^{-\gamma}, \quad y \ge 0,\ \gamma > 1, \tag{6}$$

in which γ can be learned from the categorical bias matrix B and popularity matrix P based on maximum likelihood estimation:

$$\gamma = 1 + |U||L| \left[ \sum_{u' \in U} \sum_{l' \in L} \ln\left(1 + \sum_{c \in C} B_{u',c} \cdot P_{c,l'}\right) \right]^{-1}, \tag{7}$$

where $\sum_{c \in C} B_{u',c} \cdot P_{c,l'}$ is the categorical popularity of user u′ on POI l′.

As depicted in Figure 3, we have also observed that the categorical popularity (i.e., the dots) in the two real-world data sets [1, 2] approaches the power-law distribution (i.e., the line) that is estimated in terms of Equations (6) and (7). In addition, when the categorical popularity is higher than 200, the deviation of the


estimated power-law distribution becomes larger. Fortunately, the categorical popularity takes a value higher than 200 only with a considerably low probability. Thus, these results have validated that the assumption of the power-law distribution is in accordance with reality.

Step 3: Categorical relevance score computation. Similarly, the estimated probability density function $f_{Ca}$ in Equation (6) is monotonically decreasing in the categorical popularity y; however, the categorical relevance score should be monotonically increasing in the categorical popularity, since people prefer the popular POIs that also meet their categorical biases. To this end, we employ the cumulative distribution function of $f_{Ca}$ to obtain the categorical relevance score of $y_{u,l}$ in Equation (5), given by

$$F_{Ca}(y_{u,l}) = \int_{0}^{y_{u,l}} f_{Ca}(z)\,dz = 1 - (1 + y_{u,l})^{1-\gamma}, \tag{8}$$

where, due to $1 - \gamma < 0$, $F_{Ca}$ is an increasing function with respect to the categorical popularity $y_{u,l}$. Importantly, the categorical popularity $y_{u,l}$ is also normalized into a categorical relevance score, i.e., the relative position of $y_{u,l}$ compared to other categorical popularities of users on POIs.
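The three categorical steps in Equations (5) to (8) can be sketched end to end as follows. This is an illustrative sketch only: it assumes dense NumPy matrices with B of shape users × categories and P of shape categories × POIs, and the function name and toy values are hypothetical.

```python
import numpy as np


def categorical_relevance(B, P):
    """Return (Y, gamma, scores) following Equations (5), (7) and (8)."""
    Y = B @ P                                    # Eq. (5): y[u, l] = sum_c B[u, c] * P[c, l]
    log_sum = np.log1p(Y).sum()
    if log_sum == 0:
        raise ValueError("all categorical popularities are zero")
    gamma = 1.0 + Y.size / log_sum               # Eq. (7): closed-form MLE
    scores = 1.0 - (1.0 + Y) ** (1.0 - gamma)    # Eq. (8): power-law CDF
    return Y, gamma, scores


# Toy data (hypothetical): 2 users, 2 categories, 3 POIs.
# Each POI belongs to one category, so each column of P has one non-zero entry.
B = np.array([[3, 0],
              [1, 2]], dtype=float)
P = np.array([[5, 0, 1],
              [0, 4, 0]], dtype=float)
Y, gamma, scores = categorical_relevance(B, P)
```

In this toy case user 0 has never visited category 1, so the second POI gets a categorical popularity of 0 and therefore a score of 0, while the popular first POI in the favored category ranks highest.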

4 Modeling Personalized Geographical Influence

The geographical information of POIs has a significant influence on users’ check-in behaviors and has been intensively exploited to make POI recommendations for users. Current works usually model the geographical influence as a universal distance distribution for all users. However, the geographical influence on each user’s check-in behavior is unique. For instance, indoorsy persons like visiting POIs around their living areas while outdoorsy persons prefer traveling around the world to explore new POIs. Therefore, we argue that the influence of geographical information on individual users’ check-in behaviors should be personalized when recommending POIs for users. In our previous studies, we model the geographical influence for each user as an individual one-dimensional distance distribution [3, 9] or two-dimensional check-in distribution [4].

This paper presents the approach that models the geographical influence as two-dimensional check-in distributions over latitude and longitude coordinates, which are more reasonable and intuitive than one-dimensional distance distributions. The reason is twofold. (1) The probability of a user visiting a location is not simply monotonic in the distance, because the visiting probability is affected not only by the distance but also by the location’s intrinsic characteristics. For example, in reality the check-in locations of a user are usually distributed in several areas. (2) It is hard to compute a visiting probability for a location based on a distance distribution, since a reference location must first be found to derive a reasonable distance for the location. Conversely, it is considerably intuitive to employ a two-dimensional check-in distribution to compute a visiting probability for any location with latitude and longitude.

Hence, we utilize the personalized two-dimensional geographical influence for POI recommendations. Specifically, we estimate a personalized two-dimensional check-in probability density for each user, based on kernel density estimation (KDE), which makes no assumption about the form of the underlying distribution. Let $L_u = \{l_1, l_2, \ldots, l_n\}$ be the set of locations of POIs visited by user u; the two-dimensional check-in density f using $L_u$ is given by:

$$f(l) = \frac{1}{n\sigma^{2}} \sum_{i=1}^{n} K\left(\frac{l - l_i}{\sigma}\right), \tag{9}$$

where each location $l_i = (lat_i, lon_i)^T$ is a two-dimensional column vector with the latitude ($lat_i$) and longitude ($lon_i$), K(·) is the kernel function and σ is a smoothing parameter, called the bandwidth. In our paper [4], we apply the widely used standard two-dimensional normal kernel:

$$K(\mathbf{x}) = \frac{1}{2\pi} \exp\left(-\frac{1}{2} \mathbf{x}^{T}\mathbf{x}\right). \tag{10}$$
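Equations (9) and (10) translate almost directly into code. The sketch below is a minimal illustration that treats latitude and longitude as plain Euclidean coordinates, as the equations do; the bandwidth is passed in by the caller because its selection is not described here, and the function name and sample coordinates are assumptions.

```python
import numpy as np


def checkin_density(l, visited, sigma):
    """Evaluate Equation (9) at location l with the normal kernel of Equation (10).

    l       : (lat, lon) pair to score
    visited : array-like of shape (n, 2) with the user's visited locations
    sigma   : bandwidth (smoothing parameter), must be positive
    """
    l = np.asarray(l, dtype=float)
    visited = np.asarray(visited, dtype=float)
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = visited.shape[0]
    z = (l - visited) / sigma                                     # (l - l_i) / sigma for every i
    kernel = np.exp(-0.5 * np.sum(z ** 2, axis=1)) / (2 * np.pi)  # Eq. (10)
    return kernel.sum() / (n * sigma ** 2)                        # Eq. (9)


# Hypothetical user with two clusters of check-ins.
visited = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]]
near_cluster = checkin_density([0.0, 0.0], visited, sigma=0.5)
between_clusters = checkin_density([2.5, 2.5], visited, sigma=0.5)
```

The density is high near either cluster and low in between, which is the multimodal behavior that a single distance distribution cannot express.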


Figure 4: Personal check-in probability density over two-dimensional geographic coordinates

Figure 4 depicts the individual check-in probability density of three users randomly chosen from Foursquare based on Equation (9). We have the following two findings. (1) The geographical influence of locations on these three users’ check-in behaviors is unique since their check-in probability densities are distinct from each other. (2) These check-in probability densities are usually multimodal rather than unimodal or monotonic.

5 Modeling Sequential Influence

Human movement exhibits spatiotemporal sequential influence. The sequential influence may be associated with the time of day (e.g., people usually visit museums or libraries in the daytime, go to restaurants for dinner in the evening, and then relax in cinemas or bars at night), the geographical proximity of POIs (e.g., tourists often visit the London Eye, Big Ben, Downing Street, Horse Guards, and Trafalgar Square in order), and the nature of places and human preference (e.g., checking in at stadiums first and then restaurants is better than the reverse because it is not healthy to exercise right after a meal). To utilize the sequential influence for POI recommendations, current methods apply the first-order Markov chain by assuming that the next POI a user may visit relies only on her latest visited POI. Nevertheless, in reality the next POI may depend on all of her visited POIs. Hence, in our previous research [6, 8], we propose a new POI recommendation approach with sequential influence based on an additive Markov chain (AMC) that considers the effect of all visited POIs on the next POI to visit. In the AMC, the sequential patterns are represented as a location-location transition graph (L2TG).

Definition 1 (L2TG): A location-location transition graph (L2TG) G = (L, E) consists of a set of nodes L and a set of edges E ⊆ L × L. Each node $l_i \in L$ represents a POI associated with an outgoing count of $l_i$ as a transition predecessor to other POIs, denoted by $OCount(l_i)$. Each edge $(l_i, l_j) \in E$ represents a transition $l_i \to l_j$ associated with a transition count denoted by $TCount(l_i, l_j)$.

In Definition 1, L2TG is associated with transition counts and outgoing counts instead of transition probabilities so that L2TG can be incrementally updated in an online fashion. In terms of transition counts and outgoing counts associated with L2TG, transition probabilities can be determined based on Definition 2.

Definition 2 (Transition probability): If the outgoing count of $l_i$ is non-zero, i.e., $OCount(l_i) > 0$, the transition probability of $l_i \to l_j$, denoted $TP(l_i \to l_j)$, is calculated by

$$TP(l_i \to l_j) = \frac{TCount(l_i, l_j)}{OCount(l_i)}. \tag{11}$$

Otherwise, $TP(l_i \to l_j) = 1$ for $l_j = l_i$ and $TP(l_i \to l_j) = 0$ for $l_j \neq l_i$.


By Definition 2, if the outgoing count of $l_i$ is non-zero, the transition probability of $l_i \to l_j$ is defined as the proportion of $TCount(l_i, l_j)$ to $OCount(l_i)$ in Equation (11), which is essentially the relative frequency definition of probability. On the other hand, $OCount(l_i) = 0$ means that no user checks in at any other POI after $l_i$; accordingly, we define the transition probability of $l_i$ to itself to be one for simplicity.

Therefore, given a POI sequence $S_u = \langle l_1, l_2, \ldots, l_n \rangle$, our AMC defines the sequential probability of visiting a new POI $l_{n+1}$ by
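Definitions 1 and 2 suggest a simple count-based data structure. The sketch below is one possible rendering, not the authors' implementation: it assumes that one transition is counted for each pair of consecutive check-ins in a user's sequence, and the class and method names are hypothetical.

```python
from collections import defaultdict


class L2TG:
    """Location-location transition graph that stores counts, not probabilities."""

    def __init__(self):
        self.tcount = defaultdict(int)  # (l_i, l_j) -> TCount(l_i, l_j)
        self.ocount = defaultdict(int)  # l_i -> OCount(l_i)

    def add_sequence(self, sequence):
        """Incrementally add the consecutive transitions of one check-in sequence."""
        for src, dst in zip(sequence, sequence[1:]):
            self.tcount[(src, dst)] += 1
            self.ocount[src] += 1

    def transition_probability(self, src, dst):
        """Definition 2: TCount / OCount, or a self-loop when OCount is zero."""
        out = self.ocount.get(src, 0)
        if out == 0:
            return 1.0 if src == dst else 0.0
        return self.tcount.get((src, dst), 0) / out


graph = L2TG()
graph.add_sequence(["museum", "restaurant", "bar"])
graph.add_sequence(["museum", "bar"])
```

Because only counts are stored, adding a new sequence touches a handful of counters and never requires renormalizing existing edges, which is the online-update property noted after Definition 1.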

$$p(l_{n+1} \mid S_u) \propto \sum_{i=1}^{n} 2^{-\alpha \cdot (n-i)} \cdot TP(l_i \to l_{n+1}), \tag{12}$$

where $2^{-\alpha \cdot (n-i)}$ represents the sequence decay weight with the decay rate parameter α ≥ 0; the larger α is, the higher the decay rate. More importantly, the transition probability $TP(l_i \to l_{n+1})$ of $l_i$ to $l_{n+1}$ is weighted in favor of recently visited POIs, since POIs with recent check-in timestamps usually have a stronger influence on the next POI to visit than POIs with old timestamps.
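A short sketch of Equation (12) follows. Since the equation only states proportionality, normalizing the scores over a finite candidate set is an assumption of this example, as are the function name and the dictionary of transition probabilities.

```python
def sequential_scores(sequence, candidates, tp, alpha):
    """Score each candidate next POI with the additive Markov chain of Equation (12).

    sequence   : visited POIs l_1 .. l_n in check-in order
    candidates : POIs to score as l_{n+1}
    tp         : dict mapping (l_i, l_j) to TP(l_i -> l_j); missing pairs count as 0
    alpha      : decay rate, alpha >= 0
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    n = len(sequence)
    raw = {}
    for cand in candidates:
        raw[cand] = sum(
            2.0 ** (-alpha * (n - i)) * tp.get((visited, cand), 0.0)
            for i, visited in enumerate(sequence, start=1)
        )
    total = sum(raw.values())
    if total == 0:
        return raw
    return {cand: score / total for cand, score in raw.items()}


# Hypothetical transition probabilities for a two-POI history.
tp = {("a", "c"): 0.5, ("a", "d"): 0.5, ("b", "c"): 0.2, ("b", "d"): 0.8}
with_decay = sequential_scores(["a", "b"], ["c", "d"], tp, alpha=1.0)
no_decay = sequential_scores(["a", "b"], ["c", "d"], tp, alpha=0.0)
```

With α = 1 the older POI "a" contributes half the weight of the latest POI "b", so candidate "d" gains relative to the α = 0 case, where all visited POIs count equally.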

6 Modeling Temporal Influence for Time-aware POI Recommendations

None of the aforementioned modeling approaches can suggest an appropriate time for users to visit a recommended POI, because they do not consider the influence of the temporal context on users’ check-in behaviors. In reality, time is a very important factor that influences human activities differently at different times on weekdays and weekends. To suggest a proper visiting time when recommending POIs for users, existing methods split a day into time slots, e.g., 24 hours, and apply collaborative filtering recommendation techniques to infer users’ preferences on POIs at each time slot separately. Unfortunately, these methods generally suffer from two major limitations due to discretization: time information loss and lack of temporal influence correlations between different times. Thus, we propose a probabilistic framework to model continuous temporal influence for time-aware POI recommendations in our recent study [7].

In the problem of time-aware POI recommendations, it is required not only to recommend interesting POIs to users based on their preferences but also to suggest a proper time for users to visit the recommended POIs. That is, we need to predict the probability $p(l \mid u, T)$ of user u visiting POI l ∈ L in time interval T. In terms of probability theory,

$$p(l \mid u, T) = \frac{p(l \mid u)\, p(T \mid u, l)}{p(T \mid u)} \propto p(l \mid u)\, p(T \mid u, l) = p(l \mid u) \int_{t \in T} f(t \mid u, l)\, dt, \quad \forall l \in L, \tag{13}$$

where $p(l \mid u)$ is the prior probability of user u visiting POI l that is independent of time interval T and can be derived using any non-time-aware method, and $f(t \mid u, l)$ is the time probability density conditioned on user u and POI l that is essential to utilize the temporal influence. We also estimate the time probability density based on KDE, given by

$$f(t \mid u, l) \propto \sum_{t_i \in S_{u,l}} W_{u,l}(t_i)\, \frac{1}{\sigma} K\left(\frac{t \ominus t_i}{\sigma}\right), \tag{14}$$

where $t \ominus t_i$ is their time difference, $S_{u,l}$ is the time sample for estimating $f(t \mid u, l)$ and $W_{u,l}(t_i)$ is the weight of the sample point $t_i$.

Note that usually u has not checked in at POI l yet, so we need to obtain the time sample $S_{u,l}$ based on two important kinds of temporal influence correlations: (1) The check-in behaviors of different users to the same POI at different times may be correlated. For example, a group of friends may visit a POI at different times, because they have a common interest in the POI but different available times. (2) The check-in behaviors of the same user to different POIs at different times may be correlated as well. For instance, the POIs belonging to the same category may be visited by a user at different times, because she could visit the POIs for different purposes (e.g., a user visits a restaurant for breakfast, lunch or dinner). Thus, we can derive the time sample $S_{u,l}$ of user u to POI l by combining the check-in samples: (i) $D_{u',l}$ of another user u′ visiting l (i.e., $u, u' \in U \wedge u \neq u'$)
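To make Equations (13) and (14) concrete, the sketch below evaluates a weighted time density and integrates it over an interval. Several choices here are assumptions rather than details given above: times are hours on a 24-hour clock, the difference $t \ominus t_i$ is taken as the signed circular difference, K is a one-dimensional standard normal kernel, the interval does not wrap past midnight, and the integral is approximated on a uniform grid and normalized over the whole day.

```python
import numpy as np


def circular_diff(t, ti, period=24.0):
    """Signed difference between times on a cycle, in (-period/2, period/2]."""
    d = (np.asarray(t, dtype=float) - np.asarray(ti, dtype=float)) % period
    return np.where(d > period / 2, d - period, d)


def time_density(t, samples, weights, sigma):
    """Unnormalized f(t | u, l) of Equation (14) for one or more times t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    samples = np.asarray(samples, dtype=float)[None, :]
    weights = np.asarray(weights, dtype=float)[None, :]
    z = circular_diff(t, samples) / sigma
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (weights * kernel / sigma).sum(axis=1)


def interval_probability(start, end, samples, weights, sigma, steps=2400):
    """Approximate p(T | u, l) for T = [start, end) as a share of the daily density."""
    grid = np.linspace(0.0, 24.0, steps, endpoint=False)
    density = time_density(grid, samples, weights, sigma)
    mask = (grid >= start) & (grid < end)
    return density[mask].sum() / density.sum()


# Hypothetical weighted sample of check-in times (hours of the day).
samples = [12.0, 13.0, 23.5]
weights = [1.0, 1.0, 0.5]
lunch = interval_probability(11.0, 14.0, samples, weights, sigma=1.0)
night = interval_probability(2.0, 5.0, samples, weights, sigma=1.0)
```

Multiplying `interval_probability` by a prior $p(l \mid u)$ from any non-time-aware recommender gives a score proportional to $p(l \mid u, T)$ in Equation (13).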


and (ii) $D_{u,l'}$ of u visiting another POI l′ (i.e., $l, l' \in L \wedge l \neq l'$). Formally,

$$S_{u,l} = \Big( \bigcup_{u' \in U} \{ t_i \mid t_i \in D_{u',l} \} \Big) \cup \Big( \bigcup_{l' \in L} \{ t_j \mid t_j \in D_{u,l'} \} \Big). \tag{15}$$

Further, we consider the cosine similarity between users or POIs using their check-in data as the sample weight of the corresponding time sample points, since the higher the similarity is, the smaller the time difference of users visiting POIs is, i.e.,

$$\forall t_i \in D_{u',l},\ W_{u,l}(t_i) = sim(u, u'); \qquad \forall t_j \in D_{u,l'},\ W_{u,l}(t_j) = sim(l, l'). \tag{16}$$
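Equations (15) and (16) can be read as a procedure for assembling a weighted sample. The following sketch is illustrative only: it assumes users and POIs are indexed by integers, that similarities are cosine similarities between rows (users) or columns (POIs) of a check-in count matrix R, and that check-in times are stored in a dictionary keyed by (user, POI). All names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


def build_time_sample(u, l, R, D):
    """Collect the time sample S_{u,l} of Eq. (15) with the weights of Eq. (16)."""
    samples, weights = [], []
    n_users, n_pois = R.shape
    for other_user in range(n_users):        # other users' visits to the same POI l
        if other_user == u:
            continue
        w = cosine(R[u], R[other_user])      # sim(u, u')
        for t in D.get((other_user, l), []):
            samples.append(t)
            weights.append(w)
    for other_poi in range(n_pois):          # the same user's visits to other POIs
        if other_poi == l:
            continue
        w = cosine(R[:, l], R[:, other_poi])  # sim(l, l')
        for t in D.get((u, other_poi), []):
            samples.append(t)
            weights.append(w)
    return samples, weights


# Toy data (hypothetical): 2 users, 3 POIs; user 0 has not visited POI 1.
R = np.array([[2, 0, 1],
              [1, 1, 0]], dtype=float)
D = {(0, 0): [9.0, 12.5], (0, 2): [19.0], (1, 0): [8.5], (1, 1): [13.0]}
samples, weights = build_time_sample(0, 1, R, D)
```

The returned lists can be passed as the sample and weights of a weighted kernel density estimate such as Equation (14); sample points whose similarity is zero then contribute nothing to the density.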

7 Conclusions and Future Research Directions

To recommend personalized POIs for users, this paper proposes approaches for modeling the social, categorical, geographical, sequential, and temporal influences on the visiting preferences of users to POIs. These approaches complement each other and can be integrated to improve the quality of recommendation results. For example, the work [5] employs the robust product rule to combine the social, categorical, and geographical influence, while the work [6] develops a gravity model to fuse the social influence with spatiotemporal sequential influence. Hence, one research direction is to devise new methods to integrate all influences. Our recent study [10] mines the user opinions on POIs from textual comments to derive the user preferences and obtains better POI recommendations for users. Thus, another research direction is to explore new opinion mining methods for understanding the specific preferences of users on different aspects of POIs.

References

[1] H. Gao, J. Tang, X. Hu, and H. Liu. Content-aware point of interest recommendation on location-based social networks. In AAAI, pages 1721–1727, 2015.

[2] Yelp. Challenge Data Set. http://www.yelp.com/dataset_challenge, 2014.

[3] J.-D. Zhang and C.-Y. Chow. iGSLR: Personalized geo-social location recommendation - a kernel density estimation approach. In ACM SIGSPATIAL, pages 334–343, 2013.

[4] J.-D. Zhang and C.-Y. Chow. CoRe: Exploiting the personalized influence of two-dimensional geographic coordinates for location recommendations. Information Sciences, 293:163–181, 2015.

[5] J.-D. Zhang and C.-Y. Chow. GeoSoCa: Exploiting geographical, social and categorical correlations for point-of-interest recommendations. In ACM SIGIR, pages 443–452, 2015.

[6] J.-D. Zhang and C.-Y. Chow. Spatiotemporal sequential influence modeling for location recommendations: A gravity-based approach. ACM TIST, 7(1):11:1–11:25, 2015.

[7] J.-D. Zhang and C.-Y. Chow. TICRec: A probabilistic framework to utilize temporal influence correlations for time-aware location recommendations. IEEE TSC, accepted, 2015.

[8] J.-D. Zhang, C.-Y. Chow, and Y. Li. LORE: Exploiting sequential influence for location recommendations. In ACM SIGSPATIAL, pages 103–112, 2014.

[9] J.-D. Zhang, C.-Y. Chow, and Y. Li. iGeoRec: A personalized and efficient geographical location recommendation framework. IEEE TSC, 8(5):701–714, 2015.

[10] J.-D. Zhang, C.-Y. Chow, and Y. Zheng. ORec: An opinion-based point-of-interest recommendation framework. In ACM CIKM, pages 1641–1650, 2015.


The SIGSPATIAL Special

Section 2: Event Reports

ACM SIGSPATIAL

http://www.sigspatial.org


GeoPrivacy 2015 Workshop Report

The Second ACM SIGSPATIAL International Workshop on Privacy in Geographic Information Collection and Analysis

Seattle, WA, USA - November 3, 2015

Grant McKenzie1 Krzysztof Janowicz1 Gueorgi Kossinets2

1University of California, Santa Barbara, USA
2Google Inc., USA

[email protected] [email protected] [email protected]

(Workshop Co-chairs)

Developments in mobile and surveying technologies over the past decade have enabled the collection of individual-level and aggregated geographic information at unprecedented scale. These data are valuable sources for answering scientific questions about human behavior and improving related services, from public transportation to location-aware recommendations. However, privacy intrusion is an imminent risk when individual trajectories (and in some cases aggregated travel patterns) are used for commercial purposes such as customer profiling, or even for political persecution. Similarly, there is a trade-off between location privacy and quality of spatial search and recommender systems. The GeoPrivacy workshop therefore focused on discussing methods to protect individuals’ privacy while enabling collection, analysis, and sharing of useful geographic information.

GeoPrivacy 2015 (http://stko.geog.ucsb.edu/geoprivacy2015) was held in conjunction with the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2015) on November 3, 2015 in Seattle, WA, USA. This workshop touched on an area of geospatial science that affects any researcher working with real-world geodata. With the recent rise in geosocial networking applications as well as advances in location-enabled mobile devices, the topic of geoprivacy has become a major discussion point both in location-specific research and in everyday life. This workshop offered a unique platform from which to delve into a dialog on issues related to privacy and credibility within the domain of geoscience and computational geography. The goal of this workshop was to bring together researchers, developers and users of geospatial data to explore methods, techniques, datasets and issues surrounding an area of GI science that has attracted significant interest among researchers and the public.

The workshop received 8 submissions, of which 5 research papers (4 full and 1 short) were accepted for publication in the proceedings and for presentation (30 minutes for each paper). The one-day workshop opened with a thought-provoking keynote from Dr. Darakhshan Mir, and the remainder of the day was split into two sessions, Geoprivacy Preservation and Geoprivacy Application. The GeoPrivacy workshop concluded with a structured discussion on the “Big questions” in GeoPrivacy from social and technical perspectives.

We would sincerely like to thank the authors for publishing and presenting their work at GeoPrivacy 2015, and the keynote speaker, the program committee members and the external reviewers for their thoughtful evaluation and help in the paper review process. We hope that readers of the workshop proceedings will find them interesting and that they will motivate continued discussion on the future of geoprivacy-based research.


EM-GIS 2015 Workshop Report

The First ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management

Seattle, Washington, USA - November 3, 2015

Hui Zhang1 Yan Huang2 Jean-Claude Thill3

1Institute of Public Safety Research, Department of Engineering Physics, Tsinghua University, China
2Department of Computer Science and Engineering, University of North Texas, USA
3Department of Geography & Earth Sciences, University of North Carolina at Charlotte, USA

[email protected] [email protected] [email protected]

(Workshop Co-chairs)

Emergency management aims to develop strategies and establish operations to decrease the potential impact of unexpected events (i.e., human or natural disasters). Through quick response and rescue, it saves human lives from secondary disasters and enhances the stability of communities after disasters. Emergency management requires many new geospatial technologies to support quick response and recovery and the integration of location-based wireless information streams. Advances in GIS technologies make improvements in emergency management research possible.

EM-GIS 2015 (http://www.dviz.cn/em-gis2015/) was held in conjunction with the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2015) on November 3, 2015 in Seattle, Washington, USA. It aims at bringing together researchers and practitioners in massive spatio-temporal data management, spatial database, spatial data analysis, spatial data visualization, data integration, model integration, cloud computing, parallel algorithms, internet of things, complex event detection, optimization theory, intelligent transportation systems and social networks to support better public policy through disaster detection, response and rescue. EM-GIS’s goal is to foster an opportunity for researchers from these communities to gather and discuss ideas that will influence emergency management research with the aid of the advances in GIS technologies.

EM-GIS 2015 received 35 submissions, of which 12 were accepted as full research papers and 9 as short papers. Each paper was presented at the workshop (20 or 15 minutes for each paper). EM-GIS 2015 was a one-day workshop consisting of four sessions: (1) EM-GIS Applications, (2) EM-GIS strategies, (3) EM-GIS Analytics and (4) HealthGIS researches.

We would like to thank the authors for publishing and presenting their papers in EM-GIS 2015, and the program committee members and external reviewers for their professional evaluation and help in the paper review process. We hope that the proceedings of EM-GIS 2015 will inspire new research ideas, and that you will enjoy reading them.


MELT 2015 Workshop Report

The Fifth ACM SIGSPATIAL International Workshop on Mobile Entity Localization and Tracking in GPS-less Environments

Seattle, Washington, USA - November 3, 2015

Ying Zhang1 Bodhi Priyantha2

1Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA, USA
2Microsoft Research, One Microsoft Way, Redmond, WA

[email protected] [email protected]

(Workshop Co-chairs)

After a decade of research and development for GPS-less localization and tracking, there has been significant progress in location-awareness and location-based services around the world. Almost all cell phone platforms, including Android, iPhone and Windows phones, have localization and tracking facilities. In addition, special hardware and infrastructure (e.g., RFID, UWB and BLE or sensor tags) have been developed and deployed for tracking people and merchandise. The advances in cameras and computer vision make cheap simultaneous localization and mapping (SLAM) possible. However, there are still many challenging problems to be solved, such as accuracy, power management, effective sensor fusion with increasingly powerful embedded computation and resourceful parallel computing backend, learning, transmitting and storing individual trajectories and spatial environment representation, labor-less environmental survey or infrastructure establishment for location-awareness, crowd computing or collective intelligence for map creation, and big data analytics of real-time and historical semantic locations that help improve efficiency for both consumers and businesses.

After four successful workshops (2008, 2009, 2010 and 2011) held with various conferences in mobile, ubiquitous and sensor computing, this year we aimed to provide a forum for knowledge sharing and academic-industrial networking. We successfully held a full-day program with five invited talks from industrial research institutions (Google, Microsoft Research, Disney Research and OmniTrail Technologies) and six technical talks from universities worldwide (Finland, Japan, Brazil, Israel and US). At the end of the workshop, we demonstrated Google’s recent innovation on Tango devices. In addition, Google sponsored a networking lunch and OmniTrail Technologies sponsored the best paper and the best presentation awards. The MELT workshop at SIGSPATIAL 2015 continued to provide a leading international forum for researchers, developers, and practitioners in the field of mobile tracking and localization for location-based services.

We would like to thank all the invited speakers and their organizations, Google, Microsoft Research, Disney Research and OmniTrail Technologies, for their support of this workshop. We would also like to thank the authors who submitted papers and the authors who accepted our invitations to submit their work in a short amount of time.


IWGS 2015 Workshop Report

The 6th ACM SIGSPATIAL International Workshop on

GeoStreaming

Seattle, WA, USA - November 3, 2015

Chengyang Zhang Farnoush Banaei-Kashani Abdeltawab Hendawi

Teradata Inc. University of Colorado Denver University of Virginia

[email protected] [email protected] [email protected]

(Workshop Co-chairs)

The ACM SIGSPATIAL International Workshop on Geostreaming (IWGS) was held for the sixth time in conjunction with the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACMGIS 2015). The workshop has been a successful event that attracted participants from both academia and industry. The workshop addressed topics that are at the intersection of data streaming and geospatial systems. The workshop fostered an environment where geospatial researchers can benefit from the advances in geosensing technologies and data streaming systems.

We are entering the era of “big data” thanks to the exponential growth and availability of structured and unstructured data, a large amount of which is real-time streaming data emitted from sensors, imagery and mobile devices. In addition to the temporal nature of stream data, various sources provide stream data that has geographical locations and/or spatial extents, such as geotagged Twitter streams, mobile GPS location streams, spatio-temporal image streams, and so on. On one hand, this amount of streamed data has been a major propeller to advance the state of the art in geographic information systems. On the other hand, the difficulty of processing, mining, and analyzing that massive amount of data in a timely manner has prevented researchers from making full use of the incoming stream data. The term geostreaming refers to the ongoing effort in academia and industry to process, mine and analyze stream data with geographic and spatial information.

This workshop addresses the research communities in both stream processing and geographic information systems. It brings together experts in the field from academia, industry and research labs to discuss the lessons they have learned over the years, to demonstrate what they have achieved so far, and to plan for the future of geostreaming.

The workshop featured a keynote by John Krumm from Microsoft Research, providing a review of research toward addressing the question “what we can learn about people from the places they go”. This keynote, which was very well attended and engaging, examined some of the ongoing research on this topic, including fundamental models of human mobility; how people’s movements throughout the day give clues about which types of places they are visiting, such as their home, work, and school; how movements also give insights into how far people are willing to travel to different types of places and the routes they prefer; how to go from recorded location data to surprisingly accurate predictions of where people will travel in the future, both over the next several minutes and over the next several weeks; and finally how what we learn from mobility patterns can be applied to automated personal assistants, local search, and routing.

The call for papers resulted in 15 submissions of research papers. A program committee of 7 members reviewed the submissions, and as a result the 11 highest-quality papers were accepted. On average, over 22 attendees


were present at every session of the workshop. The topics presented at the workshop included, but were not

limited to: Geostream Query Processing, Geostream Theory and Applications in Transportation and Social Media,

Streaming Trajectories and Moving Regions, and Geostreaming Systems.

SIGSPATIAL & ACM

join today!

www.acm.org

www.sigspatial.org

The ACM Special Interest Group on Spatial Information (SIGSPATIAL) addresses issues related to the acquisition, management, and processing of spatially-related information with a focus on algorithmic, geometric, and visual considerations. The scope includes, but is not limited to, geographic information systems (GIS).

The Association for Computing Machinery (ACM) is an educational and scientific computing society which works to advance computing as a science and a profession. Benefits include subscriptions to Communications of the ACM, MemberNet, TechNews and CareerNews, full and unlimited access to online courses and books, discounts on conferences and the option to subscribe to the ACM Digital Library.

payment information

Mailing List Restriction

ACM occasionally makes its mailing list available to computer-related organizations, educational institutions and sister societies. All email addresses remain strictly confidential. Check one of the following if you wish to restrict the use of your name:

[ ] ACM announcements only

[ ] ACM and other sister society announcements

[ ] ACM subscription and renewal notices only

Questions? Contact:

ACM Headquarters

2 Penn Plaza, Suite 701

New York, NY 10121-0701

voice: 212-626-0500

fax: 212-944-1318

email: [email protected]

Remit to:

ACM

General Post Office

P.O. Box 30777

New York, NY 10087-0777

www.acm.org/joinsigs

Advancing Computing as a Science & Profession

[ ] SIGSPATIAL (ACM Member): $15

[ ] SIGSPATIAL (ACM Student Member & Non-ACM Student Member): $6

[ ] SIGSPATIAL (Non-ACM Member): $15

[ ] ACM Professional Membership ($99) & SIGSPATIAL ($15): $114

[ ] ACM Professional Membership ($99) & SIGSPATIAL ($15) & ACM Digital Library ($99): $213

[ ] ACM Student Membership ($19) & SIGSPATIAL ($6): $25

Name __________________________________________________

ACM Member # __________________________________________

Mailing Address __________________________________________

_______________________________________________________

City/State/Province _______________________________________

ZIP/Postal Code/Country___________________________________

Email _________________________________________________

Mobile Phone___________________________________________

Fax ____________________________________________________

Credit Card Type: [ ] AMEX [ ] VISA [ ] MC

Credit Card # ______________________________________________

Exp. Date _________________________________________________

Signature_________________________________________________

Make check or money order payable to ACM, Inc

ACM accepts U.S. dollars or equivalent in foreign currency. Prices include surface delivery charge. Expedited Air Service, which is a partial air freight delivery service, is available outside North America. Contact ACM for more information.

The SIGSPATIAL Special

ACM SIGSPATIAL

http://www.sigspatial.org