smartadp: visual analytics of large-scale taxi ... · pdf filetaxi trajectories for selecting...

SmartAdP: Visual Analytics of Large-scaleTaxi Trajectories for Selecting Billboard Locations

Dongyu Liu, Di Weng, Yuhong Li, Jie Bao, Yu Zheng, Huamin Qu, and Yingcai Wu

click to remove

click to remove

click to remove

Data Set

Using: Tianjin Taxi Trajectory

Target Area

39.1072, 117.1676

39.1071, 117.1828

39.1166, 117.1691

Solution Area

No area selected.

Solution Parameters

Normal Trajectory Weight:

Target Trajectory Weight:

Temporal Filter:

Speed Less Than:

OD Heatmap

�

�

�

Mark favorite places

Number of Billboards : 5

1

1,000

All

Unrestricted

Generate Solution

Solution Explorer

Road

A

SOLUTION PREVIEW

N C S V M R O G N C S V M R O G N C S V M R O G N C S V M R O G N C S V M R O G

Number of Billboards: 5Target to Normal: 1000Temporal Filter: noneSpeed Less Than: N/A#1

Edit Delete


Edit Delete


Edit Delete


Edit Delete


Edit Delete

12

3

4

5

B

C

Speed +average speed per node

Volume +total traffic volume

VFM +value for money

Reach +solution coverage of target area

OTSopportunity to see

GRPgross rating point

Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

Solution #7

Solution #8

Solution #9

Solution #10

162.00

662.50

1,163.00

1,663.50

2,164.00

2,664.50

3,165.00

3,665.50

4,166.00

4,666.50

5,167.00

REMOVED COLUMNS: To remove, drag column title here

034

034

034

034

034

034

034

034

034

034

SOLUTION VIEW LOCATION VIEW

D

E

F

C1

C2C3

C4

Fig. 1. SmartAdP system. (A) Dashboard View shows the information of the current solution for billboard placements. (B) Map Viewprovides a visual summary of the geospatial environment. (C) Solution Preview lists the parameters and statistics of the candidatesolutions. (D) Solution View lays out all the solutions as glyphs to reveal the relationships among the solutions. (E) Location Viewsupports in-depth analysis at the fine-grained location level. (F) Ranking View displays multi-typed ranks of the solutions.

Abstract— The problem of formulating solutions immediately and comparing them rapidly for billboard placements has plaguedadvertising planners for a long time, owing to the lack of efficient tools for in-depth analyses to make informed decisions. In this study,we attempt to employ visual analytics that combines the state-of-the-art mining and visualization techniques to tackle this problemusing large-scale GPS trajectory data. In particular, we present SmartAdP, an interactive visual analytics system that deals with thetwo major challenges including finding good solutions in a huge solution space and comparing the solutions in a visual and intuitivemanner. An interactive framework that integrates a novel visualization-driven data mining model enables advertising planners toeffectively and efficiently formulate good candidate solutions. In addition, we propose a set of coupled visualizations: a solution viewwith metaphor-based glyphs to visualize the correlation between different solutions; a location view to display billboard locations in acompact manner; and a ranking view to present multi-typed rankings of the solutions. This system has been demonstrated using casestudies with a real-world dataset and domain-expert interviews. Our approach can be adapted for other location selection problemssuch as selecting locations of retail stores or restaurants using trajectory data.

Index Terms—optimal billboard locations, taxi trajectory, visual analytics, comparative analysis

1 INTRODUCTION

Billboards are the most common forms of outdoor advertising. Despitethe decline of other traditional advertising media, billboard advertis-ing remains critically important, because people today are spending

• D. Liu is with Zhejiang University and the Hong Kong University ofScience and Technology. This work was done when Dongyu Liu was avisiting student supervised by Yingcai Wu at Zhejiang University. Email:[email protected]

• D. Weng, and Y. Wu are with the State Key Lab of CAD & CG, ZhejiangUniversity. Yingcai Wu is the corresponding author. E-mail:[email protected], [email protected]

• Y. Li is with University of Macau. E-mail: [email protected]• J. Bao and Y. Zheng are with Microsoft Research, Beijing, China. E-mail:

[email protected], [email protected]• H. Qu is with the Hong Kong University of Science and Technology.

E-mail: [email protected]

Manuscript received xx xxx. 201x; accepted xx xxx. 201x. Date ofPublication xx xxx. 201x; date of current version xx xxx. 201x.For information on obtaining reprints of this article, please sende-mail to: [email protected]. Digital Object Identifier:xx.xxxx/TVCG.201x.xxxxxxx/

considerable time in transit. Billboard advertising has several evidentadvantages like prominent visibility, low cost per mille, and superioraccumulation of local influence over other advertising methods.

However, launching a successful billboard campaign is difficult, be-cause many factors should be cautiously considered, such as contentdesign, locations, and visibility. Among them, the geographical loca-tions of billboards are considered the most critical. Appropriate bill-board locations can increase audience exposure significantly, whereasinappropriate ones lead to waste of time and investment. Conventionalapproaches involve manually computing traffic volume, conductingtravel surveys, and inviting experts to build mathematical models [23].As a result, this process is time consuming, less flexible, and canbe completed by professional advertising agencies only. Currently,several famous outdoor advertising companies, such as APN [1] andLAMAR [2], have provided online campaign planning tools that in-clude billboard planners with comprehensive information of each bill-board, such as the traffic statistics and demographics.

These tools enable planners to create and offer various tailored so-lutions to their business customers; however, several limitations havebeen identified. First, the acquisition of such comprehensive data is ex-pensive, thereby limiting the data coverage. Second, the large solutionspace with many advertising factors still poses difficulties for experts

in utilizing the tools to create proper solutions without leveraging thecomputational power of machines. Third, in most cases, several can-didate solutions need to be provided to customers. Since customers’own criteria and expectations may vary significantly from one personto another, they are in urgent need of tools that can help quickly dis-tinguish the commonalities and differences among multiple solutions.Unfortunately, existing tools cannot support this kind of analysis.

In this study, we attempt to employ visual analytics techniques toovercome these limitations using taxi GPS trajectory data. We utilizethe taxi trajectory data for two reasons: (1) the data is relatively easy tocollect and provide citywide coverage in most cities; and (2) the large-scale data can effectively reveal the underlying traffic patterns [8, 49].To our knowledge, previous studies have not reported the use of taxitrajectory data to address the billboard location selection and solution-level comparison problems.

However, there are two major challenges in visual analytics of thetrajectory data: (1) dealing with a wide solution space to determinethe desired solutions; (2) creating an intuitive visualization to facili-tate comparative analysis. A typical billboard solution involves a setof locations. The solution space is almost infinite because of numerouspossible billboard locations in a city. Thus, the cost of searching foran optimal solution under multiple criteria is prohibitively expensive.Furthermore, a billboard location can be characterized by multiplespatio-temporal attributes, such as traffic volume, speed, origins anddestinations (OD), and surrounding environment (e.g., points of inter-est (POIs)). Visual comparisons among solutions can be viewed ascomparing different groups of locations, which are depicted by multi-ple spatio-temporal attributes. Hence, creating a concise and readablevisual representation to facilitate comparison is non-trivial.

To deal with the huge space, we tightly integrate the knowledgeand expertise of humans with the computational power of machines.In particular, we introduce a novel visualization-driven data miningmodel based on a tailored application-specific data index mechanismto efficiently generate billboard candidate locations. To enable effec-tive visual comparison, we propose a set of coupled visualizations,allowing users to compare solutions from multiple perspectives anddifferent levels of details. With these techniques, we develop a visualanalytics system called Smart Advertising Placement (SmartAdP) tovisualize and explore the heterogeneous urban data, such as taxi tra-jectory data, POI data, and a citywide geospatial route network. Oursystem can also be easily adapted to other applications with respect torecourse allocation, such as retail chain location selections.

The major contributions of our study are as follows:• A systematic characterization of the problem of billboard loca-

tion selection using taxi trajectory data, and a thorough discus-sion and summary of the design requirements and space.

• An interactive framework to generate billboard solutions witha novel visualization-driven data mining model and a tailoredapplication-specific data index mechanism.

• A set of new visualization techniques to empower end users toexplore the major features of multiple solutions and compare thecommonalities and differences among them.

2 RELATED WORK

This section discusses the prior studies closely related to our work.Trajectory query processing has received considerable attention

in recent years. The queries on trajectories can be categorized intothree types, namely, the point, region, and trajectory query [12, 48].Point query aims to find the points with an expected spatio-temporalrelationship to several given trajectories or retrieve the trajectories thathave a specific relationship with a few given points. Examples includefinding the k nearest neighbors to a path in road networks [9], or query-ing for the nearest trajectories to a point [21]. Similar to point query,region query seeks the trajectories within the specified spatio-temporalregions or identifies the regions with a specific relationship to a set ofgiven trajectories. Examples include retrieving the most frequent pathbetween a user-specified OD [42], as well as discovering the gatheringpatterns from trajectories [46]. Trajectory query focuses on search-ing the trajectories that possess similar features within a given trajec-

tory set (i.e., classification or clustering [28, 29]). Our query can becategorized as point query, which aims to find points (i.e., billboardlocations) within user specified regions that can cover maximum tra-jectories. In our application scenario, users aim to highlight importantregions or trajectories while playing down other regions or trajecto-ries. Hence, the query method should allow users to flexibly set differ-ent weights for different regions or trajectories. However, identifyingoptimal locations with millions of trajectories in real time is challeng-ing due to the large search space and high algorithm complexity. Toour knowledge, the problem has not been systematically studied in theareas of database and data mining. In this study, we present a novelvisualization-driven data mining model to support this kind of query,in which users are allowed to participate in the improvement of thequality and efficiency of query processing.

Billboard location selection is one of multicriteria decision mak-ing (MCDM) problems in a spatial context [33, 34]. Before makinga choice, billboard campaign planners typically have to run appro-priate trade-offs among multiple conflicting criteria. Karamshuk etal. [22] demonstrate the power of computational methods (i.e., datamining) to tackle the problem of optimal retail store placement byusing human mobility information (i.e., Foursquare check-in dataset)and POIs. However, sometimes the optimal solution cannot be solelygenerated by computer, as people always have their own opinions andcriteria. Exploratory visualization methods are commonly used to ad-dress the problem and make a well-substantiated decision. As such,several mechanisms of integrations of computational methods with in-teractive visualization tools by multiple coordinated views have beensuggested [5]. Nonetheless, to our knowledge, there is little workon combining data mining and visualization techniques to address theproblem of billboard location selection with taxi trajectory data.

Taxi trajectory visualization has been extensively studied and ap-plied in traffic monitoring [43], mobility pattern discovery [16], routerecommendation [30], urban planning [19], and so on. Different fromthe prior studies, our research focuses on solving the billboard loca-tion selection problem by utilizing visual analytics techniques. In thislight, many trajectory visualization techniques [7, 8, 31] can be used.

Taxi trajectories can be viewed as a sequence of time-orderedspatial points with multiple attributes. By leveraging point-relevant vi-sual channels and animation techniques, trajectories can be intuitivelyobserved [38]. With the increase of points, one can use a heatmap toshow the integrated quantity of points in a map. Besides, researcherspropose several line-based aggregation visualization techniques to de-pict traffic flow in a distributed network, such as density map [39]and aggregation flow [6]. These techniques are capable of revealingmovement patterns intuitively. To depict both spatio-temporal infor-mation and related attributes of taxis (speed, direction, and volume,etc.), several space-time-cube based and radial metaphor based visu-alizations are also introduced [30, 41]. However, these techniques arenot designed for visual comparison of complex advertising solutionswith dozens of billboards that possess special features, such as ge-olocations, cost, and coverage. Therefore, these techniques cannot besimply applied in our solution comparison task.

Visual comparison is one of the most fundamental and commonvisualization tasks [24]. There are three widely-accepted categoriesof visual comparison: juxtaposition (i.e., side by side), superposition(i.e., overlay), and explicit encoding (i.e., visually showing differencesor correlations) [17]. Various visualization approaches have been de-veloped on the basis of these three basic methods for different com-parison tasks [40, 44]. For example, VisLink [10] links the sameobjects in two juxtaposed visualizations. Kehrer et al. [25] use hi-erarchically organized small-multiple displays (i.e., juxtaposition) tocompare multi-variate data comprising categorical and numerical in-formation. They also support superposition and explicit linking in asmall-multiple display cell. Our work adopts a juxtaposition method,allowing users to conduct multi-level and multivariate comparativeanalysis. The work unifies three views, namely, the solution, location,and ranking view, through explicit visual linking and user interactions(Fig. 1(D, E, F)).

3 BACKGROUND

This section introduces the background on billboard advertising andthe types of data used. Thereafter, the analytical tasks are discussed.

3.1 Background Knowledge

Billboard location selection is a multidisciplinary research problemthat involves advertising, communication, and urban computing. In thepast year, we have been working with three experts in these fields. Inparticular, one expert is a manager from an advertising agency who hasconsiderable experience in advertisement planning (EA), another oneis a postdoctoral researcher in communication (EB), and the third is asenior researcher in urban computing (EC). EB and EC were invitedspecially to solve the billboard location selection problem that wasoriginally proposed by EA.

As with the real estate business, billboard locations are considereda decisive factor for a billboard campaign. However, different peoplemay have different opinions on locations. We run structured interviewswith EA for several rounds and summarize the main challenges forbillboard location selection as follows.� Finding befitting areas to place billboards. The first step is to

determine several areas to place billboards based on customers’ re-quirements. Areas frequently visited by the target audience are de-sired. However, this type of information is difficult to gather. Thus,planners from outdoor advertising companies often make recom-mendations based on their own experience and knowledge.

� Selecting proper locations in the specified areas. Each specifiedarea contains numerous locations for placing billboards. Plannershave to spend considerable time on manually selecting proper loca-tions from the locations in each area based on the basic informationof each location; such information includes, the surrounding POIs,daily circulation, and cost. The best and worst locations are easyto determine, but the ones in between are difficult to distinguish.Moreover, the information on candidate locations is often incom-plete. Furthermore, information on passers-by along the locationsis rather rare. Planners often obtain the information through fieldstudies or based on their own experience and knowledge.

� Evaluating a solution and convincing customers. Assessing asolution still remains difficult. Field studies, such as those involvingthe use of questionnaires, are usually conducted. Nevertheless, thelimited data and lack of appropriate tools have resulted in difficultyin convincing customers who often have different criteria.

� Providing customers with multiple solutions. Planners have toformulate multiple solutions and present them to customers. Thisprocess frequently costs substantial time. To our knowledge, notool is available for both planners and customers to further analyzeand compare multiple candidate solutions.The feedback from EA suggests that a visual analytics system

is necessary to empower planners to formulate multiple solutionsquickly, as well as to compare the solutions effectively. To design thissystem, we followed a user-centered design process [36] and involvedexperts in every stage of the iterative development since early 2015.

To elaborate the problem, we formally defined two terms, namely,target area and target trajectory. Target areas often refer to the areaswhere the target audience lives or works. Target trajectories are thosewhose origins or destinations are within the target areas.• Traffic volume refers to the volume of passengers who are likely

to see the billboard.• Traffic Speed refers to the amount of time that passengers allo-

cate to see the billboard.• Traffic OD can reveal a passenger’s demographic information to

assist in identifying the target trajectories.• Environment reveals the information on the surrounding POIs

and the residents living or working around the location.• Cost indicates the cost of a billboard.We adopted three widely-accepted performance indicators [23] for

our scenario based on the suggestions of the domain experts.• Coverage/Reach represents the percentage of the covered target

trajectories among all target trajectories. A trajectory is covered

if it passes by at least one of the billboards (i.e., at least onecontact) within a specified time.

• Opportunities to see (OTS) indicates the average number of bill-board contacts among all target trajectories that see a billboardof the campaign.

• Gross rating points (GRP ) measures the average number ofbillboard contacts that 100 target trajectories produce (GRP =reach∗OT S∗100).

• Value for money (VFM) states the value of the cost (V FM =covered target tra jectories/total cost).

3.2 Data Abstraction

We mainly used three types of data collected in one city, where taxisare a common form of transportation. The detailed information is de-scribed as follows:

Road network data comprises 133,726 road segments (the averagelength is 243 m) and 99,007 vertices (i.e., intersections of road seg-ments) in the city.

GPS trajectory data includes the trajectories of 3,501 taxis fromthe city in two months. A total of 3,500 sample GPS points are col-lected for each taxi in one day at a sampling rate of 24 seconds perpoint. These points constitute approximately 4 million trajectories(segmented by passenger on/off events).

POI data contains 154,633 points in the city. Each POI is denotedby its ID, category, and GPS location.

3.3 Task Analysis

By discussing with the experts in the form of structured interviews, wecompiled a list of analytical tasks.R.1 Spatio-temporal distribution: How are the target trajectories

distributed across the city? What is the difference between week-day and weekend? This information help users select the targetareas judiciously.

R.2 Location recommendation: How many billboards should beplaced in the target areas? Where are the optimal locations?Manually selecting proper locations without computational aidsis time-consuming and may easily lead to suboptimal solutions.Therefore, an interactive visual exploratory tool combining withauto recommendation mechanisms (i.e., computational methods)is strongly required.

R.3 Location assessment: How good is a billboard location? Whyis it selected for a billboard? Users want to easily access thedetailed information of a location, such as its cost, neighboringenvironment, traffic volume, speed, and OD, to enable them tomake informed decisions.

R.4 Solution assessment: How effective is a billboard solution?How does it satisfy customers’ requirements? Users want toknow the performance indicators of the selected solution such asreach, traffic volume, traffic speed, and cost, to furhter estimatewhether it can meet the customers’ requirements. In particular,the geospatial distribution of billboards should also be provided.

R.5 Solution comparison: What are the differences and similaritiesamong multiple candidate solutions? An in-depth understand-ing of the differences and similarities among the candidate solu-tions can help planners elaborate the selected solutions and ex-plain their choices to customers.

R.6 Solution classification: How many groups of candidate solutionsexist? How these groups differentiate from each other? Usersare able to formulate multiple solutions in short time with theassistance of computational models. Thus, it is critical to knowhow the generated solutions can be grouped to obtain a quickoverview of the solutions.

R.7 Solution ranking: What is the ranking of multiple solutions?Which ones are optimal? Users may have different opinions onthe optimal, thereby an interactive ranking method should be pro-vided, allowing users to rank solutions as desired. A reasonableranking can help planners quickly find the desired solutions.

POI Data

Loca

tion

Opt

imiz

er

Road Network

Map-Matching

Clean Trajectory

Data Parser

Trajectory-EdgeIndex

Trajectory-VertexIndex

Vertex-TrajectoryIndex

Geospatial Index

MongoDB

k-Lo

catio

n Q

uery

τ-Bu

dget

Con

stra

int Q

uery

Solution Generator

Select target areas

Heatmap (OD) Heatmap (Road)Evaluate target areas

Select solution areasand set parameters

Preview solutions

Edit

Generate

Solution Explorer

Raw Trajectories

032

032

032

032

032

032

SOLUTION VIEW LOCATION VIEW

Numbernumber of billboards






Slownessinversed average spe

Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

REMOVED COLUMNS: To restore, click on column titleCost Speed

Solution View

Ranking View

Average daily traffic amount:

amount

0 1 2 3 4 5 6

15000

16000

17000

18000

14275

18917

Volume 48332 16683 65015Reach 0.014 0.015 0.014Speed 16.184 17.570 16.519VFM 0.786 0.271 1.057

Trajectory POI

Hourly Speed

Map data ©2016

Daily Amount

+

-

Leaflet

Location View

Map data ©2016

� � �

Leaflet

Solution Parameters

Normal Trajectory Weight:

Target Trajectory Weight:

Temporal Filter:

Speed Less Than:

Number of Billboards : 5

1

1,000

All

Unrestricted

N C S V M R O G N C S V M R O G N C S V M R O G


Edit Delete


Edit Delete


Edit Delete

C1

C2C3

Fig. 2. SmartAdP comprises three major components: data manager, location optimizer, and visualization explorer (i.e., solution generator andexplorer). The raw data is preprocessed and stored into the four application-specific data indexes for the use of location optimizer. The locationoptimizer aims to determine the optimal billboard locations by leveraging the computational power of machines. The solution generator interactswith the optimizer to create good candidate solutions. Meanwhile the solution explorer enables visual exploration and comparison of these solutions.

4 SYSTEM ARCHITECTURESmartAdP is a web-based application developed under the full-stackframework of MEAN.js (i.e., MongoDB, ExpressJS, AngularJS, andNode.js). The visual analysis module is implemented using D3.jsand Leaflet.js. We deployed the back-end part into our server with2.40GHz Intel Xeon E5-2620 CPU and 64GB memory. Fig. 2 showsthe SmartAdP’s system architecture.

The solution generator helps users formulate a candidate solution.Users need to select several target areas initially, and then two typesof heatmaps are provided to help users determine the befitting solu-tion areas to place billboards (R1). When the solution areas are deter-mined, users set the parameters of model and obtain a recommendedsolution from the location optimizer (R2). Meanwhile, users can as-sess whether the selected locations or the generated solutions are goodenough (R3, R4) and make adjustments accordingly. To further ex-plore and compare multiple solutions, users can switch to solutionexplorer that comprises three sub-views. The solution view shows ahigh-level overview of the basic information of each solution and therelationships among the solutions (R5, R6). The location view furtherassists users in identifying the relationships at a locational level (R5,R6). The ranking view visualizes the detailed performance related tothe attributes of each solution (R4, R5, R7).

5 MODELThis section first describes the construction of data structure and thenintroduces our novel visualization-driven model.

5.1 Data Structure ConstructionThe major data structures utilized in this study are trajectory-edge,trajectory-vertex, and vertex-trajectory indexes. We refer to the roadsegment in the road network as the edge, and the intersection (i.e.,candidate location for placing billboard) as the vertex.

Trajectory-Edge Index, Ite. A GPS trajectory is a sequence oftime-ordered spatial points. We firstly apply a map-matching algo-rithm [32] to map the spatial points of a given trajectory to the un-derlying road network. Thereafter, the trajectory-edge index can beconstructed. From this index, the road segments passed by a giventrajectory can be identified.

Trajectory-Vertex Index, Itv. The trajectory-vertex index recordsthe covered vertices of all trajectories and can be easily constructedfrom Ite. Thus, each entry in Ite represents a unique trajectory, whichis identified by the trajectory ID Tri. Itv[Tri] lists all vertices that arepassed by Tri, that is, {Tri | vi,v j, ... }.

Vertex-Trajectory Index, Ivt . This index is an inverted trajectory-vertex index, where each entry is identified by a vertex vi on the roadnetwork. The entry Ivt [vi] in this index stores all the trajectories that

are covered by vi, that is, {vi | Tri, Tr j ... }. One can easily obtain thecoverage of a given vertex by using this index.

With these indexes, we can directly calculate the statistics (e.g., vol-ume and speed) for each road segment and each location through aseries of summation and averaging operations. The reach, OTS, GRP,and VFM can be calculated in the same manner, where only the targettrajectories are counted. Besides, a geospatial index can be naturallysupported by MongoDB [3]; for example, the vertices or the trajecto-ries (OD) within multiple polygonal regions can be queried.

5.2 Extracting the Optimal LocationsSmartAdP provides two types of interactive queries, namely, k-location query and τ-budget constraint query, to assist the domain ex-perts in selecting the billboard locations. These two queries are basedon different scenarios. In particular, the k-location query aims to ex-tract k locations from the candidates, where the cost of billboards isnot considered. Meanwhile, the τ-budget constraint query focuses onmining a set of locations with the total cost not exceeding the budgetτ . Each trajectory has different weights in both queries. We define thecoverage value as the sum of the weights of all the covered trajectories.Thus, both queries are aimed at extracting a set of locations with themaximum coverage value to achieve a satisfactory advertising effect.However, identifying k locations with the maximum trajectory cover-age is an NP-hard problem, which is computing infeasible for largek. In this situation, a tradeoff between the efficiency and effectivenessshould be considered. Therefore, we finally propose a visualization-driven mining model that not only human knowledge can play a role,but also an efficient searching and pruning strategy is employed (i.e.,the greedy heuristic method).

5.2.1 k-Location Query

The literature [15] has proven that the greedy heuristic is the mosteffective polynomial solution to our problem and can provide (1−1/e) approximation to the optimal solution. Algorithm 1 shows thepseudo code of the greedy heuristic for the k-locations query. Thisalgorithm first assigns the weights to each trajectory based on whetherthey are target trajectories (cf. line 1-6). For example, the weights oftrajectories with their OD within the given spatial regions Rod are wod(cf. line 4-6), whereas the weights of the remaining trajectories areset to wnor. Then, the coverage value of each candidate vertex can becalculated by adding the weights of its covered trajectories (cf. line 8).Finally, the greedy heuristic is applied in selecting the k locations (cf.line 10-16). In each iteration, the algorithm contains two steps:Selection: In this step, the algorithm selects the vertex with the max-imum coverage value and put it into the result set (cf. line 11- 12).Updating: In this step, the algorithm updates the coverage value of all

Algorithm 1 k-location queryAlgorithm KLocation( Candidate vertices Vcan, Trajectory-vertex index Itv, Vertex-trajectory index Ivt , OD regions Rod , Normal weight wnor , OD weight wod , k)

1: identify all the trajectories covered by Vcan → T Rcan

2: for each trajectory Tr in T Rcan do3: set w(Tr) to wnor

4: identify all the trajectories that one of its OD vertices located in Rod → T Rod

5: for each trajectory Tr in T Rod do6: set w(Tr) to wod

7: for each vertex v in Vcan do8: calculate the coverage value c(v) as ∑Tr∈Ivt [v] w(Tr)

9: Vresult := /0; T Rcovered := /010: for i := 0 to k-1 do11: pickup vmax in Vcan with the maximum coverage value12: Vresult := Vresult

⋃vmax

13: for Tr in Ivt [vmax]−T Rcovered do14: for v in Itv[Tr] do15: c(v) := c(v)−w(Tr)

16: T Rcovered := T Rcovered⋃

Ivt [vmax]

17: return Vresult

vertices. Specifically, for each newly covered trajectory Tr in the cur-rent iteration (cf. line 13), it can identify the passing vertices by usingthe trajectory-vertex index, i.e., Itv[Tr]. The coverage value of everypassing vertex v is updated to c(v)−w(Tr) (cf. line 15).

5.2.2 τ-Budget Constraint Query

We denote the cost of placing a billboard in vertex v as f (v), which isestimated by considering two important factors, namely, the distancefrom central areas and the traffic volume near the location; with thecost of each central area being predefined, the cost of each billboardlocation (the building cost is not considered here) can be calculated byinterpolation. The τ-budget constraint query attempts to extract a setof locations with the total cost within τ . Therefore, the problem canbe modeled as the budget constraint maximum coverage problem, andthe better solution between the small cardinality optimal solution andmodified greedy heuristic can also provide a performance guarantee(1−1/e) [26]. The algorithm follows two steps:Small cardinality optimal solution: It identifies the optimal solutionwith small cardinality. The cardinality of the optimal solution can beset to 1, 2, or 3 according to the requirement of the result quality andresponse time. Typically, a large cardinality tends to have more re-sponse time but better performance guarantee.Modified greedy heuristic: Different from the greedy heuristic in Al-gorithm 1, the modified greedy heuristic adds the vertex with the max-imum utilization ratio to the result set in each round. This processcontinues until the total cost reaches the budget constraint τ . The uti-lization ratio u(v) of v can be calculated by the equation u(v) = c(v)

f (v) .

5.2.3 Handling Interactive Query

Mining the appropriate locations to place the billboards requires sev-eral rounds of involvements by users (R2). Note that all the input pa-rameters except Itv and Ivt are defined by users when they are interact-ing with the solution generator. More specifically, the model supportsthree particular interactive queries, namely, (1) pinning the preferredvertices to the final result directly, (2) removing the unsatisfied verticesfrom the candidates in next round query, and (3) setting the temporalfilter to consider the trajectories within that specified period only. Thek-location and τ-budget constraint query can provide these interfacesthrough a few minor modifications. For the removal operation, we canmodify the input parameter Vcan. For the pin operation, we can addanother parameter Vpin to initially allocate its vertices to the final re-sult and update the coverage value of all vertices. The vertices thatshould be removed or pinned depend on the users who utilize visual-ization to facilitate their judgments. Similarly, for temporal filter, wecan add a parameter Tperiod to filter all trajectories that are not withinthe specified period.

6 VISUAL DESIGNThis section describes a set of visualization techniques that assist usersin generating solutions for billboard placements and comparing them.

6.1 Solution GeneratorUsers are desired to leverage the computational power of machines tofind the optimal solutions (R2). The candidate solutions , however, areexponential in the number of locations, leading to a huge search spacefor seeking the optimal solutions. Thus, an efficient search and prun-ing strategy is requested to solve the problem. Moreover, the optimallocations cannot be solely computed by machines, as users typicallyhave their own requirements (R2). Thus, a highly interactive system isneeded to allow efficient and smooth interactions with machines. Thesolution generator (Fig. 1(A, B, C)) combines visualization and min-ing techniques to enable users to explore the solution space as theydesire. It is mainly composed of three sub-views, namely, the dash-board, map, and solution preview view.

6.1.1 Dashboard ViewThe dashboard view (Fig. 1A) shows the information of the currentsolution under construction. From top to bottom, the view shows thedataset, target area panel, solution area (the area considered for plac-ing billboard) panel, and the parameter setting panel, which includesthe budget (number of billboards for k-location query and cost for τ-budget constraint query), normal trajectory weight, target trajectoryweight, temporal filter (weekday/weekend), and speed filter.

6.1.2 Map ViewMap-centred exploratory approach is very common when it comes tomaking multiple criteria spatial decisions [20], as it can provide intu-itive insights into environment. Thus, we provide a map view (Fig. 1B)that integrates the real roadmap pictures at multiple scales by usingGoogle Maps API; users are allowed to change the map style intosatellite or plain map (Fig. 4D) for different purposes. In additionto the roadmap layer, we add three other feature layers, namely, thearea drawing, heatmap, and marker layer.

The area drawing layer provides the function of drawing polygonsin the map, enabling users to specify the target and solution areas (i.e.,the areas considered for placing billboards) in the forms of red andblue polygons, respectively (Fig. 2). The editing and removing inter-actions in the area are also supported on the dashboard (i.e., target areaand solution area panel).

To determine the befitting solution areas (R1), an evident visual rep-resentation of spatio-temporal distributions of the target trajectoriesare required. Thus, we add a heatmap layer, where we provide userswith two types of density maps, namely, the OD and road heatmap(Fig. 2). The OD heatmap represents the density of the target trajecto-ries’ pick-up and drop-off geolocations through color encoding, withthe dense red areas indicating frequent pick-up and drop-off events.The road heatmap shows the density of the target trajectories on eachroad segment using the same color encoding in the OD heatmap. Tofacilitate in-depth analysis, both heatmaps can support OD filter (i.e.,showing the trajectories to or from the target areas) and temporal filter(i.e., showing the trajectories that occurred in weekdays or weekend)by selecting different options on the top-right corner of the map.

A marker layer presents the locations selected by users or ma-chines with blue markers on the corresponding locations on the map(Fig. 4D). The markers can be removed by clicking them on the dash-board. Users are also allowed to pin their favorite locations on themap. When a location marker is clicked, the detailed information of abillboard location is shown (Fig. 2), including the statistics of passingtrajectories, the surrounding POI information, and the OD heatmap ofthe passing trajectories (R3).

6.1.3 Solution PreviewUsers want to know their operating records and the general perfor-mance information of each previously generated solutions, so they canexplore the solution space efficiently based on previous experience.

The solution preview saves the user’s previous settings on each so-lution and also provides the statistical information for each of them.Fig. 1C illustrates the solution preview, where each box represents onesolution. The parameters are shown inside the boxes. A bar chartis horizontally aligned on the top of each box where eight attributesare shown, including the number of billboard (N), cost (C), averagespeed (S), traffic volume (V), value for money (M), reach (R), OTS(O), and GRP (G). The bar charts are assigned with different categor-ical colors to indicate which solutions they belong to. When usershover over a bar of one solution, the bars indicating the same attributevalues of other solutions would be highlighted to facilitate compari-son (as shown in Fig. 6 #1 - #6). The horizontally laid boxes enableusers to perform a rough-level comparison among candidate solutions;thus, the solutions with poor performances can be easily detected anddeleted (R4). Besides, users are allowed to directly edit any solutionby adding or deleting billboard locations from it (R2).

6.2 Solution ExplorerUsers have to prepare multiple solutions and then analyze the relativemerits of these solutions with customers. This way, customers can se-lect the most satisfying one according to their criteria and preferences.Thus, a visualization tool is necessary for users to conduct in-depthcomparative analysis of the candidate solutions. The solution explorer(Fig. 1(D, E, F)) is designed to meet the requirements of R4 to R7.It assists users in conducting multi-perspective comparative analysisamong different solutions. The solution explorer comprises three sub-views, namely, the solution, location, and ranking view. These viewscan be linked to assist users analyze and compare the multiple candi-date solutions in the same time at different levels of details.

6.2.1 Solution ViewThe solution view aims to provide users with a visual summary of eachsolution and the relationships among them (R5, R6). Meanwhile, thisview is a pivot that connects the location and ranking view, therebyenabling users to explore the solutions from multiple perspectives andassisting them determine the optimal solution immediately (R5).

Glyph Design. A glyph design that can reveal the general perfor-mance of a solution is required to visually summarize the importantfeatures of each solution. However, numerous features may influencethe performance of a solution. If we show all of these features simul-taneously, then the glyph can be visually complex, which may imposecognitive burden to the users working memory and reduce the taskperformance. After discussions with our collaborators, we identify thekey information that should be visually encoded, namely, reach (week-day and weekend), traffic speed, cost, and POIs. These features enableusers to obtain a quick overview of a solution.

A familiar metaphor can greatly enhance comprehension and re-duce the cognitive burden on working memory. Inspired by thedashboards of vehicles, we design a novel radial-based [11] visualmetaphor to represent a solution, as all traffic-related attributes arenaturally related to vehicles. Fig. 3A shows our design. Users oftengenerate at most a dozen solutions and people can efficiently distin-guish a dozen colors [37]. Hence, we use the color of the inner circleto indicate the particular solution (consistent with the colors used inthe solution preview). The radius of the inner circle represents thetotal cost by default (the encoding attribute can be specified by users).

A radius heatmap is attached inside the outer circle, which is sim-ilar to the speed meter in a vehicle’s dashboard; the dark red colorindicates high speed. We use a pointer to clearly indicate the averagetraffic speed for all the trajectories passing the locations of the solu-tion. In addition, the arc area outside the outer circle represents thevolume of weekday and weekend reach. The arc is constantly at 180◦,the scale is equal to the total number of target trajectories. The arc sub-areas lying to the left and to the right of vertical dashline represent theweekday and weekend volume of the target trajectories, respectively.The blue area encodes the proposition of the covered target trajecto-ries among all target trajectories. We add a scale to the arc area to helpperceive the proposition accurately, and attach nine small radial nodesto the remaining space surrounding the glyph to show nearby POIs in

different POI categories (nine categories are selected by our collabora-tor). We use the size of the nodes to encode the number of POIs. Fromtop to bottom, they are public transport, academic, residence, hotel,sport, life service, shopping, catering, and automobile service.

Design alternatives: Fig. 3 presents several alternative designs thathave been evaluated. Design C uses a horizontal ruler on the top toshow the reach volume information. The advantage is the use of lengthto represent the magnitude value, which is perceived more accuratelythan angle [37]. However, it is not symmetrical and discords with cir-cle. Our collaborators did not like the design. Thus, we proposed twoother designs (Design D-E) using arc length to represent the reach vol-ume information. Design D utilizes two scales (one is on the volumearc and the other one lies insider the circle). Although design D cutsdown on the use of color, the two radial scales stay too close and caneasily confuse users. In contrast, design A and E similarly use a scaleon the volume arc and a radial heatmap inside the glyph to representdifferent scales. Design E uses the number of blocks to visually en-code the average traffic speed without using a pointer. However, ourcollaborators reported that it is not as effective as design A for com-paring speeds among multiple solutions.

C D

Max

0

Speed Scale

Min

1/4Mid

3/4

Max

Speed Boxplot

A B

040

Weekday Reach Weekend Reach

Billboard Location belongs to

Inactive Solutions

Not belongs to

Solutions of Interest

E

Weekday Reach Weekend Reach

Fig. 3. (A) A dashboard-like glyph design to summarize the key fea-tures of each solution. (B) A radial location glyph to present a selectedbillboard location. (C, D, E) show the alternative solution glyph designs.

Layout: The solution view based on a scatterplot layout allowsusers to obtain a quick overview of the relationships among solutions(R5). We compute the similarities among different solutions and uti-lize Multidimensional Scaling (MDS) [27] to create the layout. Wealso use a well-established method to eliminate the overlap issue [14].The similarity between two solutions depends on how many the sametrajectories are covered. Thus, we can treat the trajectories covered inthe two solutions as two sets A and B. We calculate the set similarityby |A∩B/min(|A|, |B|)| [45]. Two sets are more similar when theyshare more elements. In the resulting layout, the closer the solutionglyphs stay, the more common target trajectories share.

6.2.2 Location View

From the solution view, users can detect the relationships among dif-ferent solutions with respect to the similarity on the covered targettrajectories. Users may want to further analyze the relationships at thefine-grained location level. For example, they may want to know howthe locations in each solution distribute geographically, and which so-lutions the locations of interests belong to. Such information can helpusers identify the commonalities and differences among the solutionsfrom the perspective of locations (R5). Therefore, we design a visual-ization that can assist users in this kind of tasks.

To this end, we showed a few designs to our collaborators. Line-set style [35] design (i.e., locations shown in one solution are linkedusing a line) was first introduced; however, this design has several is-sues. First, the users are unfamiliar with the map style. Second, this

technique suffers from a scalability issue that can lead to visual clutterwhen a location is shared by many solutions. Moreover, this techniquecan only show the geographical distribution, whereas our collaboratorswant to acquire more information, such as the cost of each location.To address these concerns, we had several rounds of discussions withour collaborators. Finally, we designed a new radial location glyph(Fig. 3E) and derived a layout algorithm by extending Dorling car-togram [13] to show the value-by-location maps effectively.

In Dorling cartogram, the relative geographical positions amongdifferent objects are considerably preserved. In our case, we use acircle to represent a selected billboard location. By employing the lay-out of Dorling cartogram, the relative geographical positions of thesebillboard locations are preserved (Fig. 5B). Fig. 3B shows a locationglyph. The primary visual variables should be used to encode the pri-mary data attributes; as suggested by our collaborators, we use the ra-dius of a circle to encode the cost information by default (the encodingattribute can be specified by users). In addition, we fill the circle witha plain-style roadmap in a novel manner. The center of the circle rep-resents the billboard location marked as a red cross. The visual designenables users to immediately identify the road environment around thebillboard. To reveal the set relations between solutions and locations,we attach a set of radial color bars surrounding the circle to indicatethe solutions that the location belongs to. For the solutions to whichusers pay attention, the corresponding color bars will be highlightedwith increased thickness.

6.2.3 Ranking View

Users need a flexible ranking tool to help them quickly identify goodsolutions they desire (R7). In particular, all performance-related in-dicators should be displayed on demand. The system should enableusers and customers to freely adjust each attributes weight becausethey may have different opinions on the performance indicators. Theranking should be updated accordingly and instantly. Moreover, usersexplained that showing only one value for each attribute is insufficient;they need to analyze further details with respect to locations containedin each solution. For example, users want to know the reach of eachlocation in every solution for the reach indicator. This type of infor-mation provides users with detailed insight into how good a solutionis regarding a given indicator (R4).

The ranking view visualizes the detailed performance related to theattributes of each solution, including the number of billboard, cost,speed, volume, VFM, reach, OTS, GRP, slowness (inverse of speedfor ranking use), in a highly organized and interactive tabular form.To support location-level comparison, we embed boxplots [4] into thematrix. This new design enables users to glean insight into the relativeperformance of the solutions. This view together with the aforemen-tioned two views (i.e., the location view and the solution view) em-power users to quickly find the optimal solutions as they desire (R7).

Fig. 1F shows a matrix-based view, which is inspired by lineup [18].The first column lists all the candidate solutions with color bars on theleft indicating different solutions. The color scheme used is consistentwith that in the solution and location views. Other columns displaythe attribute values, which are normalized and encoded by the lengthof bars. The width of a column represents the weight users assignedto the attribute. Users can click on the header of a column to rank thesolutions by the associated attribute. The columns can also be groupedby right clicking on their headers to rank the solutions by the weightedsum of the attributes in the group. For example, in Fig. 2, the columnsreach, OTS, GRP, VFM, Volume, and slowness are grouped; the solu-tions are ranked by weighting the sum of these attributes. When usersare interested in an attribute, they can click on the body of that column, which can expand as a boxplot to show the statistical distributionsof the corresponding attributes of the locations in each solution (seethe column reach in Fig. 1F). In particular, the minimum, first quartile,median, third quartile, and maximum values are revealed.

6.3 Interactions

User interactions supported by the system are summarized as follows.

Details-on-demand is supported by SmartAdP to facilitate explo-ration in solution space and analysis of patterns at different levels ofdetails. In the general map view and location view, users are allowedto click any location to see its detailed information. In addition, theboxplot is only shown on demand.

Filtering and highlighting enable users to focus on the informa-tion of interests. In SmartAdP, users can filter trajectories using a tem-poral filter. In particular, the traffic OD information can be furtherfiltered based on whether the trajectories enter or leave for the targetareas. The highlighting feature is supported in every view. For exam-ple, when hovering on a bar in a solution preview’s barchart, the barsencoding the same attribute in other solutions are highlighted. Whenhovering on a glyph in the solution view, the corresponding locationsand solutions are highlighted in the location and ranking views.

Linking connects the three views in the solution explorer. Whenusers click on a glyph in the solution view or a solution color bar in theranking view, they can be visually linked with the contained locationsin the location view. Edge bundling technique [50] is applied to reducethe visual clutter and increase the readability.

7 EVALUATION


5.548.4311.3214.2117.1019.9822.8725.7628.6531.5434.42

Average hourly traffic amount:

amount

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

500

1000

1500

141

1640

Daily SpeedHourly AmountCost +total cost of solution

Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

Solution #7

Solution #8

Solution #9

Solution #10

29,402.08

32,895.89

36,389.71

39,883.52

43,377.33

46,871.15

50,364.96

53,858.77

57,352.58

60,846.40

64,340.21

1

2

4

3

D

A

B C

E

F

night night

Fig. 4. Three patterns detected. First, the road heatmap (D) highlightsthe road stretches heavily passed by the target trajectories where areas1 and 4 have more target trajectories than areas 2 and 3. Second,areas 2 and 3 tend to have higher speed and cost, respectively (A, E,and F). Third, the locations marked with the green and blue rectanglesare selected unexpectedly. The green one (B) shows two locations inclose proximity and the blue one has a lot of traffics at night (C).

7.1 Case StudiesWe conducted the case studies with our domain experts; they werefamiliar with our designs and databases. As a note, that the cost men-tioned in this section denotes monthly cost.

7.1.1 Exploring the solution spaceIn this scenario, the experts used SmartAdP to explore the solutionspace. They selected a few target areas (the blue areas in Fig. 4D), inwhich four famous universities are located.

Exploring the distribution of target trajectories. The expertssought to determine which areas are appropriate for placing billboards(R1). To this end, they first needed to understand the distribution ofthe target trajectories. From the OD heatmap (Fig. 1B), the expertsidentified five different areas: (1) an area with a large railway sta-tion, (2) a business area that caters to dining or shopping activities,(3) a research center that is home to a number of scientific researchinstitutions, (4) a park area for leisure or exercise, and (5) a centralarea in which the placement of billboards is expensive. From the roadheatmap (Fig. 4D), the experts found that the road stretches in areas 1and 4 were used more heavily than those in areas 2 and 3.

DORLING MAP VIEWSCATTERPLOT VIEW

025

025

Cost +total cost of solution







Solution #1

Solution #2

A B

CD

Park

Fig. 5. The two locations marked with the blue rectangles present the difference between the weekend (blue) and weekday (orange) solutions. Thelocation detail view (D) shows that the destinations of the passed trajectories on weekends are mostly parks.

Comparing solution areas for billboard placements. The expertsaimed to determine how good each selected solution area was (areas1-4 in Fig. 4D). A global area depicted as a combination of the fourareas was also included. The experts attempted a setting with fivebillboards and another setting with eight billboards in each solutionarea (four areas in four directions and one global area). Hence, 5 ∗2 = 10 candidate solutions were formulated, during which process thesolution preview provided a quick overview of the performances ofthe created solutions (R4). To conduct in-depth analyses, the expertsswitched to the solution explorer, where they identified four clusters(C1-C4 in Fig. 1D) the solution view (R5, R6).

As observed by the experts, the solutions in C3 and C4 clearly havea significantly low reach because of the short blue arcs. Moreover, thesolutions in C3 achieve fast speeds (see the speed pointer and radialboxplot inside the associated glyphs). Fast speeds are not desired bythe experts because fast speeds do not allow sufficient time for view-ing billboards. The speeds are reflected in the boxplot in the rankingview (Fig. 4A), where the medians of solutions #2 and #7 are muchhigher than others. The location view (with the circle size encodingthe speed in Fig. 4F) reveals that the locations in C3 are in the bottomarea. The solutions in C4 entail a relatively high cost (the glyphs havelarger inner circles in Fig. 1D), as the associated locations in the lo-cation view (with circle size encoding cost by default in Fig. 4E) areclose to the city central area (area 5 in Fig. 1B). This can also be vali-dated by the size of POI circles around the glyphs in the solution view(Fig. 1D); typically the larger the POI circles (more POIs around), themore expensive the areas for placing billboards. C1 contains four so-lutions with higher reach than those in the other clusters because theblue radial arcs are longer. Furthermore, the experts found that thebillboards of the solutions in C1 are located near the top of the map(e.g., the billboards of the purple solution in Fig. 1E). Meanwhile, theexperts changed the radius of the circle in the location view to encodethe reach, and found the circles on the top tend to have larger sizethan others, which indicates the higher reach. This is in accordancewith the OD and road heatmaps. By further checking the boxplot inranking view (the reach column in Fig. 1F), the experts found that so-lutions #4, #5, #9, and #10 are of high reach. Solutions #4 and #9 areformulated with the solution area 4 (Fig. 4D). Solutions #5 and #10are formulated with the global solution area, which also include area4. This indicates most of optimal locations are within the area 4 inFig. 4D. C2 contains two solutions whose solution area is area 2 inFig. 4D. These two solutions perform normally. In summary, areas 4(Fig. 4D) is preferable solution area for billboard placements.

Improving and editing solutions. When the experts evaluated area3 in Fig. 4D, they found two unexpected locations selected by themodel (R3). First, the two selected locations are too close (markedwith the green rectangle in Fig. 4(D, E)). It was explained that the roadis made up of several lanes in opposite directions and both directionshave large volume of target trajectories. Whereas, the experts thoughtthat placing two billboards in close proximity is not necessary. Sec-

ond, despite the relative large traffic volume of the location (markedwith the blue rectangle in Fig. 4(D, E)), it is mainly attributed to thetraffics at night (Fig. 4C. Experts zoomed in on the map, thereby find-ing the location surrounded by a number of bars. As the illuminationat night is not as good as daytime, daytime traffic is preferable. Thus,the experts removed the two unexpected locations.

7.1.2 Finding the optimal solution

This case is aimed to demonstrate the effectiveness of the solutionexplorer in identifying optimal solutions (R7). Note that the targetareas in this case is the same as the previous one.

Weekday vs. Weekend. Driven by the previous experience, theexperts quickly generated two solutions. The two solutions coverthe same solution areas, that is, the global area combining area 1-4Fig. 4D. The two solutions differ in terms of the setting of the tempo-ral filter. One solution only considers weekday trajectories, whereasthe other solution only considers weekend trajectories. As indicatedby the glyphs in Fig. 5A and the performance indicators in Fig. 5C,these two solutions (blue for weekend and orange for weekday) per-form similarly because all the indicators, except OTS, are quite similar.By further checking the location view (with circle size encoding costin Fig. 5B), the experts found that the difference between the two so-lutions was mainly caused by the two locations highlighted in the bluerectangle. They were particularly interested in the one on the rightside, as it was expensive (large circle) and far from the commonly se-lected locations. Hence, the experts explored the OD heatmap of thelocation (Fig. 5D), and found that the destinations of the trajectoriesthat passed the location on weekends were mostly parks. The expertsinferred that people tended to visit parks on weekends. This resultalso explains why the location was picked by the model (R3), that is,the location showed a large number of target trajectories on weekends.People going out on weekends, especially those visiting parks, tend tobe in a good mood and are highly likely to be influenced by advertis-ing content. All the experts preferred solution #1 despite solution #2being slightly better in terms of the OTS and GRP.

Dispersed Strategy vs. Clustered Strategy. The experts formu-lated a series of candidate solutions given a budget of $380,000. Twoadvertising campaign strategies were employed: (1) a dispersed strat-egy, which involves placing billboards in a large region to increase thereach as much as possible; (2) a cluster strategy, which involves plac-ing billboards in a small region to increase the OTS as much as possi-ble. Using SmartAdP , the experts generated three candidate solutionsfor each strategy with different parameter settings (Fig. 6). Specif-ically, solutions #1 to #3 were generated by adopting the dispersedstrategy (the solution area is nearby area 4 in Fig. 4D). Solution #1and #2 were generated with a weight ratio (the target trajectory weightto the normal trajectory weight) of 5 and 100, respectively. Solution#3 involved the addition of a speed filter (≤ 15km/h). Solutions #4to #6 were generated with the same settings while limited in a smallregion in area 4 of Fig. 4D.

Through the solution explorer, the experts identified an outlier (theblue solution) in the solution view (C2 in Fig. 2 Solution View). Forthe locations in the blue solution, they are distributed dispersedly inthe location view (Fig. 2 Location View) and show high traffic vol-ume (Fig. 2 Ranking View), as well as a low reach (the glyph in solu-tion view has short outer radial arc). The other 5 solutions form twoclusters, where the solutions with the same advertising strategy clustertogether (C2 and C3 in Fig. 2 Solution View). From the glyphs, theexperts knew the speeds of the brown and green solutions were slowowing to the speed filter, but their reach was relatively low in com-parison with that of the other solutions. The experts further used theranking function in the ranking view to determine the optimal solu-tion. The results created by ranking the solutions according to a singlecolumn (attribute) are listed in Fig. 6. Generally, the dispersed strat-egy (#1 - #3) performs much better than the clustered strategy (#4 -#6) in terms of reach, whereas the clustered strategy achieves a higherOTS than the dispersed strategy. However, the solutions have differ-ent rankings when ranked by different attributes. Hence, determiningthe optimal solution at first glance was difficult and as such the ex-perts further ranked the solutions with a group of specified columns(attributes). The view allowed the experts to adjust the weight ofeach column flexibly in the group. Finally, they obtained a rankingof the solutions (Fig. 2 Ranking View), where the GRP and VFM areof higher weights than the other attributes. As a result, solution #5generated with clustered strategy was selected.

N C S V M R O G

Maximum Cost: 380000Target to Normal: 5Temporal Filter: noneSpeed Less Than: N/A#1

Edit Delete

#1

N C S V M R O G


Edit Delete

#2

N C S V M R O G

Maximum Cost: 380000Target to Normal: 100Temporal Filter: noneSpeed Less Than: 15#3

Edit Delete

#3

N C S V M R O G


Edit Delete

#4

N C S V M R O G


Edit Delete

#5

N C S V M R O G

Maximum Cost: 380000Target to Normal: 100Temporal Filter: noneSpeed Less Than: 15#6

Edit Delete

#6

0.37

0.83

0.68

0.63

0.76

0.59

Reach +solution coverage of target

Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

Reach

1.09

1.34

1.35

1.46

1.58

1.48


Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

OTS

40.04

111.11

91.89

92.62

120.14

87.47


Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

GRP20.53

19.19

13.27

17.66

16.27

13.33


Solution #1

Solution #2

Solution #3

Solution #4

Solution #5

Solution #6

Speed

Fig. 6. The solution preview (top) displays the parameters and statisticsof each solution. The bottom shows the solution ranks with the differentattributes selected.

7.2 Domain Expert InterviewWe collected the feedback of the domain experts by conducting one-on-one interviews. The feedback was summarized as follows.

Visual Design and Interactions. The experts confirmed that oursystem is well designed and user friendly. In particular, our metaphor-based glyph design received high praises from the domain experts.They believed that the system could be easily understood by users withdifferent backgrounds. EA commended, “Today’s consumers are notso suggestible for the sake of an increasingly accessible Internet. Yourvisualizations can be an incredible tool to clearly demonstrate the up-side of our service.” EC praised the smooth interactions, comment-ing “SmartAdP integrates many advanced navigation and interactiontechniques, enabling me to smoothly work with machines.”

Usability and Improvements The experts appreciated our systemand found the functions provided by SmartAdP quite useful. They allagreed that our system is useful in not only making adjustments tocampaigns but also interpreting the data and leveraging the findings tobenefit their clients and brands. Apart from the aforementioned, ourexperts also provided some valuable suggestions. EA mentioned, “Inmany times, I found several flaws of candidate solutions in the solutionexplorer. It would be nice if I can edit solutions directly in this inter-face.” EB further suggested an advanced solution merging functionwould be helpful, as two solutions are sometimes complementary.

8 DISCUSSIONOur evaluation demonstrates the effectiveness of SmartAdP . Never-theless, there is still space for improvement.

The system does not support visual comparison of hundreds of so-lutions, because we use colors to differentiate candidate solutions and

people cannot effectively distinguish over one dozen colors [37]. Nev-ertheless, we believe that our design works for most cases becauseusers typically do not generate many solutions. Likewise, the locationview suffers from the same problem with the increase of billboards. Apossible solution to the problem is to employ a hierarchical strategyto handle numerous circles. Additionally, the current system mainlyfocuses on comparison at the solution level and does not support dis-playing values of multiple criteria for locations directly. Although thelocation view and embed boxplots in the rank view can facilitate com-parison among locations, they can only display one attribute at thesame time. Therefore, we plan to investigate more design choices forshowing multiple attributes of locations in the rank view.

Travel direction should be considered when a road is so wide thatbillboards are visible only from one side of the road. The current sys-tem is capable of dealing with that situation for the reason that thepassing-by trajectories of the locations on the two opposite sides of awide road are counted separately. More specifically, for a wide roadin the road network data, it is always treated as two road segmentswith opposite directions; hence, the locations on the two road seg-ments count only the trajectories from one direction. However, traveldirection can also influence the manner in which we place a billboardin a given location. For instance, we may also need to take into ac-count several additional factors, such as the billboard orientation andthe height off the ground. These factors are not considered currentlybut are worthy of being explored in the future.

Regarding the model itself, the greedy heuristic is suggested to ex-tract the locations. Nevertheless, our work does not limit the utilizationof other more sophisticated methods to further improve the quality ofthe results. For example, the anytime refinement [51] is one of thepossible methods. More advanced methods enabling users to interac-tively train and improve models are also worth further investigations.For taxi data, despite its prominent advantages, it cannot adequatelyrepresent all the mobility movement patterns. This issue is common inprior studies that utilize taxi trajectories. Hence, we are interested inintegrating additional types of data, such as location-based social net-work data or the information about regional functions. This requiresmore efforts on solving the problem of heterogeneous data fusion [47].

Although our work is primarily designed for the billboard locationselection problem, it can be easily adapted for other similiar problems,such as selecting locations of retail stores or restaurants. In these si-miliar problems, a number of good locations can be recommended byadapting our novel visualization-driven model with the data that canreveal human mobility patterns, such as taxi trajectories. Our visual-ization approach, including solution generator and solution explorer,can also be tailored and extended to support visual editing and com-parison of different sets of locations.

9 CONCLUSIONIn this paper, we systematically study the problem of identifying theoptimal billboard locations using massive trajectory data. Closelyworking with the end users enables us to derive two major challengesfacing billboard advertising planners, namely, creating and comparingmultiple solutions in an immediate and accurate manner. Hence, wepresent SmartAdP, an interactive visual analytics system that combinesa new application-driven mining model with several well-designed vi-sualization and interaction techniques. We conduct case studies andexpert interviews to demonstrate the system. Positive feedback andin-depth insights show the usefulness and effectiveness of our system.

ACKNOWLEDGMENTS

The work is supported by National 973 Program of China(2015CB352503), the Fundamental Research Funds for Central Uni-versities (2016QNA5014), National Natural Science Foundation ofChina (No. 61502416), the research fund of the Ministry of Educa-tion of China (188170-170160502), 100 Talents Program of ZhejiangUniversity, RGC GRF16241916, ITS/170/15FP, and a grant from Mi-crosoft Research Asia.

REFERENCES

[1] APN Outdoor. http://goo.gl/Aln5AB.[2] Lamar. http://goo.gl/P86jkQ.[3] MongoDB: 2dsphere Indexes. https://goo.gl/wlarPf.[4] S. Few. Distribution displays, conventional and potential, 2014.

Technical report. http://goo.gl/zy8uAo.[5] N. Andrienko and G. Andrienko. Informed spatial decisions

through coordinated views. Information Visualization, 2(4):270–285, 2003.

[6] N. Andrienko and G. Andrienko. Spatial generalization and ag-gregation of massive movement data. IEEE TVCG, 17(2):205–219, 2011.

[7] N. Andrienko and G. Andrienko. Visual analytics of movement:An overview of methods, tools and procedures. Information Vi-sualization, 12(1):3–24, 2013.

[8] W. Chen, F. Guo, and F.-Y. Wang. A survey of traffic data visu-alization. IEEE TITS, 16(6):2970–2984, 2015.

[9] Z. Chen, H. T. Shen, X. Zhou, and J. X. Yu. Monitoring pathnearest neighbor in road networks. In Proc. ACM SIGMOD,pages 591–602, 2009.

[10] C. Collins and S. Carpendale. Vislink: Revealing relationshipsamongst visualizations. IEEE TVCG, 13(6):1192–1199, 2007.

[11] W. Cui, H. Qu, H. Zhou, W. Zhang, and S. Skiena. Watch thestory unfold with textwheel: Visualization of large-scale newsstreams. ACM TIST, 3(2):20, 2012.

[12] K. Deng, K. Xie, K. Zheng, and X. Zhou. Trajectory indexingand retrieval. In Computing with Spatial Trajectories, pages 35–60. Springer, 2011.

[13] D. Dorling. Area cartograms: their use and creation. In TheMap Reader: Theories of Mapping Practice and CartographicRepresentation, pages 252–260. Wiley, 2011.

[14] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast node overlap re-moval. In Proc. Graph Drawing, pages 153–164, 2005.

[15] U. Feige. A threshold of ln n for approximating set cover. Jour-nal of the ACM (JACM), 45(4):634–652, 1998.

[16] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva. Visualexploration of big spatio-temporal urban data: A study of newyork city taxi trips. IEEE TVCG, 19(12):2149–2158, 2013.

[17] M. Gleicher, D. Albers, R. Walker, I. Jusufi, C. D. Hansen, andJ. C. Roberts. Visual comparison for information visualization.Information Visualization, 10(4):289–309, 2011.

[18] S. Gratzl, A. Lex, N. Gehlenborg, H. Pfister, and M. Streit.LineUp: Visual analysis of multi-attribute rankings. IEEETVCG, 19(12):2277–2286, 2013.

[19] X. Huang, Y. Zhao, J. Yang, C. Zhang, C. Ma, and X. Ye. Traj-Graph: A graph-based visual analytics approach to studying ur-ban network centralities using taxi trajectory data. IEEE TVCG,22(1):160–169, 2016.

[20] P. Jankowski, N. Andrienko, and G. Andrienko. Map-centredexploratory approach to multiple criteria spatial decision mak-ing. International Journal of Geographical Information Science,15(2):101–127, 2001.

[21] H. Jeung, Q. Liu, H. T. Shen, and X. Zhou. A hybrid predictionmodel for moving objects. In Proc. ICDE, pages 70–79, 2008.

[22] D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, and C. Mas-colo. Geo-spotting: Mining online location-based services foroptimal retail store placement. In Proc. ACM SIGKDD, pages793–801, 2013.

[23] H. Katz. The media handbook: A complete guide to advertis-ing media selection, planning, research, and buying. Routledge,2014.

[24] J. Kehrer and H. Hauser. Visualization and visual analysis ofmultifaceted scientific data: A survey. IEEE TVCG, 19(3):495–513, 2013.

[25] J. Kehrer, H. Piringer, W. Berger, and M. E. Groller. A model forstructure-based comparison of many categories in small-multipledisplays. IEEE TVCG, 19(12):2287–2296, 2013.

[26] S. Khuller, A. Moss, and J. S. Naor. The budgeted maximumcoverage problem. Information Processing Letters, 70(1):39–45,

1999.[27] J. B. Kruskal. Nonmetric multidimensional scaling: a numerical

method. Psychometrika, 29(2):115–129, 1964.[28] J.-G. Lee, J. Han, X. Li, and H. Gonzalez. TraClass: Trajec-

tory classification using hierarchical region-based and trajectory-based clustering. In Proc. VLDB, pages 70–79, 2008.

[29] J.-G. Lee, J. Han, and K.-Y. Whang. Trajectory clustering: Apartition-and-group framework. In Proc. ACM SIGMOD, pages593–604, 2007.

[30] H. Liu, Y. Gao, L. Lu, S. Liu, H. Qu, and L. M. Ni. Visualanalysis of route diversity. In Proc. IEEE VAST, pages 171–180,2011.

[31] S. Liu, W. Cui, Y. Wu, and M. Liu. A survey on information visu-alization: recent advances and challenges. The Visual Computer,30(12):1373–1393, 2014.

[32] Y. Lou, C. Zhang, Y. Zheng, X. Xie, W. Wang, and Y. Huang.Map-matching for low-sampling-rate GPS trajectories. In Proc.ACM SIGSPATIAL, pages 352–361, 2009.

[33] J. Malczewski. GIS and multicriteria decision analysis. Wiley,1999.

[34] J. Malczewski. Gis-based multicriteria decision analysis: a sur-vey of the literature. International Journal of Geographical In-formation Science, 20(7):703–726, 2006.

[35] W. Meulemans, N. H. Riche, B. Speckmann, B. Alper, andT. Dwyer. KelpFusion: A hybrid set visualization technique.IEEE TVCG, 19(11):1846–1858, 2013.

[36] T. Munzner. A nested model for visualization design and valida-tion. IEEE TVCG, 15(6):921–928, 2009.

[37] T. Munzner. Visualization Analysis and Design. CRC Press,2014.

[38] R. Scheepens, C. Hurter, H. Van De Wetering, and J. J. Van Wijk.Visualization, selection, and analysis of traffic flows. IEEETVCG, 22(1):379–388, 2016.

[39] R. Scheepens, N. Willems, H. Van de Wetering, G. Andrienko,N. Andrienko, and J. J. Van Wijk. Composite density maps formultivariate trajectories. IEEE TVCG, 17(12):2518–2527, 2011.

[40] C. Shi, Y. Wu, S. Liu, H. Zhou, and H. Qu. LoyalTracker:Visualizing loyalty dynamics in search engines. IEEE TVCG,20(12):1733–1742, 2014.

[41] C. Tominski, H. Schumann, G. Andrienko, and N. Andrienko.Stacking-based visualization of trajectory attribute data. IEEETVCG, 18(12):2565–2574, 2012.

[42] Y. Wang, Y. Zheng, and Y. Xue. Travel time estimation of a pathusing sparse trajectories. In Proc. ACM SIGKDD, pages 25–34,2014.

[43] Z. Wang, M. Lu, X. Yuan, J. Zhang, and H. Van De Wetering.Visual traffic jam analysis based on trajectory data. IEEE TVCG,19(12):2159–2168, 2013.

[44] Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, and H. Qu.OpinionSeer: interactive visualization of hotel customer feed-back. IEEE TVCG, 16(6):1109–1118, 2010.

[45] M. A. Yalcin, N. Elmqvist, and B. B. Bederson. AggreSet: Richand scalable set exploration using visualizations of element ag-gregations. IEEE TVCG, 22(1):688–697, 2016.

[46] K. Zheng, Y. Zheng, N. J. Yuan, and S. Shang. On discovery ofgathering patterns from trajectories. In Proc. IEEE ICDE, pages242–253, 2013.

[47] Y. Zheng. Methodologies for cross-domain data fusion: Anoverview. IEEE transactions on big data, 1(1):16–34, 2015.

[48] Y. Zheng. Trajectory data mining: an overview. ACM TIST,6(3):29, 2015.

[49] Y. Zheng, L. Capra, O. Wolfson, and H. Yang. Urban computing:concepts, methodologies, and applications. ACM TIST, 5(3):38,2014.

[50] H. Zhou, P. Xu, X. Yuan, and H. Qu. Edge bundling in informa-tion visualization. Tsinghua Science and Technology, 18(2):145–156, 2013.

[51] S. Zilberstein. Using anytime algorithms in intelligent systems.AI magazine, 17(3):73, 1996.

smartadp: visual analytics of large-scale taxi ... · pdf filetaxi trajectories for selecting...

Documents