a location-item-time sequential pattern mining algorithm for route recommendation

Accepted Manuscript

A Location-Item-Time Sequential Pattern Mining Algorithm for Route Recommendation

Chieh-Yuan Tsai, Bo-Han Lai, J. Lu

PII: S0950-7051(14)00352-9

DOI: http://dx.doi.org/10.1016/j.knosys.2014.09.012

Reference: KNOSYS 2959

To appear in: Knowledge-Based Systems

Received Date: 14 January 2014

Revised Date: 5 September 2014

Accepted Date: 26 September 2014

Please cite this article as: C-Y. Tsai, B-H. Lai, J. Lu, A Location-Item-Time Sequential Pattern Mining Algorithm

for Route Recommendation, Knowledge-Based Systems (2014), doi: http://dx.doi.org/10.1016/j.knosys.2014.09.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers

we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and

review of the resulting proof before it is published in its final form. Please note that during the production process

errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

A Location-Item-Time Sequential Pattern Mining Algorithm

for Route Recommendation

Chieh-Yuan Tsai a b

*

Bo-Han Lai a

a Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan

b Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan

* Corresponding Author. Email: [email protected]

A revised manuscript submitted to

Knowledge-Based Systems

Professor J. Lu

Fac. of Engineering and Information Technology,

Building 2, Level 7 CB02.07.037,

University of Technology Sydney, P.O. Box 123,

Sydney, NSW 2007, New South Wales, Australia

October 3, 2014

ABSTRACT

To survive in a rapidly changing environment, theme parks need to provide high

quality services in terms of visitor tastes and preferences. Understanding the spatial

and temporal behavior of visitors could enhance the attraction management and

geographical distribution for visitors. To fulfill this need, this research defines a

Location-Item-Time (LIT) sequence to describe visitors’ spatial and temporal

behavior. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining

algorithm is developed to discover frequent LIT sequential patterns. Next, the route

suggestion procedure is proposed to retrieve suitable LIT sequential patterns for

visitors under the constraints of their intended-visiting time, favorite regions, and

favorite recreation facilities. A simplified theme park is used as an example to show

the feasibility of the proposed system. The experimental results show that the system

can help managers understand visitors’ behavior and provide appropriate visiting

experiences for visitors.

Keywords: Recommendation systems; Sequential pattern; Sequence mining; Theme

park.

1. Introduction

A theme park is an aggregation of attractions including architecture, landscape,

rides, shows, food services, costumed personnel and retail shops. Well-known

examples include Disney World, Disneyland, Universal Studios and Six Flags.

Although the theme park industry has enjoyed steady attendance growth in the past

several decades, the theme park market has entered a mature stage and is no longer

experiencing high growth [5, 6]. To survive in a rapidly changing environment, theme

parks need to provide high quality services in terms of visitor tastes and preferences.

Understanding the spatial and temporal behavior of visitors could enhance the

management of attractions and contribute to extending the geographical distribution

of visitors within regions.

In the past decade, the recommendation technique has been regarded as a popular

technique for providing a variety of products, services and items to customers in the

tourism industry [4, 7, 13]. Personalized tourism services aim at helping users to find

what they are looking for by comparing the user profile to reference characteristics.

Wang et al. [19] presented semantic web technologies for providing personalized

access to digital museum collections. Niaraki and Kim [12] proposed a generic

ontology-based architecture using a multi-criteria decision making technique to

design a personalized route planning system. Schiaffino and Amandi [14] developed

an expert software agent in the tourism and travel domain, named Traveler. This agent

combines collaborative filtering with content-based recommendations and

demographic information about customers to make recommendations. García-Crespo

et al. [3] presented the SPETA system, which uses knowledge of the user’s current

location and preferences, as well as a history of past locations, to provide the type of

recommendation services that tourists expect from a real tour guide. Tsai and Lo [17]

took previous popular visiting behaviors as the foundation and developed a sequential

pattern-based route suggestion system to generate personalized tours. Tsai and Chung

[16] developed a route recommendation system that provides personalized visiting

routes for tourists in theme parks by considering a set of visiting sequences. Based on the

retrieved visiting behavior data and facility queuing situation, their system can

generate a proper route suggestion for visitors.

The above recommendation systems have proven to be efficient

tools by designing user interfaces that can smoothly interact with the environment,

providing convenient information query tools, or suggesting a set of associated

products (or services). However, three major problems remain. First, these

systems simply return a set of suggested facilities (items) in a sequential order, but fail

to illustrate the complete visiting path for visitors. For example, their systems might

suggest a visitor visit items k1, k4, and k8 in order (i.e., k1→k4→k8). However,

the actual path to complete the route should contain “by-pass items” such as

k1→k4→k7→k8, k1→k4→k6→k8, or even k1→k4→k7→k6→k8. Without providing

complete path information, a visitor might get confused and spend much more time to

finish the route. Second, previous systems seldom take geographic constraints into

consideration, so their suggested routes are often trivial and impractical. For

example, previous studies might suggest to a visitor a long route such as

k1→k2→k6→k4→k7→k10→k8→k12. However, this route is trivial and hard to follow

since k1, k2, and k4 are in region A, k6, k7, and k8 are in region B, and k10 and k12 are in

region C. In fact, a theme park consists of several regions, each of which contains

dozens of facilities and shops. It is more worthwhile to offer a non-trivial suggestion

such as A(k1, k4, k2)→B(k8, k6)→C(k10, k12) for visitors. Third, previous studies

seldom took time constraints into consideration when they provided route

suggestions for visitors. For example, previous systems simply suggest a route format

such as k1→k4→k8 for visitors. However, when the time intervals between

items are revealed, this route becomes k1→(1 hour)→k4→(1 hour)→k8. If the

intended-visiting time of a visitor is 90 minutes, this suggestion is unacceptable since

the visitor cannot finish the route on time. On the other hand, if the intended-visiting time

is 300 minutes, this suggestion is also unsuitable since the route takes only about two hours. Without

the time intervals between items in the suggestion, a tourist cannot tell whether he/she can complete the

suggested route on time.

To solve the above problems, this research defines a Location-Item-Time (LIT)

sequence to describe visitors’ spatial and temporal behavior. To the best of our

knowledge, this study is the first work to include location (region), item, and

time-interval information simultaneously into a sequence. Then, the

Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to

discover frequent LIT sequential patterns. Finally, the route suggestion procedure is

proposed to retrieve suitable LIT sequential patterns under the constraints of a visitor’s

intended-visiting time, favorite regions and their related visiting times, and favorite recreation

facilities. This paper is organized as follows. Section 2 reviews previous works related

to sequential pattern mining and suggestion. Section 3 introduces the framework of

the proposed route recommendation system. Section 4 demonstrates a case to show

the feasibility of the proposed system. Finally, Section 5 summarizes the conclusions

and points out possible future directions.

2. Literature review

Yavas et al. [20] proposed a three-phase mobility prediction algorithm for the

prediction of user movement in a mobile computing system. Their algorithm enables

the system to allocate resources for users in an efficient manner, and to produce more

accurate answers to location-dependent queries that refer to future positions of mobile

users. Cho et al. [2] proposed a sequential rule-based recommendation method that

considers the evolution of customers’ purchase sequences. The purchase transaction

records of a customer for a certain period are used to build a customer profile. Then, a

collaboration-based system finds a set of customers by calculating

the correlations among customer profiles. Tan et al. [15] proposed a new approach to

building a personalized recommendation system based on access sequential patterns,

named Frequent Accessed Sequence Tree (FAS-Tree). All frequent access sequential

patterns are compressed into the FAS-Tree, which greatly reduces storage. During the personalized

recommendation stage, it is only necessary to traverse the sub-paths of the FAS-Tree that refer

to page views in the active window to find matching patterns, without the need to generate

association rules. Yun and Chen [21] developed an algorithm for mining mobile sequential patterns

to better reflect the customer usage patterns in the mobile commerce

environment, which takes both the moving patterns (location) and purchase patterns

(items) of customers into consideration. Tseng and Lin [18] proposed a novel data

mining method, namely SMAP-Mine, that can efficiently discover mobile users’

sequential movement patterns associated with requested services. Through empirical

evaluation under various simulation conditions, SMAP-Mine is shown to deliver

excellent performance in terms of accuracy, execution efficiency and scalability.

Meanwhile, the proposed prediction strategies are also verified to be effective in

measurements of precision, hit ratio and applicability.

Li et al. [8] proposed a Multi-Stage Collaborative Filtering (MSCF) process to

provide a location-aware event recommendation service in a mobile environment. The

first stage in MSCF performs the People-to-People Collaborative Filtering (P2P-CF),

while the Event-to-Event Collaborative Filtering (E2E-CF) discovers the sequential

rules of event-participation in the second stage. Liu and Chang [9] proposed a route

recommendation system which guides the user through a series of locations. Their

system used the methods of sequential pattern mining to extract popular route patterns

from a large set of users’ historical route records. Then, the system recommends

routes by matching the user’s current route with the set of popular route patterns. Liu

et al. [10] proposed a novel hybrid recommendation approach that combines the

segmentation-based sequential rule (SSR) method with the segmentation-based

KNN-CF (SKCF) method. In order to enhance the quality of product

recommendations, their method considers customers’ purchase sequences over time

and their purchase data for the current period. Hung and Peng [6] proposed a

Regression-based approach for mining User Movement Patterns (RUMP). The Large

Sequence (LS) algorithm extracts the call detail records and the Time Clustering (TC)

algorithm determines the number of regression functions. Then, the Movement Function

(MF) algorithm generates the movement function representing user movement

patterns of mobile users. Lu et al. [11] proposed a hybrid semantic recommendation

approach which integrates item-based CF similarity with item-based semantic

similarity techniques. The hybrid semantic recommendation approach has been

implemented in an Intelligent Business Partner Locator recommendation system

prototype named BizSeeker. Similarly, Zhang et al. [22] developed a hybrid

recommendation approach which combines user-based and item-based collaborative

filtering techniques with fuzzy set techniques and knowledge base for mobile product

and service recommendation. It particularly implements the approach in a

personalized recommender system for telecom products/services called FTCP-RS.

Although the above sequential pattern algorithms are efficient in different

environments, they do not take location, item, and time-interval information

into consideration at the same time.

3. Research method

3.1. Environment assumption and system overview

Typically, a theme park is divided into several regions and each region contains a

set of recreation facilities. It is assumed that each region is fully covered by RFID

readers. In addition, RFID readers are installed at the entrance of each recreation

facility and at the entrance and exit of the park. When a visitor wearing an RFID-tagged

wristband enters a region or the entrance of a facility, RFID readers record the RFID tag

code, region id, facility id, and the time into a route database. The recording process

continues until the visitor leaves the park. Let’s take the layout in Fig. 1 as an example.

At timestamp t1, a visitor passes the entrance k11 of the park in region B. Then, she

moves to region A at timestamp t2, region F at timestamp t3, and region G at

timestamp t4. In region G, she takes facility k1. After that, she moves to region K at

timestamp t5, region O at timestamp t6. In region O, she takes facilities k2 and k3. The

recording process continues until she leaves the park from the exit k12 in region B.

Finally, the route sequence <(B, t1, {k11}), (A, t2, φ ), (F, t3, φ ), (G, t4, {k1}), (K, t5,

φ ), (O, t6, {k2, k3}), (K, t7, φ ), (G, t8, φ ), (B, t9, {k12})> is collected and stored in the

route database.

Fig. 1. An illustrative example for route sequence generation.
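
The recording process described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the reading format `(time, region, facility)` for a single tag, the function name `build_route_sequence`, and the facility-entry times (33, 46, 49) are hypothetical, while the region-entry times follow sid 600 in Table 1. Consecutive readings in the same region are merged into one event stamped with the first entry time.

```python
def build_route_sequence(readings):
    """Group one visitor's RFID readings into (region, first_time, itemset) events."""
    events = []
    for time, region, facility in sorted(readings, key=lambda r: r[0]):
        # A new event starts whenever the visitor enters a different region.
        if not events or events[-1][0] != region:
            events.append((region, time, set()))
        # Facility readings are added to the itemset of the current region.
        if facility is not None:
            events[-1][2].add(facility)
    return events


# Hypothetical raw readings for one tag; None means a region reader only.
readings = [
    (7, "B", "k11"), (8, "A", None), (21, "F", None), (30, "G", None),
    (33, "G", "k1"), (41, "K", None), (44, "O", None), (46, "O", "k2"),
    (49, "O", "k3"), (51, "K", None), (54, "G", None), (58, "B", "k12"),
]
route = build_route_sequence(readings)
for event in route:
    print(event)
```

An empty set plays the role of φ in the route sequence notation.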

Whenever a visitor wants to request a route suggestion, he/she can reach the kiosk

machine in the park and input his/her preference information to the route

recommendation system. The preference includes intended total visiting time, favorite

regions, intended visiting time in the favorite regions, and favorite recreation facilities.

The route recommendation system consists of two major modules. The first module is

to generate a set of frequent Location-Item-Time (LIT) sequential patterns from the

route database using the proposed Location-Item-Time PrefixSpan (LIT-PrefixSpan)

mining procedure. The second module evaluates the similarity between the visitor’s

preference and candidate LIT routes, and retrieves the top-ranking routes for the visitor. The

framework of the proposed system is shown in Fig. 2.

Fig. 2. Two modules in the proposed route recommendation system.

3.2. Location-Item-Time (LIT) sequential patterns

Let N = {n1, n2, …, ng} be the set of cells (regions) in the theme park and K= {k1,

k2, …, kh} be the set of items (facilities, entrance, and exit). In the route database RD,

a record is represented by <sid, rs> where sid is the identifier of the record and rs is a

route sequence. Formally, rs is represented as <(B1, t1, itemset1), (B2, t2, itemset2), …,

(Bn, tn, itemsetn)> where (Bi, ti, itemseti) is an event; Bi is the visited region and Bi ∈ N;

ti stands for the timestamp at which region Bi is first entered, with $t_{i-1} \le t_i$ for $2 \le i \le n$;

itemseti is the set of items visited in region Bi and itemseti ⊆ K. Without timestamp

information, <Bi, itemseti> is called a transaction if itemseti is a non-empty set.

Definition 1. A transaction pattern is defined as <Bi; z> where z is a non-empty

subset of itemseti. A transaction pattern <Bi; z> is called a k-transaction pattern if z

contains k items.

Example I

There are two route sequences, sid 300 and sid 600, in the route database RD

shown in Table 1. Six non-empty itemsets, {k11}, {k1}, {k3}, {k4}, {k5}, and {k12}, can be found in

sid 300, while four non-empty itemsets, {k11}, {k1}, {k2,k3}, and {k12}, can be found in sid 600.

Therefore, transaction patterns <B;{k11}>, <G;{k1}>, <O;{k3}>, <L;{k4}>, <Q;{k5}>, and

<B;{k12}> can be extracted from sid 300. Similarly, transaction patterns <B;{k11}>,

<G;{k1}>, <O;{k2,k3}>, <O;{k2}>, <O;{k3}>, and <B;{k12}> can be extracted from sid

600. Finally, seven 1-transaction patterns, <B;{k11}>, <G;{k1}>, <O;{k3}>,

<L;{k4}>, <Q;{k5}>, <B;{k12}>, and <O;{k2}>, and one 2-transaction pattern,

<O;{k2,k3}>, are obtained.

Table 1 A simple route database, RD.

| Sid | Route sequence |
| --- | --- |
| 300 | <(B,8,{k11}), (G,9,{k1}), (F,11,φ), (K,24,φ), (O,25,{k3}), (P,35,φ), (L,37,{k4}), (Q,39,{k5}), (M,40,φ), (H,45,φ), (D,46,φ), (C,51,φ), (B,54,{k12})> |
| 600 | <(B,7,{k11}), (A,8,φ), (F,21,φ), (G,30,{k1}), (K,41,φ), (O,44,{k2,k3}), (K,51,φ), (G,54,φ), (B,58,{k12})> |

Let $\Delta t = t_{i+1} - t_i$ be the time interval between two successive events, where $1 \le i \le n-1$, and let $T_c$ be a set of given constants for $1 \le c \le r$. Then, the time interval $\Delta t$ can be mapped to one of the elements in the set of discrete time intervals TI = {I1, I2, …, Ir} by

$$
DiscTI(\Delta t) =
\begin{cases}
I_1 & \text{if } 0 < \Delta t \le T_1 \\
I_j & \text{if } T_{j-1} < \Delta t \le T_j \text{ for } 1 < j \le r
\end{cases}
\qquad (1)
$$

For example, assume T1 = 10, T2 = 20, T3 = 30, T4 = 40, T5 = 50, and T6 = 60.

Therefore, the set of discrete time intervals is TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt

≤ 10, I2: 10 < Δt ≤ 20, I3: 20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, and I6: 50 < Δt ≤ 60.
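
The mapping in Eq. (1) can be illustrated with a short Python sketch. The function name `disc_ti` is hypothetical, and returning `None` for a time interval outside (0, Tr] is an assumption of this sketch, since Eq. (1) does not define that case. The function returns the index j of the interval Ij.

```python
from bisect import bisect_left


def disc_ti(delta_t, bounds=(10, 20, 30, 40, 50, 60)):
    """Return j such that delta_t falls in Ij = (T(j-1), Tj], or None if out of range."""
    if delta_t <= 0 or delta_t > bounds[-1]:
        return None  # assumption: Eq. (1) leaves this case undefined
    # bisect_left finds the first bound Tj with delta_t <= Tj (0-based index).
    return bisect_left(bounds, delta_t) + 1


for dt in (10, 11, 46, 60):
    print(dt, "-> I%d" % disc_ti(dt))
```

Because each interval is closed on the right, a time interval of exactly 10 maps to I1 while 11 maps to I2.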

Definition 2. Let Γ = {γ1, γ2, …, γn} be the set of transaction patterns and TI = {I1, I2, …,

Ir} be the set of discrete time intervals. A sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq)

is a Location-Item-Time (LIT) sequence if $D_s \in \Gamma$ for $1 \le s \le q$ and $\varepsilon_s \in TI$ for $1 \le s \le q-1$.

3.3. Location-Item-Time mining procedure

Similar to the work of Yun and Chen [21], the proposed LIT sequential pattern

mining method consists of three phases: the large-transaction generation phase,

large-transaction transformation phase, and LIT sequential pattern generation phase.

3.3.1. Large-transaction generation phase

The large-transaction generation phase generates the large transactions from the

route database RD. Fig. 3 shows the pseudo-code of the large-transaction generation

algorithm. This algorithm consists of two main steps. As shown from line 1 to 11, the

first step derives all k-transaction patterns from the RD according to Definition 1. In

addition, the support count of each k-transaction pattern is calculated. The second step,

as shown from line 12 to 16, finds the set of large k-transaction patterns. If the support

count of a k-transaction pattern is greater than or equal to the user-specified minimum

support count (called min_sup_count), the k-transaction pattern is called a large

k-transaction pattern. Next, the itemsets in all large k-transaction patterns are replaced

by unique symbols. The set of all large k-transaction patterns after symbol

replacement is called the set of large 1-sequential patterns.

Large-transaction generation algorithm
Input:
  RD              A route database
  min_sup_count   Minimum support count
Output:
  Γ’              The set of large 1-sequential patterns (1-LIT sequential patterns)
Method:
(1)  for each event (Bi, ti, itemseti) in RD
(2)    if itemseti is not empty then
(3)      for each z ⊆ itemseti   // z is a non-empty subset of itemseti
(4)        if <Bi, z> does not exist in Γ then
(5)          add γ = <Bi, z> to Γ, and set its sup_count to 1;
(6)        else
(7)          increase sup_count of <Bi, z> by 1;
(8)        end if
(9)      end for
(10)   end if
(11) end for
(12) for each γ = <Bi, z> in Γ
(13)   if sup_count of γ ≥ min_sup_count then
(14)     give z a unique symbol and save to Γ’;
(15)   end if
(16) end for

Fig. 3. Pseudo-code of large-transaction generation algorithm.
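
The two steps of Fig. 3 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function name `large_transactions`, the representation of an event as `(region, time, itemset)`, and the order in which symbols g1, g2, … are assigned are assumptions. The demonstration uses the two route sequences of Table 1.

```python
from collections import Counter
from itertools import combinations


def large_transactions(route_db, min_sup_count):
    """Return ({(region, itemset): symbol}, support counts) for a route database."""
    counts = Counter()
    for route in route_db.values():
        for region, _time, itemset in route:
            items = sorted(itemset)
            # Lines 1-11 of Fig. 3: count every non-empty subset z of the itemset.
            for size in range(1, len(items) + 1):
                for z in combinations(items, size):
                    counts[(region, frozenset(z))] += 1
    # Lines 12-16: keep the large patterns and give each one a unique symbol.
    large = [p for p, c in counts.items() if c >= min_sup_count]
    symbols = {p: f"g{i}" for i, p in enumerate(large, start=1)}
    return symbols, counts


# Route database of Table 1; an empty set stands for an empty itemset.
RD = {
    300: [("B", 8, {"k11"}), ("G", 9, {"k1"}), ("F", 11, set()), ("K", 24, set()),
          ("O", 25, {"k3"}), ("P", 35, set()), ("L", 37, {"k4"}), ("Q", 39, {"k5"}),
          ("M", 40, set()), ("H", 45, set()), ("D", 46, set()), ("C", 51, set()),
          ("B", 54, {"k12"})],
    600: [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
          ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()),
          ("G", 54, set()), ("B", 58, {"k12"})],
}
symbols, counts = large_transactions(RD, min_sup_count=2)
for (region, z), g in symbols.items():
    print(region, sorted(z), g, counts[(region, z)])
```

With min_sup_count = 2 on this small database, only the patterns shared by both sequences survive; <O;{k2}> and <O;{k2,k3}> occur once each and are dropped.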

Example II

Let’s take the six route sequences in Fig. 4 as an example to explain the

large-transaction generation phase. If the min_sup_count is set as 2, 12 candidate

1-transaction patterns and 2 candidate 2-transaction patterns are found, as

shown in Fig. 5(a). If the sup_count of a transaction pattern is less than the

min_sup_count, the transaction pattern is deleted. Therefore, transaction

patterns <P;{k9}>, <Q;{k6}>, and <Q;{k5,k6}> are deleted. Next, the itemsets in all large

k-transaction patterns are replaced by unique symbols as shown in Fig. 5(b). The set of

all large k-transaction patterns after symbol replacement, called large 1-sequential

patterns, is summarized in Fig. 5(c).

Fig. 4. Six route sequences in the RD.

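(a) Candidate transaction patterns (* marks patterns deleted because Sup_count is below min_sup_count)

Candidate 2-transaction patterns

| Cell | Itemset | Sup_count |
| --- | --- | --- |
| *Q | {k5,k6} | 1 |
| O | {k2,k3} | 2 |

Candidate 1-transaction patterns

| Cell | Itemset | Sup_count |
| --- | --- | --- |
| G | {k1} | 6 |
| O | {k2} | 3 |
| O | {k3} | 6 |
| *P | {k9} | 1 |
| L | {k4} | 5 |
| Q | {k5} | 3 |
| *Q | {k6} | 1 |
| M | {k7} | 2 |
| U | {k8} | 2 |
| R | {k10} | 2 |
| B | {k11} | 6 |
| B | {k12} | 6 |

(b) Symbol replacement

| Cell | Itemset | Large transaction |
| --- | --- | --- |
| G | {k1} | <G;g1> |
| O | {k2} | <O;g2> |
| O | {k3} | <O;g3> |
| L | {k4} | <L;g4> |
| Q | {k5} | <Q;g5> |
| M | {k7} | <M;g6> |
| U | {k8} | <U;g7> |
| R | {k10} | <R;g8> |
| B | {k11} | <B;g9> |
| B | {k12} | <B;g10> |
| O | {k2,k3} | <O;g11> |

(c) Large 1-sequential patterns

| Large transaction | Sup_count |
| --- | --- |
| <G;g1> | 6 |
| <O;g2> | 3 |
| <O;g3> | 6 |
| <L;g4> | 5 |
| <Q;g5> | 3 |
| <M;g6> | 2 |
| <U;g7> | 2 |
| <R;g8> | 2 |
| <B;g9> | 6 |
| <B;g10> | 6 |
| <O;g11> | 2 |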

Fig. 5. Large 1-sequential patterns.

3.3.2. Large-transaction transformation phase

The large-transaction transformation phase transforms route sequences into the

maximal large-transaction sequences. Fig. 6 shows the pseudo-code of

the large-transaction transformation algorithm. Lines 2 to 4 initialize the variables String, ML,

and Path as empty values. String is a temporary variable storing the on-going string in

a buffer; ML represents the on-going maximal large-transaction sequence; Path

represents the on-going path of the maximal large-transaction sequence. For each

event (Bi, ti, itemseti) in route sequence rs, itemseti might be non-empty or empty. If

itemseti is non-empty (lines 7 to 14), the algorithm checks whether <Bi, z> exists in

the set of large 1-sequential patterns Γ’, where z is a non-empty subset of itemseti.

If <Bi, z> exists in Γ’, the algorithm appends its unique symbol g to Gi. After all z are

checked, <(Bi, Gi), ti> is appended to ML and String is appended to Path.

Finally, the algorithm sets String to <Bi>. If itemseti is empty (lines 16 to 20), the

algorithm checks whether < Bi > has been visited or not. If < Bi > has not been visited,

the algorithm will append < Bi > to String. Otherwise, anything after the first < Bi > in

String will be deleted. Through this phase, each record of the form <sid, rs> in the

RD is transformed into the form <sid, maximal large-transaction sequence, path>,

which is stored in the transformed route database TRD.

Large-transaction transformation algorithm
Input:
  Γ’              The set of large 1-sequential patterns
  RD              A route database
  min_sup_count   Minimum support count
Output:
  TRD             The set of maximal large-transaction sequences and their paths
Method:
(1)  for each rs in RD
(2)    set String to empty;   // The on-going string in the buffer
(3)    set ML to empty;       // The maximal large-transaction sequence
(4)    set Path to empty;     // The path of the maximal large-transaction sequence
(5)    for each event (Bi, ti, itemseti) in rs
(6)      if itemseti is not empty then   // itemseti is taken
(7)        for each z ⊆ itemseti   // z is a non-empty subset of itemseti
(8)          if <Bi, z> exists in Γ’ then
(9)            append its unique symbol g to Gi;
(10)         end if
(11)       end for
(12)       append <(Bi, Gi), ti> to ML;
(13)       append String to Path;
(14)       set String to <Bi>;
(15)     else   // itemseti is empty
(16)       if <Bi> is not in String then
(17)         append <Bi> to String;
(18)       else   // <Bi> is in String
(19)         delete anything after the first <Bi> in String;
(20)       end if
(21)     end if
(22)   end for
(23)   append String to Path;
(24) end for

Fig. 6. Pseudo-code of large-transaction transformation algorithm.
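
The transformation in Fig. 6 can be sketched in Python as shown below. The names `transform` and `LARGE` are hypothetical, and the symbol table is copied from Fig. 5(b). One point is an assumption of this sketch rather than something Fig. 6 states: an event whose itemset yields no large transaction pattern is treated like a pass-through event with an empty itemset, instead of being appended to ML with an empty Gi.

```python
from itertools import combinations

# Symbol table from Fig. 5(b): (region, itemset) -> unique symbol.
LARGE = {
    ("G", frozenset({"k1"})): "g1", ("O", frozenset({"k2"})): "g2",
    ("O", frozenset({"k3"})): "g3", ("L", frozenset({"k4"})): "g4",
    ("Q", frozenset({"k5"})): "g5", ("M", frozenset({"k7"})): "g6",
    ("U", frozenset({"k8"})): "g7", ("R", frozenset({"k10"})): "g8",
    ("B", frozenset({"k11"})): "g9", ("B", frozenset({"k12"})): "g10",
    ("O", frozenset({"k2", "k3"})): "g11",
}


def transform(route, large):
    """Return (maximal large-transaction sequence, path) for one route sequence."""
    string, ml, path = [], [], []
    for region, time, itemset in route:
        items = sorted(itemset)
        # Collect the symbols of all non-empty subsets z with <region, z> in large.
        symbols = tuple(
            large[(region, frozenset(z))]
            for size in range(1, len(items) + 1)
            for z in combinations(items, size)
            if (region, frozenset(z)) in large
        )
        if symbols:
            ml.append((region, symbols, time))
            path.extend(string)      # flush the buffered regions into the path
            string = [region]
        elif region not in string:
            string.append(region)
        else:
            # Region revisited: drop everything after its first occurrence.
            del string[string.index(region) + 1:]
    path.extend(string)
    return ml, "".join(path)


sid_600 = [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
           ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()),
           ("G", 54, set()), ("B", 58, {"k12"})]
ml, path = transform(sid_600, LARGE)
print(ml)
print(path)
```

Joining the path with `"".join` relies on region identifiers being single letters, as in the example layout.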

Example III

According to the large 1-sequential patterns shown in Fig. 5, Table 2 illustrates

the operations in the large-transaction transformation phase for route sequence sid 600.

The first column is the sequence of movements, the second column is the visited

regions, the third column is the visiting time, and the fourth column is the recreation

facilities used by the visitor. The fifth column gives the on-going large-transaction

in the buffer and the sixth column gives on-going string in the buffer. The seventh

column shows the maximal large-transaction sequence and the eighth column shows

the path of the maximal large-transaction sequence. After a series of transformations,

the maximal large-transaction sequence for sid 600 becomes <<(B;g9),7>,

<(G;g1),30>, <(O;g2,g3,g11),44>, <(B;g10),58>> and its path is BAFGKOKGB.

Through the same process, all route sequences in the RD of Fig. 4 are transformed to

maximal large-transaction sequences in the TRD as shown in Table 3.

Table 2 Process of producing the maximal large-transaction sequence for sid 600.

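| Move | Cell | Time | Items | Large-transaction | String | Maximal large-transaction sequence | Path |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | B | 7 | k11 | <B;g9> | B | <(B;g9),7> | - |
| 2 | A | 8 | - | - | BA | - | - |
| 3 | F | 21 | - | - | BAF | - | - |
| 4 | G | 30 | k1 | <G;g1> | G | <(B;g9),7>,<(G;g1),30> | BAF |
| 5 | K | 41 | - | - | GK | - | - |
| 6 | O | 44 | k2,k3 | <O;g2,g3,g11> | O | <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44> | BAFGK |
| 7 | K | 51 | - | - | OK | - | - |
| 8 | G | 54 | - | - | OKG | - | - |
| 9 | B | 58 | k12 | <B;g10> | B | <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44>,<(B;g10),58> | BAFGKOKGB |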

Table 3 Transformed route database, TRD.

| Sid | Maximal large-transaction sequence | Path |
| --- | --- | --- |
| 100 | <(B;g9),4>,<(G;g1),5>,<(O;g2,g3,g11),14>,<(L;g4),20>,<(Q;g5),22>,<(M;g6),38>,<(U;g7),52>,<(B;g10),60> | BGKOTPLQRMQVUPKGB |
| 200 | <(B;g9),1>,<(G;g1),20>,<(O;g3),39>,<(L;g4),46>,<(Q;g5),47>,<(M;g6),50>,<(B;g10),60> | BAFGKOTPLQRMIHCB |
| 300 | <(B;g9),8>,<(G;g1),9>,<(O;g3),25>,<(L;g4),37>,<(Q;g5),39>,<(B;g10),54> | BGFKOPLQMHDCB |
| 400 | <(B;g9),2>,<(G;g1),7>,<(O;g3),17>,<(L;g4),27>,<(R;g8),46>,<(U;g7),53>,<(O;g2),56>,<(B;g10),60> | BFGKOTPLQRQVUTOKGB |
| 500 | <(B;g9),1>,<(G;g1),2>,<(O;g3),14>,<(L;g4),19>,<(R;g8),40>,<(B;g10),60> | BGKOTPLQRNIDCB |
| 600 | <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44>,<(B;g10),58> | BAFGKOKGB |

3.3.3. Location-Item-Time sequential pattern generation phase

Next, a LIT sequential pattern algorithm is developed to generate all large LIT

sequential patterns from the TRD. Similar to Chen et al. [1], the proposed LIT

sequential pattern algorithm, called the LIT-PrefixSpan algorithm, is based on the PrefixSpan

mining concept. Before introducing the LIT-PrefixSpan algorithm, the following

definitions are given.

Definition 3. For a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, …, <(Bn; zn), tn>) and a Location-Item-Time (LIT) sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq), β is said to be contained in α, or β is a LIT subsequence of α, if integers $1 \le j_1 < j_2 < \dots < j_q \le n$ exist such that

1. $D_1 = (B_{j_1}; z_{j_1})$, $D_2 = (B_{j_2}; z_{j_2})$, …, $D_q = (B_{j_q}; z_{j_q})$.

2. $t_{j_i} - t_{j_{i-1}}$ satisfies the condition of time-interval $\varepsilon_{i-1}$ for $2 \le i \le q$.

Definition 4. support_countTRD(α) = |{(sid, maximal large-transaction sequence, path)

| (sid, maximal large-transaction sequence, path) ∈ TRD ∧ α is contained in the maximal large-transaction sequence}|.

A LIT sequence α is called a LIT sequential pattern if the percentage of records in

TRD containing α is greater than or equal to the pre-defined minimum support,

called min_sup. That is, α is a LIT sequential pattern in TRD if

support_countTRD(α) ≥ |TRD| × min_sup or support_countTRD(α) ≥ min_sup_count. A

LIT sequence whose length is l is denoted as an l-LIT sequence.

Definition 5. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, …, <(Bn; zn), tn>) and a LIT sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq) ($q \le n$), β is a LIT prefix of α if and only if (1) $D_i = (B_i; z_i)$ for $1 \le i \le m$; (2) $t_i - t_{i-1}$ satisfies the condition of $\varepsilon_{i-1}$ for $1 < i \le m-1$.

Definition 6. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, …, <(Bn; zn), tn>) and a LIT sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq) ($q \le n$) such that β is a subsequence of α. Let $i_1 < i_2 < \dots < i_q$ be the indexes of the large-transaction patterns in α that match the large-transaction patterns of β. A subsequence $\alpha' = (\langle (B'_1; z'_1), t'_1 \rangle, \langle (B'_2; z'_2), t'_2 \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ of sequence α, where $p = q + n - i_q$, is called a projection of α with respect to β if and only if (1) β is a LIT prefix of $\alpha'$ and (2) the last $n - i_q$ large-transaction patterns of $\alpha'$ are the same as the last $n - i_q$ large-transaction patterns of α.

Definition 7. Let $\alpha' = (\langle (B'_1; z'_1), t'_1 \rangle, \langle (B'_2; z'_2), t'_2 \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ be the projection of α with respect to a LIT prefix β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq). Then $\theta = (\langle (B'_{q+1}; z'_{q+1}), t'_{q+1} \rangle, \langle (B'_{q+2}; z'_{q+2}), t'_{q+2} \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ is the postfix of α with respect to prefix β.
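
The containment test of Definition 3 and the support count of Definition 4 can be sketched in Python as follows. The names `disc_ti`, `contains`, and `support_count` are hypothetical, the interval bounds are those of the example for Eq. (1), and a LIT sequence is written as a flat tuple (D1, ε1, D2, …) whose intervals are given by index. One assumption of this sketch: a pattern (region, symbol) matches an event of the maximal large-transaction sequence if the region agrees and the symbol is among the event's symbols. The two sequences are sid 300 and sid 600 from Table 3.

```python
BOUNDS = (10, 20, 30, 40, 50, 60)  # T1..T6 from the example for Eq. (1)


def disc_ti(delta_t):
    """Index j of the interval Ij containing delta_t, or None if outside (0, T6]."""
    for j, bound in enumerate(BOUNDS, start=1):
        if 0 < delta_t <= bound:
            return j
    return None


def contains(alpha, beta):
    """True if LIT sequence beta is contained in alpha (Definition 3)."""
    patterns, intervals = beta[0::2], beta[1::2]

    def match(k, prev):
        if k == len(patterns):
            return True
        region, symbol = patterns[k]
        for j in range(prev + 1, len(alpha)):
            r, symbols, t = alpha[j]
            if r != region or symbol not in symbols:
                continue
            # The gap to the previously matched event must fall in interval k-1.
            if k > 0 and disc_ti(t - alpha[prev][2]) != intervals[k - 1]:
                continue
            if match(k + 1, j):
                return True
        return False

    return match(0, -1)


def support_count(sequences, beta):
    """Number of sequences that contain beta (Definition 4)."""
    return sum(contains(alpha, beta) for alpha in sequences)


SID_300 = [("B", ("g9",), 8), ("G", ("g1",), 9), ("O", ("g3",), 25),
           ("L", ("g4",), 37), ("Q", ("g5",), 39), ("B", ("g10",), 54)]
SID_600 = [("B", ("g9",), 7), ("G", ("g1",), 30), ("O", ("g2", "g3", "g11"), 44),
           ("B", ("g10",), 58)]

beta = (("B", "g9"), 6, ("B", "g10"))  # <B;g9>, I6, <B;g10>
print(contains(SID_300, beta), contains(SID_600, beta))
print(support_count([SID_300, SID_600], beta))
```

In sid 300 the gap between <B;g9> and <B;g10> is 46, which falls in I5 rather than I6, so only sid 600 supports this LIT sequence.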

The pseudo-code of the proposed LIT-PrefixSpan algorithm is illustrated in Fig.

7. The α-projected database, defined as the collection of postfixes of maximal

large-transaction sequences in TRD with respect to α, is denoted as TRD|α. The major

difference between LIT-PrefixSpan and I-PrefixSpan is that LIT-PrefixSpan

includes both cells and items in each transaction pattern. Therefore, a table LIT_Table is

used to store this type of relation, where a column corresponds to a large-transaction

pattern and a row corresponds to a time-interval in TI = {I1, I2, …, Ir}. Each cell

LIT_Table(Ii, γ’i) in the table records the number of transactions in TRD|α that

contain transaction pattern γ’i and for which the time difference between this transaction pattern

and the last transaction pattern of α lies within Ii. Processing every transaction in

TRD|α sequentially enables LIT_Table to be formed and the frequent cells to be

identified. If the cell LIT_Table(Ii, γ’i) is a frequent cell, (Ii, γ’i) can be appended to α

to yield a LIT sequential pattern α’, and to construct the α’-projected database

TRD|α’. Recursively discovering the LIT sequential patterns in TRD|α’ finally yields all

LIT sequential patterns in the TRD.

Subroutine LIT-PrefixSpan(α, l, TRD|α)
Input:
  Γ’ = {γ’1, γ’2, …, γ’n}   The set of large 1-sequential patterns (1-LIT sequential patterns)
  TRD                        The set of maximal large-transaction sequences and their paths (the transformed route database)
  TI = {I1, I2, …, Ir}       Time-intervals
  LIT_Table                  The table storing the relation between the set of large 1-sequential patterns and time-intervals
  min_sup_count              Minimum support count
Output:
  The LIT sequential patterns
Method:
(1)  if l = 0 then
(2)    for each γ’i in Γ’
(3)      append γ’i to α as α’;
(4)      output α’;
(5)      construct α’-projected database TRD|α’,
(6)      and call LIT-PrefixSpan(α’, l+1, TRD|α’);
(7)    end for
(8)  else
(9)    set LIT_Table to empty;
(10)   for each sid in TRD|α
(11)     construct α-time;
(12)     for each Gi of the sid in TRD|α
(13)       calculate the time-interval between Gi and α, and classify to Ii;
(14)       for each g of Gi
(15)         increase the count of (Ii, γ’i) in LIT_Table by 1;
(16)       end for
(17)     end for
(18)   end for
(19)   for every frequent cell (Ii, γ’i) in LIT_Table
(20)     append (Ii, γ’i) to α as α’;
(21)     output α’;
(22)     construct α’-projected database TRD|α’,
(23)     and call LIT-PrefixSpan(α’, l+1, TRD|α’);
(24)   end for
(25) end if

Fig. 7. Pseudo-code of the LIT-PrefixSpan algorithm.
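
The recursive pattern growth of Fig. 7 can be sketched in Python as below. This is a simplified sketch under stated assumptions, not the authors' implementation: instead of materializing projected databases, it tracks for each sid the event indexes at which an occurrence of the current prefix ends; support is counted once per sid; and the names `mine`, `grow`, and `disc_ti` are hypothetical. The data are the six sequences of Table 3, and a pattern is a flat tuple (γ1, interval index, γ2, …).

```python
from collections import defaultdict

# Transformed route database of Table 3: sid -> [(region, symbols, time), ...]
TRD = {
    100: [("B", ("g9",), 4), ("G", ("g1",), 5), ("O", ("g2", "g3", "g11"), 14),
          ("L", ("g4",), 20), ("Q", ("g5",), 22), ("M", ("g6",), 38),
          ("U", ("g7",), 52), ("B", ("g10",), 60)],
    200: [("B", ("g9",), 1), ("G", ("g1",), 20), ("O", ("g3",), 39),
          ("L", ("g4",), 46), ("Q", ("g5",), 47), ("M", ("g6",), 50),
          ("B", ("g10",), 60)],
    300: [("B", ("g9",), 8), ("G", ("g1",), 9), ("O", ("g3",), 25),
          ("L", ("g4",), 37), ("Q", ("g5",), 39), ("B", ("g10",), 54)],
    400: [("B", ("g9",), 2), ("G", ("g1",), 7), ("O", ("g3",), 17),
          ("L", ("g4",), 27), ("R", ("g8",), 46), ("U", ("g7",), 53),
          ("O", ("g2",), 56), ("B", ("g10",), 60)],
    500: [("B", ("g9",), 1), ("G", ("g1",), 2), ("O", ("g3",), 14),
          ("L", ("g4",), 19), ("R", ("g8",), 40), ("B", ("g10",), 60)],
    600: [("B", ("g9",), 7), ("G", ("g1",), 30), ("O", ("g2", "g3", "g11"), 44),
          ("B", ("g10",), 58)],
}
BOUNDS = (10, 20, 30, 40, 50, 60)  # T1..T6 as in Example IV


def disc_ti(delta_t):
    """Index j of the interval Ij containing delta_t, or None if out of range."""
    for j, bound in enumerate(BOUNDS, start=1):
        if 0 < delta_t <= bound:
            return j
    return None


def mine(trd, min_sup_count):
    """Return {pattern: support count} for all frequent LIT sequences."""
    patterns = {}

    def grow(prefix, ends):
        # ends: sid -> set of event indexes where an occurrence of prefix ends.
        patterns[prefix] = len(ends)
        table = defaultdict(lambda: defaultdict(set))  # plays the role of LIT_Table
        for sid, indexes in ends.items():
            seq = trd[sid]
            for e in indexes:
                for j in range(e + 1, len(seq)):
                    region, symbols, t = seq[j]
                    interval = disc_ti(t - seq[e][2])
                    if interval is None:
                        continue
                    for g in symbols:
                        table[(interval, (region, g))][sid].add(j)
        for (interval, gamma), new_ends in table.items():
            if len(new_ends) >= min_sup_count:  # frequent cell
                grow(prefix + (interval, gamma), dict(new_ends))

    first = defaultdict(lambda: defaultdict(set))
    for sid, seq in trd.items():
        for j, (region, symbols, _t) in enumerate(seq):
            for g in symbols:
                first[(region, g)][sid].add(j)
    for gamma, ends in first.items():
        if len(ends) >= min_sup_count:
            grow((gamma,), dict(ends))
    return patterns


result = mine(TRD, min_sup_count=2)
print(result[(("B", "g9"), 6, ("B", "g10"))])
print(result[(("B", "g9"), 2, ("O", "g3"))])
```

Because the time constraint applies only between consecutive elements of a LIT sequence, remembering where each occurrence of the prefix ends is enough to extend it; this is a design choice of the sketch and may enumerate occurrences differently from the projected databases described in the text.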

Example IV

Suppose TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3:

20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, and I6: 50 < Δt ≤ 60. Consider the TRD

shown in Table 3, with the min_sup_count set as 2. At the beginning, α is empty and

the frequent transaction patterns <B;g9>, <G;g1>, <O;g2>, <O;g3>, <O;g11>, <L;g4>,

<Q;g5>, <M;g6>, <U;g7>, <R;g8>, and <B;g10> are discovered. Appending these

frequent transaction patterns to the empty α yields 11 different α’. Table 4

summarizes the LIT sequential pattern mining result. The total number of LIT

sequential patterns is 68 (=11+25+23+8+1) since there are 11 1-LIT sequential

patterns, 25 2-LIT sequential patterns, and so on.

Table 4 LIT sequential pattern mining result.

k   Number of patterns   k-LIT sequential patterns                            Sup_Count
1   11                   <B;g9>                                               6
                         <G;g1>                                               6
                         …                                                    …
                         <B;g10>                                              6
2   25                   <B;g9>,I6,<B;g10>                                    5
                         <B;g9>,I2,<O;g3>                                     3
                         …                                                    …
                         <R;g8>,I2,<B;g10>                                    2
3   23                   <B;g9>,I1,<G;g1>,I6,<B;g10>                          3
                         <B;g9>,I1,<G;g1>,I2,<O;g3>                           2
                         …                                                    …
                         <L;g4>,I1,<Q;g5>,I2,<B;g10>                          2
4   8                    <B;g9>,I1,<G;g1>,I4,<R;g8>,I2,<B;g10>                2
                         <B;g9>,I1,<G;g1>,I1,<O;g3>,I4,<U;g7>                 2
                         …                                                    …
                         <G;g1>,I3,<L;g4>,I1,<Q;g5>,I2,<B;g10>                2
5   1                    <B;g9>,I1,<G;g1>,I1,<O;g3>,I4,<U;g7>,I1,<B;g10>      2

3.4. Route recommendation procedure

When a visitor requires a route suggestion, he/she is asked to enter personal preferences into the route recommendation system at the kiosk. The visitor’s preference can be represented as a VP vector:

VP=<ITVT, <FR1, FItems1, IRVT1>, <FR2, FItems2, IRVT2>,…> (2)

where ITVT is the intended total visiting time, FRi is the favorite region i, FItemsi is the set of favorite facilities in FRi, and IRVTi is the intended visiting time in FRi. For example, VP = <420, <G, {k1}, 90>, <O, {k2, k3}, 120>> indicates that a visitor intends to spend 420 minutes in the theme park. In addition, he/she would like to spend 90 minutes in region G and take recreation facility k1 there, and 120 minutes in region O and take recreation facilities k2 and k3 there. Note that the more information a visitor enters, the more satisfactory the suggestion he/she can obtain.
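As an illustration, the VP vector of Equation (2) could be held in a small data structure such as the one below. This is only a sketch: the class and field names are hypothetical, and to_interval assumes equal-width intervals of the form ((j-1)·width, j·width], which is how the examples in this section discretize minutes.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RegionPreference:
    region: str            # FR_i, the favorite region
    items: FrozenSet[str]  # FItems_i, favorite facilities in that region
    minutes: int           # IRVT_i, intended visiting time in that region


@dataclass(frozen=True)
class VisitorPreference:
    total_minutes: int               # ITVT, intended total visiting time
    regions: List[RegionPreference]


def to_interval(minutes, width):
    """1-based index j with (j-1)*width < minutes <= j*width (assumed discretization)."""
    return math.ceil(minutes / width)


# VP = <420, <G, {k1}, 90>, <O, {k2, k3}, 120>>
vp = VisitorPreference(
    420,
    [
        RegionPreference("G", frozenset({"k1"}), 90),
        RegionPreference("O", frozenset({"k2", "k3"}), 120),
    ],
)
```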

3.4.1. Time constraint

The number of LIT sequential patterns generated by the LIT-PrefixSpan algorithm might be large. A LIT sequential pattern is therefore treated as a candidate LIT route only if it satisfies the following two rules. First, the LIT sequential pattern should include the entrance and the exit. Second, the LIT sequential pattern should satisfy the time constraint provided by the visitor. As mentioned in Section 3.2, a time interval Δt can be mapped to one of the elements in the set of discrete time intervals TI = {I1, I2, …, Ir} according to Equation (1). Therefore, the lower bound and upper bound of a time interval Ij are derived using Equations (3) and (4) respectively.

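\[
f_{LB}(I_j) =
\begin{cases}
0, & \text{if } j = 1 \\
T_{j-1}, & \text{if } 1 < j \le r
\end{cases}
\tag{3}
\]

\[
f_{UB}(I_j) =
\begin{cases}
T_1, & \text{if } j = 1 \\
T_j, & \text{if } 1 < j \le r
\end{cases}
\tag{4}
\]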

Let a LIT sequential pattern β be represented as (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq). The total visiting time of β can be represented as VTβ = ($VT_{LB}^{\beta}$, $VT_{UB}^{\beta}$], where the lower bound of VTβ is derived as:

\[
VT_{LB}^{\beta} = \sum_{s=1}^{q-1} f_{LB}(\varepsilon_s)
\tag{5}
\]

and the upper bound of VTβ is defined as:

\[
VT_{UB}^{\beta} = \sum_{s=1}^{q-1} f_{UB}(\varepsilon_s)
\tag{6}
\]

If $VT_{LB}^{\beta} \le ITVT$ and $VT_{UB}^{\beta} \ge ITVT$, we say that the LIT sequential pattern β satisfies the visitor’s time constraint, where ITVT is the visitor’s intended total visiting time in Equation (2).

Example V

Suppose TI = {I1, I2, I3, I4} where I1: 0 < Δt ≤ 30, I2: 30 < Δt ≤ 60, I3: 60 < Δt ≤ 90, I4: 90 < Δt ≤ 120, and five LIT sequential patterns are given in the first three columns of Table 5. According to Equations (3) to (6), the total visiting time of each pattern can be derived, as shown in the last column of Table 5. If a visitor’s intended total visiting time ITVT is 320 minutes, LIT sequential patterns 1, 2, and 3 are considered candidate routes, whereas LIT sequential patterns 4 and 5 do not satisfy the visitor’s time constraint because their upper bounds (300 and 210 minutes) are below 320.

Table 5 Five LIT sequential patterns.

No  LIT sequential pattern                                           Path          Total visiting time
1   <B;k11>,I1,<O;k3>,I2,<L;k4>,I4,<Q;k6>,I1,<M;k7>,I4,<B;k12>       BAKOKLQMHCB   (210,360]
2   <B;k11>,I1,<G;k1>,I3,<O;k2,k3>,I4,<L;k4>,I4,<B;k12>              BGKOTPLHCB    (240,360]
3   <B;k11>,I1,<G;k1>,I3,<L;k4>,I4,<Q;k6>,I1,<O;k2,k3>,I2,<B;k12>    BGLQPOKFB     (180,330]
4   <B;k11>,I1,<G;k1>,I1,<O;k3>,I4,<U;k8>,I4,<B;k12>                 BGKOTUQLHCB   (180,300]
5   <B;k11>,I1,<G;k1>,I3,<L;k4>,I1,<Q;k5>,I2,<B;k12>                 BGLQMHCB      (90,210]
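The bounds in Table 5 can be recomputed with a few lines of Python. The sketch below assumes equal-width intervals, so that the boundary T_j of Equations (3) and (4) equals j·width; the function names and the TABLE5 dictionary (the interval indices of each pattern, read off Table 5) are introduced here for illustration only.

```python
def visiting_time_bounds(intervals, width):
    """Equations (5) and (6) for equal-width intervals I_j = ((j-1)*width, j*width]."""
    lower = sum((j - 1) * width for j in intervals)  # sum of f_LB
    upper = sum(j * width for j in intervals)        # sum of f_UB
    return lower, upper


def satisfies_time_constraint(intervals, width, itvt):
    """True when VT_LB <= ITVT <= VT_UB."""
    lower, upper = visiting_time_bounds(intervals, width)
    return lower <= itvt <= upper


# Interval indices of the five patterns in Table 5 (width 30 minutes).
TABLE5 = {
    1: [1, 2, 4, 1, 4],
    2: [1, 3, 4, 4],
    3: [1, 3, 4, 1, 2],
    4: [1, 1, 4, 4],
    5: [1, 3, 1, 2],
}

bounds = {no: visiting_time_bounds(ivs, 30) for no, ivs in TABLE5.items()}
candidates = [no for no, ivs in TABLE5.items()
              if satisfies_time_constraint(ivs, 30, 320)]
```

With ITVT = 320 the sketch keeps patterns 1, 2 and 3, in agreement with Example V.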

3.4.2. Similarity measurement

The similarity between VP = <ITVT, <FR1, FItems1, IRVT1>, <FR2, FItems2, IRVT2>, …> and a candidate LIT route β = ((B1; z1), ε1, (B2; z2), ε2, …, (Bq-1; zq-1), εq-1, (Bq; zq)) is evaluated based on the following concepts. First, the intended visiting time for region i, IRVTi, in VP is mapped to one of the elements in TI = {I1, I2, …, Ir} according to Equation (1) for all i. Second, when conducting the similarity evaluation, <FRi, FItemsi, IRVTi> in VP and <(Bj; zj), εj> in β are considered as comparison units. Third, the similarity between <FItemsi, IRVTi> and <zj, εj> is evaluated only if FRi and Bj are the same region. Based on the above concepts, the similarity between the ith unit in VP and the jth unit in β is defined as:

\[
Sim_{i,j} =
\begin{cases}
w_1 \times 1 + w_2 \times ISim(i,j) + w_3 \times TSim(i,j), & \text{if } FR_i = B_j \\
0, & \text{if } FR_i \ne B_j
\end{cases}
\tag{7}
\]

where w1, w2, and w3 are the important degrees for region, facility, time-interval

considerations respectively, and w1 + w2 + w3 = 1. ISim(i, j) is the itemset similarity

between FItemsi and zj which is defined as:

\[
ISim(i,j) = \frac{|FItems_i \cap z_j|}{|FItems_i|}
\tag{8}
\]

where ∩ is the set intersection operator and | | is the cardinality of the set. In addition, TSim(i, j) is the time interval similarity between IRVTi and εj, which is defined as:

\[
TSim(i,j) = 1 - \frac{|f(IRVT_i) - f(\varepsilon_j)|}{f(I_r)}
\tag{9}
\]

where |·| is the absolute value operator and f(Ib) is the rank of the time-interval Ib in TI, defined as f(Ib) = b where b = 1, …, r. With Equation (7), the similarity between VP and β is defined as:

\[
Sim(VP, \beta) = \frac{\sum_{i=1}^{|VP|} \sum_{j=1}^{|\beta|} Sim_{i,j}}{|VP|}
\tag{10}
\]

where |·| is the length of the sequence. After the similarities between VP and all candidate routes are derived, the routes are sorted in decreasing order of similarity and returned to the kiosk machine as suggested routes. If more than one candidate route has the same similarity value, the route with the larger total number of facilities is ranked higher.
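Equations (7) to (10) can be sketched in Python as follows. The tuple layout (region, item set, interval rank) and the function names are assumptions made for this illustration; the exit unit, which has no following time interval, is simply left out of the route. The data are the VP and the three candidate routes used in Example VI, with r = 4 intervals.

```python
def sim_unit(vp_unit, route_unit, r, w=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (7): similarity between one VP unit and one route unit."""
    fr, fitems, f_irvt = vp_unit
    b, z, f_eps = route_unit
    if fr != b:
        return 0.0
    isim = len(fitems & z) / len(fitems)   # Equation (8)
    tsim = 1 - abs(f_irvt - f_eps) / r     # Equation (9), with f(I_r) = r
    return w[0] * 1 + w[1] * isim + w[2] * tsim


def sim_route(vp_units, route_units, r, w=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (10): sum of all unit similarities divided by |VP|."""
    total = sum(sim_unit(u, v, r, w) for u in vp_units for v in route_units)
    return total / len(vp_units)


# VP = <300, <O, {k3}, I3>, <Q, {k5, k6}, I4>> after discretization.
vp_units = [("O", {"k3"}, 3), ("Q", {"k5", "k6"}, 4)]

route1 = [("B", {"k11"}, 1), ("O", {"k3"}, 2), ("L", {"k4"}, 4),
          ("Q", {"k6"}, 1), ("M", {"k7"}, 4)]
route2 = [("B", {"k11"}, 1), ("G", {"k1"}, 3), ("O", {"k2", "k3"}, 4),
          ("L", {"k4"}, 4)]
route3 = [("B", {"k11"}, 1), ("G", {"k1"}, 3), ("L", {"k4"}, 4),
          ("Q", {"k6"}, 1), ("O", {"k2", "k3"}, 2)]

scores = [sim_route(vp_units, route, r=4) for route in (route1, route2, route3)]
```

The three scores evaluate to 0.75, 11/24 (about 0.458) and 0.75, the values reported in Example VI.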

Example VI

Assume LIT sequential patterns 1, 2, and 3 in Example V are candidate LIT routes and the visitor preference is VP = <300, <O, {k3}, 70>, <Q, {k5, k6}, 100>>. According to the discrete time-interval definition in Example V, VP is transformed into <300, <O, {k3}, I3>, <Q, {k5, k6}, I4>>. For candidate LIT route #1, with w1 = w2 = w3 = 1/3, we have Sim1,1 = 1/3 + 1/3*(1/1) + 1/3*(1-|f(I3)-f(I2)|/f(I4)) = 11/12; Sim1,2 = 0; Sim1,3 = 0; Sim1,4 = 0; Sim2,1 = 0; Sim2,2 = 0; Sim2,3 = 1/3 + 1/3*(1/2) + 1/3*(1-|f(I4)-f(I1)|/f(I4)) = 7/12; Sim2,4 = 0. Hence, the total similarity between VP and candidate LIT route #1 is ((11/12 + 0 + 0 + 0) + (0 + 0 + 7/12 + 0))/2 = 0.75. By the same process, we have Sim(VP, #1) = 0.75, Sim(VP, #2) = 0.458, and Sim(VP, #3) = 0.75. Candidate LIT routes #1 and #3 have the same total similarity score. When the total similarity scores are the same, the total numbers of facilities are compared. The total numbers of facilities for candidate LIT routes #1 and #3 are 4 and 5 respectively. Therefore, candidate LIT route #3 is ranked first. Table 6 shows the final ranking result for the three candidate LIT routes. Based on candidate LIT route #3 and its path, the route recommendation system will suggest that the visitor pass entrance k11 in region B. After time-interval I1, the visitor is advised to move to region G and take k1. Then, after time-interval I3, he/she is advised to take k4 in region L, and so on.

Table 6 Three LIT candidate routes and their rankings.

No.  Candidate LIT route                                              Path          Total visiting time  Total similarity score  Total facility number  Final rank
3    <B;k11>,I1,<G;k1>,I3,<L;k4>,I4,<Q;k6>,I1,<O;k2,k3>,I2,<B;k12>    BGLQPOKFB     (180,330]            0.75                    5                      1
1    <B;k11>,I1,<O;k3>,I2,<L;k4>,I4,<Q;k6>,I1,<M;k7>,I4,<B;k12>       BAKOKLQMHCB   (210,360]            0.75                    4                      2
2    <B;k11>,I1,<G;k1>,I3,<O;k2,k3>,I4,<L;k4>,I4,<B;k12>              BGKOTPLHCB    (240,360]            0.458                   4                      3
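The ranking rule, similarity first and total facility number as the tie-breaker, amounts to a sort on a two-part key. The sketch below uses the scores and facility counts listed in Table 6; the dictionary layout is hypothetical.

```python
# Values copied from Table 6.
candidates = [
    {"no": 1, "similarity": 0.75, "facilities": 4},
    {"no": 2, "similarity": 0.458, "facilities": 4},
    {"no": 3, "similarity": 0.75, "facilities": 5},
]

# Sort by decreasing similarity, then by decreasing number of facilities.
ranked = sorted(candidates, key=lambda c: (-c["similarity"], -c["facilities"]))
order = [c["no"] for c in ranked]
```

When the similarities are computed in floating point rather than copied from a table, rounding them to a fixed number of digits before sorting is one way to make ties detectable.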

4. Implementation and experiment results

The proposed route recommendation system is implemented using C# and tested

on a PC with Core i5 2.80GHz CPU and 4GB memory.

4.1. Case description and route generator

In this study, a simplified theme park is used as an example to illustrate the feasibility of the proposed system. As shown in Fig. 8, there are seven thematic regions and thirty-four recreation facilities (k1 to k34). For example, thematic region B contains facilities k1, k2, and k3, while thematic region H contains facilities k31, k32, k33, and k34. To simulate visiting behavior, a route generator is developed. In the generator, visitors start their visit at the entrance (k35) and finish at the exit (k36). Consecutive regions on a visitor's path must be adjacent. The total visiting time of a route sequence is drawn from a uniform distribution with an upper limit of 780 minutes, since the park operates from 9:00 a.m. to 10:00 p.m. The time a visitor takes to move to the next region is drawn from a uniform distribution between 15 and 30 minutes. In addition, the time a visitor spends on a recreation facility is drawn from a uniform distribution between 30 and 90 minutes.

Fig. 8. Layout of the implementation example.

According to the tourism reports, the five must-visit recreation facilities are k4, k12, k13, k25, and k26, and the seven popular facilities are k2, k6, k17, k22, k23, k24, and k32. Therefore, if a generated route sequence does not contain one of the five must-visit recreation facilities, the route is discarded. Likewise, if a generated route sequence does not contain one of the seven popular recreation facilities, the route sequence is discarded with a probability of 80%. In addition, the average number of visitors in the theme park is 26,000 per day. Therefore, 26,000 route sequences are generated to simulate the visiting behavior of visitors in one day.
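The acceptance rule of the route generator can be sketched as below. This is an illustration rather than the generator used in the study: it reads "does not contain one of" as "contains none of", and the function names keep_route and draw_times are hypothetical.

```python
import random

MUST_VISIT = {"k4", "k12", "k13", "k25", "k26"}
POPULAR = {"k2", "k6", "k17", "k22", "k23", "k24", "k32"}


def keep_route(facilities, rng, discard_prob=0.8):
    """Return True if a generated route sequence is kept.

    Assumed reading: a route with no must-visit facility is always discarded;
    a route with no popular facility is discarded with probability discard_prob.
    """
    visited = set(facilities)
    if not visited & MUST_VISIT:
        return False
    if not visited & POPULAR:
        return rng.random() >= discard_prob
    return True


def draw_times(rng):
    """Draw one region-to-region travel time and one facility time, in minutes."""
    travel = rng.uniform(15, 30)
    ride = rng.uniform(30, 90)
    return travel, ride


rng = random.Random(0)
kept = keep_route(["k35", "k4", "k2", "k36"], rng)  # has must-visit and popular
```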

4.2. Route recommendation

Before executing the proposed LIT-PrefixSpan mining procedure, the minimum support and the set of discrete time intervals should be determined. For simplicity, the time intervals in this study are set to an equal length of 40 minutes and the minimum support is set as 0.02%. That is, the set of discrete time intervals is TI = {I1, I2, I3, …, I20}, where I1: 0 < t ≤ 40, I2: 40 < t ≤ 80, I3: 80 < t ≤ 120, …, I20: 760 < t ≤ 800. Based on these settings, 380,735 LIT sequential patterns are discovered by the LIT-PrefixSpan mining procedure.

Assume a new visitor intends to spend 420 minutes (7 hours) in the park and wishes to take recreation facility k12 in region D and recreation facility k22 in region F. In addition, he/she wishes to spend 150 minutes in region D and 120 minutes in region F. Thus, the visitor preference VP is <420, <D, {k12}, 150>, <F, {k22}, 120>>. The important degrees for region (w1), facility (w2), and time-interval (w3) in Equation (7) are all set to 1/3. Based on the set of discrete time intervals TI, the total visiting time (VTβ) of each LIT sequential pattern can be calculated. After deleting the sequential patterns that do not contain both the entrance and the exit, as well as the patterns that do not satisfy the time constraint, 5,471 candidate LIT routes remain. Table 7 shows the ranking information of the candidate LIT routes derived by the route recommendation generation module.

Table 7 Ranking information of the candidate LIT routes.

Ranking  Candidate LIT route                                        Total visiting time (min)  Total similarity score  Total facility number  Sup.  Path
1        <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36>     (360,520]                  0.991667                4                      5     ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
2        <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36>         (360,520]                  0.991667                3                      5     ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
3        <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>     (360,520]                  0.983333                4                      5     ABDFCBA, ABDFCBA, ABDFGDBA, ABDFCBA, ABDFCBA
4        <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>               (360,480]                  0.983333                3                      8     ABDFCBA, ABDFGDBA, …, ABDFCBA
…        …                                                          …                          …                       …                      …     …
5,471    <A;k35>,I1,<B;k1>,I11,<A;k36>                              (400,480]                  0                       1                      522   ABDEBA, ABDCBA, …, ABDEHEBA
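The first row of Table 7 can be checked by hand. The derivation below assumes 40-minute intervals with r = 20, which is the width implied by I20 ending at 800 minutes and by the bounds listed in Table 7; under that width the visitor's 150 minutes in region D map to I4 and the 120 minutes in region F map to I3. The symbols Sim_D and Sim_F are shorthand introduced here for the two matched units.

```latex
\begin{align*}
VT_{LB}^{\beta} &= 0 + 120 + 120 + 120 = 360, \qquad
VT_{UB}^{\beta} = 40 + 160 + 160 + 160 = 520, \\
Sim_{D} &= \tfrac{1}{3} + \tfrac{1}{3}\cdot\frac{|\{k_{12}\} \cap \{k_{12}, k_{13}\}|}{|\{k_{12}\}|}
          + \tfrac{1}{3}\left(1 - \frac{|4 - 4|}{20}\right) = 1, \\
Sim_{F} &= \tfrac{1}{3} + \tfrac{1}{3}\cdot 1
          + \tfrac{1}{3}\left(1 - \frac{|3 - 4|}{20}\right) = \tfrac{59}{60}, \\
Sim(VP, \beta) &= \tfrac{1}{2}\left(1 + \tfrac{59}{60}\right) = \tfrac{119}{120} \approx 0.991667 .
\end{align*}
```

Both the interval (360,520] and the score 0.991667 agree with the first row of Table 7, and 360 ≤ 420 ≤ 520 confirms that the route meets the time constraint.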

Fig. 9 shows the top-ranked visiting route. The recommendation system suggests that the visitor start the trip from the entrance in region A. Within 40 minutes (time-interval I1), the visitor is advised to take recreation facility k2 in region B. After 120 to 160 minutes (time-interval I4), the system suggests that the visitor take k12 and k13 in region D. Again, after 120 to 160 minutes (time-interval I4), the visitor is advised to take k22 in region F. Finally, after 120 to 160 minutes (time-interval I4), the visitor is advised to leave the theme park through the exit in region A by passing through regions C, B, and A sequentially.

Fig. 9. Visiting sequence recommendation based on visitor’s preference.

To validate the proposed route recommendation module, the different visitor preferences shown in Table 8 are tested. Case I is the case introduced above and serves as the benchmark. For Case II, a shorter intended total visiting time (300 minutes) is entered; as expected, fewer recreation facilities are suggested. Fig. 10(b) shows the suggested route <A;k35>,I1,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> with the path ABDFCBA. For Case III, the visitor only specifies taking k12 in region D and spending 150 minutes in region D. Since fewer constraints are provided in Case III, the similarity between the visitor’s preference and many candidate routes is 1. Fig. 10(c) shows one of these candidate routes, <A;k35>,I1,<B;k1,k2,k3>,I6,<D;k12,k13>,I4,<A;k36>, suggested by the system. For Case IV, the intended total visiting time is the same as in Case I, but the other preferences are different. Fig. 10(d) shows that the route recommendation system suggests 3 recreation facilities (k3, k12, k32) in 3 regions (B, D, H) for Case IV.

Table 8 Different visitors’ preference settings.

Case   ITVT         <FRi, FItemsi, IRVTi>
I      420 (min.)   <D, {k12}, 150>, <F, {k22}, 120>
II     300 (min.)   <D, {k12}, 150>, <F, {k22}, 120>
III    420 (min.)   <D, {k12}, 150>
IV     420 (min.)   <H, {k32}, 120>, <B, {k1}, 90>

(a) Case I (b) Case II

(c) Case III (d) Case IV

Fig. 10. Route recommendation results.

4.3. Experimental designs

In the proposed route recommendation system, different parameter settings might affect the final suggestion results. Therefore, a set of experiments is conducted to observe the effect of these parameters. Unless otherwise noted, the parameter settings and the visitor preference are the same as in Case I of Section 4.2.

4.3.1. Discussion of data size

As discussed in Section 3.3, the LIT-PrefixSpan mining procedure module consists of three major phases: the large-transaction generation phase (Phase I), the large-transaction transformation phase (Phase II), and the Location-Item-Time sequential pattern generation phase (Phase III). To observe how the number of route sequences (data size) affects the LIT-PrefixSpan mining procedure module, the data size is varied from 10,000 to 26,000. Table 9 summarizes the execution time of each phase in the LIT-PrefixSpan mining procedure module. Overall, the execution time of the module grows roughly linearly with the number of route sequences, although the run with 13,000 sequences (7,243.01 seconds) is faster than the run with 10,000 sequences (8,241.55 seconds). In addition, the execution time of Phase III is far longer than that of the other two phases and accounts for almost all of the total. Although the LIT-PrefixSpan mining procedure module takes a long time to execute, it is typically run daily or weekly rather than for every request.

Table 9 Execution time (in seconds) of each phase in the LIT-PrefixSpan mining procedure module.

             The number of route sequences
             10,000     13,000     16,000      19,000      22,000      26,000
Phase I      0.58       0.74       0.89        1.16        1.22        1.44
Phase II     0.96       1.09       1.30        1.66        1.82        2.08
Phase III    8240.01    7241.19    10563.63    11088.18    15220.72    17285.32
Total        8241.55    7243.01    10565.82    11090.99    15223.77    17288.83
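A quick check of the figures in Table 9 makes the two observations above concrete: Phase III dominates the total, and the growth with data size is not strictly monotone. The sketch below only re-uses the numbers printed in the table; the variable names are hypothetical.

```python
sizes = [10_000, 13_000, 16_000, 19_000, 22_000, 26_000]
phase1 = [0.58, 0.74, 0.89, 1.16, 1.22, 1.44]
phase2 = [0.96, 1.09, 1.30, 1.66, 1.82, 2.08]
phase3 = [8240.01, 7241.19, 10563.63, 11088.18, 15220.72, 17285.32]

# Total time per data size, recomputed from the three phases.
totals = [a + b + c for a, b, c in zip(phase1, phase2, phase3)]

# Share of the total spent in Phase III.
phase3_share = [p3 / t for p3, t in zip(phase3, totals)]

# Execution time per 1,000 route sequences, in seconds.
seconds_per_1000 = [t / (n / 1000) for t, n in zip(totals, sizes)]

# Data sizes at which the total time is lower than at the previous size.
dips = [sizes[i] for i in range(1, len(sizes)) if totals[i] < totals[i - 1]]
```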

4.3.2. Discussion of minimum support

To examine how the minimum support in the LIT-PrefixSpan mining procedure module affects the result, minimum support values ranging from 0.02% to 0.2% are tested. Fig. 11(a) presents the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module, and Fig. 11(b) presents the number of candidate LIT routes in the route recommendation generation module under different minimum supports. When the minimum support increases, both the number of LIT sequential patterns and the number of candidate LIT routes decrease. If the minimum support is set as 0.02%, the first module generates 380,735 LIT sequential patterns and the second module generates 5,471 candidate LIT routes. However, if the minimum support is 0.2%, only 14,831 LIT sequential patterns are generated by the first module, and only 108 candidate LIT routes remain in the second module. Therefore, based on the observation from Fig. 11, a minimum support of 0.02% is suggested in this case.

Fig. 11. (a) Number of LIT sequential patterns and (b) Number of candidate LIT routes under different minimum supports.

[Fig. 11 data, in order of increasing minimum support from 0.02% to 0.2%. (a) Number of LIT sequential patterns: 380,735; 139,981; 72,955; 49,653; 37,833; 21,287; 14,831. (b) Number of candidate LIT routes: 5,471; 2,478; 1,313; 870; 563; 239; 108.]

Fig. 12 shows the execution time of the LIT-PrefixSpan mining procedure module and the route recommendation generation module under different minimum supports. It is clear that, when the minimum support increases, the execution time of the two modules decreases. It is noted that the execution time of the route recommendation generation module is 1.27 seconds if the minimum support is 0.2%. This execution time should be acceptable for visitors making an on-line recommendation request.

Fig. 12. Execution time of the two modules under different minimum supports.

[Fig. 12 data, execution time in seconds at minimum supports of 0.02%, 0.04%, 0.06%, 0.08%, 0.10%, 0.15%, and 0.20%. LIT-PrefixSpan mining procedure module: 17288.83, 11678.17, 8781.02, 7625.68, 7164.62, 5715.35, 5386.71. Route recommendation generation module: 751.46, 209.04, 70.85, 32.47, 16.18, 3.79, 1.27.]

4.3.3. Discussion of time-interval range

To observe how the time-interval range affects the proposed route recommendation system, a set of time-interval ranges from 10 minutes to 120 minutes is tested. Fig. 13(a) summarizes the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module and Fig. 13(b) summarizes the number of candidate LIT routes generated by the route recommendation generation module. As shown in Fig. 13, when the time-interval range increases, both the number of LIT sequential patterns and the number of candidate LIT routes increase. The reason is that, when the time-interval range is large, the times between pairs of events are more likely to fall into the same time interval. Thus, it is easier to satisfy the minimum support threshold, and more LIT sequential patterns become frequent. For example, assume that there are ten route sequences of <(D,28,{k12}),(A,45,{k35})> and ten route sequences of <(D,40,{k12}),(A,85,{k35})>. If the time-interval range is set as 30 minutes, two different LIT sequential patterns <(D;k12), I1, (A;k35)> and <(D;k12), I2, (A;k35)> are found, where I1: 0~30 and I2: 30~60. However, if the time-interval range is set as 60 minutes, both groups yield the same LIT sequential pattern <(D;k12), I1, (A;k35)>, since I1: 0~60. Based on the observation from Fig. 13, a time-interval range between 40 minutes and 60 minutes is suggested to ensure the quality of the suggested routes.
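The effect described in this example can be reproduced with a short Python sketch. It assumes intervals of the form ((j-1)·width, j·width] and uses the twenty hypothetical route sequences from the text; the helper names are introduced here for illustration.

```python
import math
from collections import Counter


def interval_index(dt, width):
    """1-based index j with (j-1)*width < dt <= j*width."""
    return math.ceil(dt / width)


# Ten sequences <(D,28,{k12}),(A,45,{k35})> and ten <(D,40,{k12}),(A,85,{k35})>.
routes = ([[("D", 28, "k12"), ("A", 45, "k35")]] * 10
          + [[("D", 40, "k12"), ("A", 85, "k35")]] * 10)


def pattern_counts(routes, width):
    """Count the 2-LIT patterns produced by each two-event route sequence."""
    counts = Counter()
    for (r1, t1, k1), (r2, t2, k2) in routes:
        interval = f"I{interval_index(t2 - t1, width)}"
        counts[(f"{r1};{k1}", interval, f"{r2};{k2}")] += 1
    return counts


counts_30 = pattern_counts(routes, 30)  # two patterns, support 10 each
counts_60 = pattern_counts(routes, 60)  # one pattern, support 20
```

With a 30-minute range the support is split between I1 and I2, whereas with a 60-minute range all twenty sequences support the same pattern, so it passes a higher minimum support threshold.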

Fig. 13. (a) Number of LIT sequential patterns and (b) Number of candidate LIT routes under different time-interval ranges.

[Fig. 13 data at time-interval ranges of 10, 20, 40, 60, 80, 100, and 120 minutes. (a) Number of LIT sequential patterns: 193,507; 261,202; 380,735; 462,142; 525,464; 582,069; 637,042. (b) Number of candidate LIT routes: 386; 2,169; 5,471; 6,105; 19,277; 34,647; 37,849.]

4.3.4. Discussion of w1, w2, and w3

In Equation (7), w1, w2 and w3 are the important degrees for region, facility and time-interval consideration respectively. To observe how the important degree values affect the route ranking, three more experiments are conducted. As shown in Table 10, no matter how the important degree values are changed, the top four candidate LIT routes are the same. The reason is that the region comparison is conducted first, according to the third rule of the similarity measurement design in Section 3.4.2. That is, if the region in the VP vector is not the same as the region in the candidate LIT route, the similarity between the facilities and time intervals of the two regions is not counted. This design reduces the influence of the important degree values on the proposed system. Based on the observation from Table 10, the important degrees for region, facility and time-interval are set to 1/3, 1/3, and 1/3 respectively in this study.

Table 10 Route ranking using different important degrees.

w1    w2    w3    Ranking  Candidate LIT route                                        Total similarity score
1/3   1/3   1/3   1        <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36>     0.9917
                  2        <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36>         0.9917
                  3        <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>     0.9833
                  4        <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>               0.9833
                  5        <A;k35>,I1,<B;k3>,I4,<D;k12>,I3,<F;k22>,I4,<A;k36>         0.9833
                  6        <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36>         0.9833
0.8   0.1   0.1
0.1   0.8   0.1
0.1   0.1   0.8   1        <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36>     0.98
                  2        <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36>         0.98
                  3        <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>     0.96
                  4        <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36>               0.96
                  5        <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36>         0.96
                  6        <A;k35>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36>                   0.96
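Why the ordering is insensitive to the weights can be seen by recomputing the scores of the routes ranked 1 and 3. The sketch below is an illustration, not the authors' code: it assumes 40-minute intervals with r = 20, so that Case I maps to I4 for region D and I3 for region F, and it lists only the (ISim, TSim) pairs of the two matched units. The names route_score, route_rank1 and route_rank3 are hypothetical.

```python
def route_score(weights, units):
    """Average of w1*1 + w2*ISim + w3*TSim over the matched units.

    Here both VP units match exactly one route unit, so |VP| = len(units).
    """
    w1, w2, w3 = weights
    return sum(w1 + w2 * isim + w3 * tsim for isim, tsim in units) / len(units)


# (ISim, TSim) for the D and F units of each route.
route_rank1 = [(1.0, 1.0), (1.0, 0.95)]  # <D;k12,k13>,I4 and <F;k22>,I4
route_rank3 = [(1.0, 1.0), (1.0, 0.90)]  # <D;k12,k13>,I4 and <F;k22>,I5

weight_settings = [(1 / 3, 1 / 3, 1 / 3), (0.8, 0.1, 0.1),
                   (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]
scores = {w: (route_score(w, route_rank1), route_score(w, route_rank3))
          for w in weight_settings}
```

Because the region term and ISim are identical for both routes, only the TSim term differs, and any positive w3 preserves the order. The sketch reproduces the scores reported in Table 10 for the weight settings (1/3, 1/3, 1/3) and (0.1, 0.1, 0.8).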

5. Conclusions and further study

In the past decade, recommendation techniques have been widely used to provide a variety of products, services and items to potential visitors in the tourism industry. Many recommendation systems have proved to be efficient tools by designing user interfaces that interact smoothly with the environment, providing convenient information query tools, or suggesting a set of associated products (or services). However, three major problems remain. First, these systems simply return a set of suggested facilities (items) in sequential order, but fail to illustrate the complete visiting path for visitors. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes might be trivial and hard to follow. Third, previous studies seldom take the time interval between items into consideration. To solve these problems, this research defines a Location-Item-Time (LIT) sequence to describe a visitor’s spatial and temporal behavior. To the best of our knowledge, this study is the first to include location (region), item, and time-interval information in a sequence at the same time. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Next, the route suggestion procedure is developed to retrieve suitable LIT sequential patterns. The experimental results show that managers can clearly understand their visitors’ behavior in terms of the proposed location-item-time sequential patterns.

Although the case of a theme park is illustrated in this paper, the proposed three-phase methodology can be applied to any field in which records of location, item, and time are available. For example, in a mobile commerce environment, a customer moves among cellular grids and makes transactions in the corresponding cell through a mobile device. Through the proposed recommendation system, the customer can obtain real-time store/shopping suggestions on the mobile device before he/she moves to the next cellular grid. Similarly, in a grocery store, a customer moves around the store aisles and picks up his/her target products. The recommendation system can provide the customer with an efficient moving path and point out other popular products to increase cross-selling opportunities.

Some potential extensions of this research are as follows. First, in some cases, the entrance and exit of a facility might belong to different regions. It would be worthwhile to discuss such irregular layouts in the future. Second, the minimum support, the time-interval range, and the important degrees must currently be set by users. Further studies can explore how to automate the parameter settings by adopting optimization techniques. Third, when visitors are visiting a theme park, they might prefer to visit some facilities over others. As such, a further study can ask visitors to input facility priorities and rearrange the route according to these priorities. Finally, the proposed system assumes that a visitor makes a recommendation request at the time he/she enters the park. It is possible, however, that a visitor wants to make a recommendation request at any time and anywhere in the park. A future study might record the location and time of the visitor’s request so that the system can provide more flexible suggestions.

Acknowledgement

This work was partially supported by the National Science Council of Taiwan, R.O.C.,

No. 102-2221-E-155-041-MY3.

