a location-item-time sequential pattern mining algorithm for route recommendation
Accepted Manuscript
A Location-Item-Time Sequential Pattern Mining Algorithm for Route Recommendation
Chieh-Yuan Tsai, Bo-Han Lai, J. Lu
PII: S0950-7051(14)00352-9
DOI: http://dx.doi.org/10.1016/j.knosys.2014.09.012
Reference: KNOSYS 2959
To appear in: Knowledge-Based Systems
Received Date: 14 January 2014
Revised Date: 5 September 2014
Accepted Date: 26 September 2014
Please cite this article as: C-Y. Tsai, B-H. Lai, J. Lu, A Location-Item-Time Sequential Pattern Mining Algorithm
for Route Recommendation, Knowledge-Based Systems (2014), doi: http://dx.doi.org/10.1016/j.knosys.2014.09.012
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Chieh-Yuan Tsai a,b,*, Bo-Han Lai a
a Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan
b Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan
* Corresponding Author. Email: [email protected]
A revised manuscript submitted to
Knowledge-Based Systems
Professor J. Lu
Fac. of Engineering and Information Technology,
Building 2, Level 7 CB02.07.037,
University of Technology Sydney, P.O. Box 123,
Sydney, NSW 2007, New South Wales, Australia
October 3, 2014
ABSTRACT
To survive in a rapidly changing environment, theme parks need to provide high
quality services in terms of visitor tastes and preferences. Understanding the spatial
and temporal behavior of visitors could enhance the attraction management and
geographical distribution of visitors. To fulfill this need, this research defines a
Location-Item-Time (LIT) sequence to describe visitors’ spatial and temporal
behavior. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining
algorithm is developed to discover frequent LIT sequential patterns. Next, the route
suggestion procedure is proposed to retrieve suitable LIT sequential patterns for
visitors under the constraints of their intended-visiting time, favorite regions, and
favorite recreation facilities. A simplified theme park is used as an example to show
the feasibility of the proposed system. The experimental results show that the system
can help managers understand visitors’ behavior and provide appropriate visiting
experiences for visitors.
Keywords: Recommendation systems; Sequential pattern; Sequence mining; Theme
park.
1. Introduction
A theme park is an aggregation of attractions including architecture, landscape,
rides, shows, food services, costumed personnel and retail shops. Well-known
examples include Disney World, Disneyland, Universal Studios and Six Flags.
Although the theme park industry has enjoyed steady attendance growth in the past
several decades, the theme park market has entered a mature stage and is no longer
experiencing high growth [5, 6]. To survive in a rapidly changing environment, theme
parks need to provide high quality services in terms of visitor tastes and preferences.
Understanding the spatial and temporal behavior of visitors could enhance the
management of attractions and contribute to extending the geographical distribution
of visitors within regions.
In the past decade, the recommendation technique has been regarded as a popular
technique for providing a variety of products, services and items to customers in the
tourism industry [4, 7, 13]. Personalized tourism services aim at helping users to find
what they are looking for by comparing the user profile to reference characteristics.
Wang et al. [19] presented semantic web technologies for providing personalized
access to digital museum collections. Niaraki and Kim [12] proposed a generic
ontology-based architecture using a multi-criteria decision making technique to
design a personalized route planning system. Schiaffino and Amandi [14] developed
an expert software agent in the tourism and travel domain, named Traveler. This agent
combines collaborative filtering with content-based recommendations and
demographic information about customers to make recommendations. García-Crespo
et al. [3] presented the SPETA system, which uses knowledge of user’s current
location, preferences, as well as a history of past locations to provide the type of
recommendation services that tourists expect from a real tour guide. Tsai and Lo [17]
took previous popular visiting behaviors as the foundation and developed a sequential
pattern based route suggestion system to generate personalized tours. Tsai and Chung
[16] developed a route recommendation system that provides personalized visiting
routes for tourists in theme parks by considering a set of visiting sequences. Based on the
retrieved visiting behavior data and facility queuing situation, their system can
generate a proper route suggestion for visitors.
The above recommendation systems have proven to be efficient
tools by designing user interfaces that can smoothly interact with the environment,
providing convenient information query tools, or suggesting a set of associated
products (or services). However, three major problems remain. First, these
systems simply return a set of suggested facilities (items) in a sequential order, but fail
to illustrate the complete visiting path for visitors. For example, such a system might
suggest that a visitor visit items k1, k4, and k8 in order (i.e., k1→k4→k8). However,
the actual path to complete the route may contain “by-pass items”, as in
k1→k4→k7→k8, k1→k4→k6→k8, or even k1→k4→k7→k6→k8. Without
complete path information, a visitor might get confused and spend much more time
finishing the route. Second, previous systems seldom take geographic constraints into
consideration, so their suggested routes are often fragmented and impractical. For
example, previous studies might suggest to a visitor the long route
k1→k2→k6→k4→k7→k10→k8→k12. This route is fragmented and hard to follow
because it repeatedly switches between regions: k1, k2, and k4 are in region A, k6, k7,
and k8 are in region B, and k10 and k12 are in region C. In fact, a theme park consists
of several regions, each containing dozens of facilities and shops. It is more worthwhile
to give a region-by-region suggestion such as A(k1, k4, k2)→B(k8, k6)→C(k10, k12).
Third, previous studies seldom take time constraints into consideration when they
provide route suggestions for visitors. For example, previous systems simply suggest a
route of the form k1→k4→k8. However, when the time intervals between
items are revealed, this route becomes k1→(1 hour)→k4→(1 hour)→k8. If the
intended visiting time of a visitor is 90 minutes, this suggestion is unacceptable since
the visitor cannot finish the 120-minute route on time. On the other hand, if the intended
visiting time is 300 minutes, this suggestion is not suitable either, since it fills less than
half of the available time. Without the time intervals between items in the suggestion, a
tourist cannot be sure whether he/she can complete the suggested route on time.
To solve the above problems, this research defines a Location-Item-Time (LIT)
sequence to describe visitors’ spatial and temporal behavior. To the best of our
knowledge, this study is the first work to include location (region), item, and
time-interval information simultaneously in a sequence. Then, the
Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to
discover frequent LIT sequential patterns. Finally, the route suggestion procedure is
proposed to retrieve suitable LIT sequential patterns under the constraints of the visitor’s
intended total visiting time, favorite regions and the intended visiting time in each, and
favorite recreation facilities. This paper is organized as follows. Section 2 reviews previous works related
to sequential pattern mining and suggestion. Section 3 introduces the framework of
the proposed route recommendation system. Section 4 demonstrates a case to show
the feasibility of the proposed system. Finally, Section 5 summarizes the conclusions
and points out possible future directions.
2. Literature review
Yavas et al. [20] proposed a three-phase mobility prediction algorithm for the
prediction of user movement in a mobile computing system. Their algorithm enables
the system to allocate resources for users in an efficient manner, and to produce more
accurate answers to location-dependent queries that refer to future positions of mobile
users. Cho et al. [2] proposed a sequential rule-based recommendation method that
considers the evolution of customers’ purchase sequences. The purchase transaction
records of a customer for a certain period are used to build a customer profile. Then, a
collaboration-based system finds a set of customers by calculating
the correlations among customer profiles. Tan et al. [15] proposed a new approach to
building a personalized recommendation system based on access sequential patterns,
named Frequent Accessed Sequence Tree (FAS-Tree). All frequent access sequential
patterns are compressed into the FAS-Tree, which greatly reduces storage. During the
personalized recommendation stage, it is only necessary to traverse the sub-paths of the
FAS-Tree that refer to page views in the active window to find matching patterns, without
the need to generate association rules. Yun and Chen [21] developed an algorithm for mining
mobile sequential patterns to better reflect the customer usage patterns in the mobile commerce
environment, which takes both the moving patterns (location) and purchase patterns
(items) of customers into consideration. Tseng and Lin [18] proposed a novel data
mining method, namely SMAP-Mine that can efficiently discover mobile users’
sequential movement patterns associated with requested services. Through empirical
evaluation under various simulation conditions, SMAP-Mine is shown to deliver
excellent performance in terms of accuracy, execution efficiency and scalability.
Meanwhile, the proposed prediction strategies are also verified to be effective in
measurements of precision, hit ratio and applicability.
Li et al. [8] proposed a Multi-Stage Collaborative Filtering (MSCF) process to
provide the location-aware event recommendation service in mobile environment. The
first stage in MSCF performs the People-to-People Collaborative Filtering (P2P-CF),
while the Event-to-Event Collaborative Filtering (E2E-CF) discovers the sequential
rules of event-participation in the second stage. Liu and Chang [9] proposed a route
recommendation system which guides the user through a series of locations. Their
system uses sequential pattern mining to extract popular route patterns
from a large set of users’ historical route records. Then, the system recommends
routes by matching the user’s current route with the set of popular route patterns. Liu
et al. [10] proposed a novel hybrid recommendation approach that combines the
segmentation-based sequential rule (SSR) method with the segmentation-based
KNN-CF (SKCF) method. In order to enhance the quality of product
recommendations, their method considers customers’ purchase sequences over time
and their purchase data for the current period. Hung and Peng [6] proposed a
Regression-based approach for mining User Movement Patterns (RUMP). The Large
Sequence (LS) algorithm extracts the call detail records and the Time Clustering (TC)
algorithm determines the number of regression functions. Then, the Movement Function
(MF) algorithm generates the movement function representing the movement
patterns of mobile users. Lu et al. [11] proposed a hybrid semantic recommendation
approach which integrates item-based CF similarity with item-based semantic
similarity techniques. The hybrid semantic recommendation approach has been
implemented in an Intelligent Business Partner Locator recommendation system
prototype named BizSeeker. Similarly, Zhang et al. [22] developed a hybrid
recommendation approach which combines user-based and item-based collaborative
filtering techniques with fuzzy set techniques and knowledge base for mobile product
and service recommendation. It particularly implements the approach in a
personalized recommender system for telecom products/services called FTCP-RS.
Although the above sequential pattern algorithms are efficient in different
environments, none of them takes location, item, and time-interval information
into consideration at the same time.
3. Research Method
3.1. Environment assumption and system overview
Typically, a theme park is divided into several regions and each region contains a
set of recreation facilities. It is assumed that each region is fully covered by RFID
readers. In addition, RFID readers are installed at the entrance of each recreation
facility, and at the entrance and exit of the park. When a visitor wearing an RFID-tagged
wristband enters a region or the entrance of a facility, the RFID readers record the RFID tag
code, region id, facility id, and the time into a route database. The recording process
continues until the visitor leaves the park. Take the layout in Fig. 1 as an example.
At timestamp t1, a visitor passes the entrance k11 of the park in region B. Then, she
moves to region A at timestamp t2, region F at timestamp t3, and region G at
timestamp t4. In region G, she takes facility k1. After that, she moves to region K at
timestamp t5 and region O at timestamp t6. In region O, she takes facilities k2 and k3. The
recording process continues until she leaves the park from the exit k12 in region B.
Finally, the route sequence <(B, t1, {k11}), (A, t2, φ), (F, t3, φ), (G, t4, {k1}), (K, t5, φ),
(O, t6, {k2, k3}), (K, t7, φ), (G, t8, φ), (B, t9, {k12})> is collected and stored in the
route database.
Fig. 1. An illustrative example for route sequence generation.
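The recording process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each RFID read is taken to arrive as a hypothetical tuple (timestamp, region id, facility id or None), already sorted by time, and the timestamps t1 to t9 are replaced by the integers 1 to 9. The function name build_route_sequence and the read format are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

Read = Tuple[int, str, Optional[str]]   # (timestamp, region id, facility id or None) -- assumed format
Event = Tuple[str, int, Set[str]]       # (region, first-entry timestamp, itemset)

def build_route_sequence(reads: List[Read]) -> List[Event]:
    """Merge consecutive reads made in the same region into one event."""
    events: List[Event] = []
    for timestamp, region, facility in reads:
        if not events or events[-1][0] != region:
            # The visitor entered a new region: open a new event.
            events.append((region, timestamp, set()))
        if facility is not None:
            # A facility (or the park entrance/exit) was taken in the current region.
            events[-1][2].add(facility)
    return events

# Reads corresponding to the walk-through of Fig. 1 (t1..t9 written as 1..9).
reads = [
    (1, "B", "k11"), (2, "A", None), (3, "F", None), (4, "G", None), (4, "G", "k1"),
    (5, "K", None), (6, "O", None), (6, "O", "k2"), (6, "O", "k3"),
    (7, "K", None), (8, "G", None), (9, "B", "k12"),
]
route = build_route_sequence(reads)
for event in route:
    print(event)
```

An empty Python set plays the role of φ in the route sequence.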
Whenever a visitor wants to request a route suggestion, he/she can reach the kiosk
machine in the park and input his/her preference information to the route
recommendation system. The preference includes intended total visiting time, favorite
regions, intended visiting time in the favorite regions, and favorite recreation facilities.
The route recommendation system consists of two major modules. The first module is
to generate a set of frequent Location-Item-Time (LIT) sequential patterns from the
route database using the proposed Location-Item-Time PrefixSpan (LIT-PrefixSpan)
mining procedure. The second module evaluates the similarity between the visitor’s
preference and candidate LIT routes, and retrieves the top-ranking routes for the visitor. The
framework of the proposed system is shown in Fig. 2.
Fig. 2. Two modules in the proposed route recommendation system.
3.2. Location-Item-Time (LIT) sequential patterns
Let N = {n1, n2, …, ng} be the set of cells (regions) in the theme park and K = {k1,
k2, …, kh} be the set of items (facilities, entrance, and exit). In the route database RD,
a record is represented by <sid, rs> where sid is the identifier of the record and rs is a
route sequence. Formally, rs is represented as <(B1, t1, itemset1), (B2, t2, itemset2), …,
(Bn, tn, itemsetn)> where (Bi, ti, itemseti) is an event; Bi is the visited region and Bi ∈ N;
ti stands for the timestamp at which region Bi is first entered, with $t_{i-1} \le t_i$ for $2 \le i \le n$;
itemseti is the set of items visited in region Bi and itemseti ⊆ K. Without timestamp
information, <Bi, itemseti> is called a transaction if itemseti is a non-empty set.
Definition 1. A transaction pattern is defined as <Bi; z> where z is a non-empty
subset of itemseti. A transaction pattern <Bi; z> is called a k-transaction pattern if
z contains k items.
Example I
There are two route sequences, sid 300 and sid 600, in the route database RD
shown in Table 1. Six non-empty itemsets {k11}, {k1}, {k3}, {k4}, {k5}, and {k12} can be
found in sid 300, while four non-empty itemsets {k11}, {k1}, {k2,k3}, and {k12} can be found
in sid 600. Therefore, transaction patterns <B;{k11}>, <G;{k1}>, <O;{k3}>, <L;{k4}>, <Q;{k5}>,
and <B;{k12}> can be extracted from sid 300. Similarly, transaction patterns <B;{k11}>,
<G;{k1}>, <O;{k2,k3}>, <O;{k2}>, <O;{k3}>, and <B;{k12}> can be extracted from sid
600. Finally, seven 1-transaction patterns, <B;{k11}>, <G;{k1}>, <O;{k3}>,
<L;{k4}>, <Q;{k5}>, <B;{k12}>, and <O;{k2}>, and one 2-transaction pattern,
<O;{k2,k3}>, are obtained.
Table 1 A simple route database, RD.
Sid   Route sequence
300   <(B,8,{k11}), (G,9,{k1}), (F,11,φ), (K,24,φ), (O,25,{k3}), (P,35,φ), (L,37,{k4}), (Q,39,{k5}), (M,40,φ), (H,45,φ), (D,46,φ), (C,51,φ), (B,54,{k12})>
600   <(B,7,{k11}), (A,8,φ), (F,21,φ), (G,30,{k1}), (K,41,φ), (O,44,{k2,k3}), (K,51,φ), (G,54,φ), (B,58,{k12})>
Let $\Delta t = t_{i+1} - t_i$ be the time interval between two successive events where
$1 \le i \le n-1$, and let $T_c$ be a set of given constants for $1 \le c \le r$. Then, the time interval
$\Delta t$ can be mapped to one of the elements in the set of discrete time intervals TI = {I1,
I2, …, Ir} by

$$\mathrm{DiscTI}(\Delta t) = \begin{cases} I_1 & \text{if } 0 < \Delta t \le T_1 \\ I_j & \text{if } T_{j-1} < \Delta t \le T_j \text{ for } 1 < j \le r \end{cases} \qquad (1)$$
For example, assume T1 = 10, T2 = 20, T3 = 30, T4 = 40, T5 = 50, and T6 = 60.
Therefore, the set of discrete time intervals is TI = {I1, I2, I3, I4, I5, I6}, where
I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3: 20 < Δt ≤ 30, I4: 30 < Δt ≤ 40,
I5: 40 < Δt ≤ 50, I6: 50 < Δt ≤ 60.
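The mapping of Equation (1) can be written compactly with a binary search over the thresholds. The sketch below is illustrative only: the function name disc_ti is hypothetical, intervals are returned as their 1-based index j rather than as a symbol I_j, and returning None for a Δt outside (0, T_r] is an assumption, since the paper does not say how such values are handled.

```python
from bisect import bisect_left
from typing import Optional, Sequence

def disc_ti(delta_t: float, thresholds: Sequence[float]) -> Optional[int]:
    """Return j such that delta_t lies in I_j (1-based), or None if outside (0, T_r]."""
    if delta_t <= 0 or delta_t > thresholds[-1]:
        return None
    # bisect_left returns the index of the first T_j with delta_t <= T_j,
    # which is exactly the condition T_{j-1} < delta_t <= T_j.
    return bisect_left(thresholds, delta_t) + 1

T = [10, 20, 30, 40, 50, 60]   # T1..T6 from the example above
print(disc_ti(10, T), disc_ti(11, T), disc_ti(56, T))   # prints: 1 2 6
```

Because the upper bound of each interval is inclusive, a time interval of exactly 10 falls into I1 and not into I2.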
Definition 2. Let Γ = {γ1, γ2, …, γn} be the set of transaction patterns and TI = {I1, I2, …,
Ir} be the set of discrete time intervals. A sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq)
is a Location-Item-Time (LIT) sequence if $D_s \in \Gamma$ for $1 \le s \le q$ and $\varepsilon_s \in TI$ for
$1 \le s \le q-1$.
3.3. Location-Item-Time mining procedure
Similar to the work of Yun and Chen [21], the proposed LIT sequential pattern
mining method consists of three phases: the large-transaction generation phase,
large-transaction transformation phase, and LIT sequential pattern generation phase.
3.3.1. Large-transaction generation phase
The large-transaction generation phase generates the large transactions from the
route database RD. Fig. 3 shows the pseudo-code of the large-transaction generation
algorithm. This algorithm consists of two main steps. As shown in lines 1 to 11, the
first step derives all k-transaction patterns from the RD according to Definition 1 and
calculates the support count of each k-transaction pattern. The second step,
shown in lines 12 to 16, finds the set of large k-transaction patterns. If the support
count of a k-transaction pattern is greater than or equal to the user-specified minimum
support count (called min_sup_count), the k-transaction pattern is called a large
k-transaction pattern. Next, the itemsets in all large k-transaction patterns are replaced
by unique symbols. The set of all large k-transaction patterns after symbol
replacement is called the set of large 1-sequential patterns.
Large-transaction generation algorithm
Input:
  RD               A route database
  min_sup_count    Minimum support count
Output:
  Γ’               The set of large 1-sequential patterns (1-LIT sequential patterns)
Method:
(1)  for each event (Bi, ti, itemseti) in RD
(2)    if itemseti is not empty then
(3)      for each z ⊆ itemseti   // z is a non-empty subset of itemseti
(4)        if <Bi, z> does not exist in Γ then
(5)          add γ = <Bi, z> to Γ, and set its sup_count to 1;
(6)        else
(7)          increase sup_count of <Bi, z> by 1;
(8)        end if
(9)      end for
(10)   end if
(11) end for
(12) for each γ = <Bi, z> in Γ
(13)   if sup_count of γ ≥ min_sup_count then
(14)     give z a unique symbol and save to Γ’;
(15)   end if
(16) end for
Fig. 3. Pseudo-code of large-transaction generation algorithm.
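As an illustration, the two steps of Fig. 3 can be sketched in Python and run on the two route sequences of Table 1. The function name large_transactions and the dictionary layout are assumptions made for this sketch. Like the pseudo-code, it increments the support count once per event, and the symbols it assigns (g1, g2, …, in order of first appearance) are arbitrary, so they will not match the numbering used in Fig. 5.

```python
from collections import Counter
from itertools import combinations

def large_transactions(rd, min_sup_count):
    """rd maps sid -> list of (region, timestamp, itemset) events.

    Returns {(region, items): (symbol, sup_count)} for the large transaction patterns.
    """
    sup_count = Counter()
    for events in rd.values():
        for region, _, itemset in events:
            items = sorted(itemset)
            # Enumerate every non-empty subset z of the itemset (Definition 1).
            for size in range(1, len(items) + 1):
                for z in combinations(items, size):
                    sup_count[(region, z)] += 1
    # Keep the large patterns and give each one a unique symbol.
    large = {}
    for pattern, count in sup_count.items():
        if count >= min_sup_count:
            large[pattern] = (f"g{len(large) + 1}", count)
    return large

# The route database of Table 1; an empty set stands for φ.
rd = {
    300: [("B", 8, {"k11"}), ("G", 9, {"k1"}), ("F", 11, set()), ("K", 24, set()),
          ("O", 25, {"k3"}), ("P", 35, set()), ("L", 37, {"k4"}), ("Q", 39, {"k5"}),
          ("M", 40, set()), ("H", 45, set()), ("D", 46, set()), ("C", 51, set()),
          ("B", 54, {"k12"})],
    600: [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
          ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()), ("G", 54, set()),
          ("B", 58, {"k12"})],
}
large = large_transactions(rd, min_sup_count=2)
for (region, items), (symbol, count) in large.items():
    print(region, items, symbol, count)
```

With min_sup_count = 2, only the four patterns that occur in both sequences survive: <B;{k11}>, <G;{k1}>, <O;{k3}>, and <B;{k12}>.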
Example II
Let’s take the six route sequences in Fig. 4 as an example to explain the
large-transaction generation phase. Twelve candidate 1-transaction patterns and two
candidate 2-transaction patterns are found, as shown in Fig. 5(a). Suppose that
min_sup_count is set to 2. If the sup_count of a transaction pattern is less than
min_sup_count, the transaction pattern is deleted. Therefore, transaction
patterns <P;{k9}>, <Q;{k6}>, and <Q;{k5,k6}> are deleted. Next, the itemsets in all large
k-transaction patterns are replaced by unique symbols as shown in Fig. 5(b). The set of
all large k-transaction patterns after symbol replacement, called large 1-sequential
patterns, is summarized in Fig. 5(c).
Fig. 4. Six route sequences in the RD.
(a) Candidate transaction patterns (* marks patterns whose Sup_count is below min_sup_count)
Candidate 1-transaction patterns
Cell  Itemset   Sup_count
G     {k1}      6
O     {k2}      3
O     {k3}      6
*P    {k9}      1
L     {k4}      5
Q     {k5}      3
*Q    {k6}      1
M     {k7}      2
U     {k8}      2
R     {k10}     2
B     {k11}     6
B     {k12}     6
Candidate 2-transaction patterns
Cell  Itemset   Sup_count
*Q    {k5,k6}   1
O     {k2,k3}   2

(b) Symbol replacement
Cell  Itemset   Large transaction
G     {k1}      <G;g1>
O     {k2}      <O;g2>
O     {k3}      <O;g3>
L     {k4}      <L;g4>
Q     {k5}      <Q;g5>
M     {k7}      <M;g6>
U     {k8}      <U;g7>
R     {k10}     <R;g8>
B     {k11}     <B;g9>
B     {k12}     <B;g10>
O     {k2,k3}   <O;g11>

(c) Large 1-sequential patterns
Large transaction  Sup_count
<G;g1>             6
<O;g2>             3
<O;g3>             6
<L;g4>             5
<Q;g5>             3
<M;g6>             2
<U;g7>             2
<R;g8>             2
<B;g9>             6
<B;g10>            6
<O;g11>            2
Fig. 5. Large 1-sequential patterns.
3.3.2. Large-transaction transformation phase
The large-transaction transformation phase transforms route sequences into
maximal large-transaction sequences. Fig. 6 shows the pseudo-code of the
large-transaction transformation algorithm. Lines 2 to 4 initialize the variables String, ML,
and Path as empty values. String is a temporary variable storing the on-going string in
a buffer; ML represents the on-going maximal large-transaction sequence; Path
represents the on-going path of the maximal large-transaction sequence. For each
event (Bi, ti, itemseti) in route sequence rs, itemseti might be non-empty or empty. If
itemseti is non-empty (lines 7 to 14), the algorithm checks, for every non-empty subset z
of itemseti, whether <Bi, z> exists in the set of large 1-sequential patterns Γ’. If <Bi, z>
does exist in Γ’, the algorithm appends its unique symbol g to Gi. After all z are
checked, <(Bi, Gi), ti> is appended to ML and String is appended to Path.
Finally, the algorithm sets String to <Bi>. If itemseti is empty (lines 16 to 20), the
algorithm checks whether <Bi> already appears in String. If it does not,
the algorithm appends <Bi> to String. Otherwise, the visitor has walked a loop back to Bi,
so everything after the first <Bi> in String is deleted. Through this phase, each record of
the form <sid, rs> in the RD is transformed into the form <sid, maximal large-transaction
sequence, path>, which is stored in the transformed route database TRD.
Large-transaction transformation algorithm
Input:
  Γ’               The set of large 1-sequential patterns
  RD               A route database
  min_sup_count    Minimum support count
Output:
  TRD              The set of maximal large-transaction sequences and their paths
Method:
(1)  for each rs in RD
(2)    set String to empty;   // The on-going string in the buffer
(3)    set ML to empty;       // The maximal large-transaction sequence
(4)    set Path to empty;     // The path of the maximal large-transaction sequence
(5)    for each event (Bi, ti, itemseti) in rs
(6)      if itemseti is not empty then   // itemseti is taken
(7)        for each z ⊆ itemseti   // z is a non-empty subset of itemseti
(8)          if <Bi, z> exists in Γ’ then
(9)            append its unique symbol g to Gi;
(10)         end if
(11)       end for
(12)       append <(Bi, Gi), ti> to ML;
(13)       append String to Path;
(14)       set String to <Bi>;
(15)     else   // itemseti is empty
(16)       if <Bi> is not in String then
(17)         append <Bi> to String;
(18)       else   // <Bi> is in String
(19)         delete anything after the first <Bi> in String;
(20)       end if
(21)     end if
(22)   end for
(23)   append String to Path;
(24) end for
Fig. 6. Pseudo-code of large-transaction transformation algorithm.
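The transformation of a single route sequence can be sketched in Python as follows. The function name transform and the symbol table layout are assumptions made for this illustration; the symbols themselves are copied from Fig. 5. One point is not specified by Fig. 6 and is decided here by assumption: an event whose itemset is non-empty but has no large subset is treated like an event with an empty itemset.

```python
from itertools import combinations

def transform(events, symbols):
    """events: list of (region, timestamp, itemset); symbols: {(region, items): symbol}.

    Returns (maximal large-transaction sequence, path).
    """
    ml, path, string = [], [], []
    for region, timestamp, itemset in events:
        items = sorted(itemset)
        # Collect the symbols of all large subsets z of the itemset.
        found = [symbols[(region, z)]
                 for size in range(1, len(items) + 1)
                 for z in combinations(items, size)
                 if (region, z) in symbols]
        if found:
            ml.append((region, tuple(found), timestamp))
            path.extend(string)        # flush the regions walked since the last transaction
            string = [region]
        elif region in string:
            # Loop detected: drop everything after the first occurrence of this region.
            string = string[: string.index(region) + 1]
        else:
            string.append(region)
    path.extend(string)
    return ml, "".join(path)

# Large 1-sequential patterns of Fig. 5.
symbols = {
    ("G", ("k1",)): "g1", ("O", ("k2",)): "g2", ("O", ("k3",)): "g3", ("L", ("k4",)): "g4",
    ("Q", ("k5",)): "g5", ("M", ("k7",)): "g6", ("U", ("k8",)): "g7", ("R", ("k10",)): "g8",
    ("B", ("k11",)): "g9", ("B", ("k12",)): "g10", ("O", ("k2", "k3")): "g11",
}
sid_600 = [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
           ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()), ("G", 54, set()),
           ("B", 58, {"k12"})]
ml_600, path_600 = transform(sid_600, symbols)
print(ml_600)
print(path_600)   # prints: BAFGKOKGB
```

Joining the path with "".join works here only because every region id is a single character.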
Example III
According to the large 1-sequential patterns shown in Fig. 5, Table 2 illustrates
the operations in the large-transaction transformation phase for route sequence sid 600.
The first column is the sequence of movements, the second column is the visited
region, the third column is the visiting time, and the fourth column lists the recreation
facilities taken by the visitor. The fifth column gives the on-going large transaction
in the buffer and the sixth column gives the on-going string in the buffer. The seventh
column shows the maximal large-transaction sequence and the eighth column shows
the path of the maximal large-transaction sequence. After a series of transformations,
the maximal large-transaction sequence for sid 600 becomes <<(B;g9),7>,
<(G;g1),30>, <(O;g2,g3,g11),44>, <(B;g10),58>> and its path is BAFGKOKGB.
Through the same process, all route sequences in the RD of Fig. 4 are transformed into
maximal large-transaction sequences in the TRD as shown in Table 3.
Table 2 Process of producing the maximal large-transaction sequence for sid 600.
Move  Cell  Time  Items  Large-transaction  String  Maximal large-transaction sequence                       Path
1     B     7     k11    <B;g9>             B       <(B;g9),7>                                               -
2     A     8     -      -                  BA      -                                                        -
3     F     21    -      -                  BAF     -                                                        -
4     G     30    k1     <G;g1>             G       <(B;g9),7>,<(G;g1),30>                                   BAF
5     K     41    -      -                  GK      -                                                        -
6     O     44    k2,k3  <O;g2,g3,g11>      O       <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44>                BAFGK
7     K     51    -      -                  OK      -                                                        -
8     G     54    -      -                  OKG     -                                                        -
9     B     58    k12    <B;g10>            B       <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44>,<(B;g10),58>   BAFGKOKGB
Table 3 Transformed route database, TRD.
Sid  Maximal large-transaction sequence  Path
100  <(B;g9),4>,<(G;g1),5>,<(O;g2,g3,g11),14>,<(L;g4),20>,<(Q;g5),22>,<(M;g6),38>,<(U;g7),52>,<(B;g10),60>  BGKOTPLQRMQVUPKGB
200  <(B;g9),1>,<(G;g1),20>,<(O;g3),39>,<(L;g4),46>,<(Q;g5),47>,<(M;g6),50>,<(B;g10),60>  BAFGKOTPLQRMIHCB
300  <(B;g9),8>,<(G;g1),9>,<(O;g3),25>,<(L;g4),37>,<(Q;g5),39>,<(B;g10),54>  BGFKOPLQMHDCB
400  <(B;g9),2>,<(G;g1),7>,<(O;g3),17>,<(L;g4),27>,<(R;g8),46>,<(U;g7),53>,<(O;g2),56>,<(B;g10),60>  BFGKOTPLQRQVUTOKGB
500  <(B;g9),1>,<(G;g1),2>,<(O;g3),14>,<(L;g4),19>,<(R;g8),40>,<(B;g10),60>  BGKOTPLQRNIDCB
600  <(B;g9),7>,<(G;g1),30>,<(O;g2,g3,g11),44>,<(B;g10),58>  BAFGKOKGB
3.3.3. Location-Item-Time sequential pattern generation phase
Next, a LIT sequential pattern algorithm is developed to generate all large LIT
sequential patterns from the TRD. Similar to Chen et al. [1], the proposed LIT
sequential pattern algorithm, called the LIT-PrefixSpan algorithm, is based on the PrefixSpan
mining concept. Before introducing the LIT-PrefixSpan algorithm, the following
definitions are given.
Definition 3. For a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2),
t2>, …, <(Bn; zn), tn>) and a Location-Item-Time (LIT) sequence β = (D1, ε1, D2, ε2, …,
Dq-1, εq-1, Dq), β is said to be contained in α, or β is a LIT subsequence of α, if
integers $1 \le j_1 < j_2 < \dots < j_q \le n$ exist such that
1. $D_1 = (B_{j_1}; z_{j_1})$, $D_2 = (B_{j_2}; z_{j_2})$, …, $D_q = (B_{j_q}; z_{j_q})$.
2. $t_{j_i} - t_{j_{i-1}}$ satisfies the condition of time interval $\varepsilon_{i-1}$ for $2 \le i \le q$.
Definition 4. support_countTRD(α) = |{(sid, maximal large-transaction sequence, path)
| (sid, maximal large-transaction sequence, path) ∈ TRD ∧ α is contained in the maximal
large-transaction sequence}|.
A LIT sequence α is called a LIT sequential pattern if the percentage of records in
TRD containing α is greater than or equal to the pre-defined minimum support,
called min_sup. That is, α is a LIT sequential pattern in TRD if
support_countTRD(α) ≥ |TRD| × min_sup or support_countTRD(α) ≥ min_sup_count. A
LIT sequence whose length is l is denoted as an l-LIT sequence.
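Definitions 3 and 4 can be checked directly against the TRD of Table 3 with a small backtracking search. The sketch below is illustrative only: the function names, the tuple layout (region, symbols, time), and the representation of an interval I_j by its index j are assumptions. It also assumes that a pattern element <B;g> matches a transaction <(B; z), t> whenever g is one of the symbols in z, and it uses the thresholds T1 = 10, …, T6 = 60 from Section 3.2.

```python
from bisect import bisect_left

T = [10, 20, 30, 40, 50, 60]   # thresholds T1..T6 from the example in Section 3.2

def interval(dt):
    """Index j of the discrete interval I_j containing dt, or None if out of range."""
    return bisect_left(T, dt) + 1 if 0 < dt <= T[-1] else None

def contains(seq, elems, eps, start=0, prev_time=None):
    """True if the LIT sequence (elems[0], eps[0], elems[1], ...) is contained in seq."""
    if not elems:
        return True
    region, symbol = elems[0]
    for pos in range(start, len(seq)):
        b, z, t = seq[pos]
        if b != region or symbol not in z:
            continue
        if prev_time is not None and interval(t - prev_time) != eps[0]:
            continue
        # The first element consumes no interval; every later element consumes one.
        rest_eps = eps if prev_time is None else eps[1:]
        if contains(seq, elems[1:], rest_eps, pos + 1, t):
            return True
    return False

def support_count(trd, elems, eps):
    """Number of records in trd whose sequence contains the LIT sequence."""
    return sum(contains(seq, elems, eps) for seq in trd.values())

TRD = {   # Table 3, paths omitted
    100: [("B", ("g9",), 4), ("G", ("g1",), 5), ("O", ("g2", "g3", "g11"), 14), ("L", ("g4",), 20),
          ("Q", ("g5",), 22), ("M", ("g6",), 38), ("U", ("g7",), 52), ("B", ("g10",), 60)],
    200: [("B", ("g9",), 1), ("G", ("g1",), 20), ("O", ("g3",), 39), ("L", ("g4",), 46),
          ("Q", ("g5",), 47), ("M", ("g6",), 50), ("B", ("g10",), 60)],
    300: [("B", ("g9",), 8), ("G", ("g1",), 9), ("O", ("g3",), 25), ("L", ("g4",), 37),
          ("Q", ("g5",), 39), ("B", ("g10",), 54)],
    400: [("B", ("g9",), 2), ("G", ("g1",), 7), ("O", ("g3",), 17), ("L", ("g4",), 27),
          ("R", ("g8",), 46), ("U", ("g7",), 53), ("O", ("g2",), 56), ("B", ("g10",), 60)],
    500: [("B", ("g9",), 1), ("G", ("g1",), 2), ("O", ("g3",), 14), ("L", ("g4",), 19),
          ("R", ("g8",), 40), ("B", ("g10",), 60)],
    600: [("B", ("g9",), 7), ("G", ("g1",), 30), ("O", ("g2", "g3", "g11"), 44), ("B", ("g10",), 58)],
}
print(support_count(TRD, [("B", "g9"), ("B", "g10")], [6]))   # <B;g9>,I6,<B;g10>  -> 5
print(support_count(TRD, [("B", "g9"), ("O", "g3")], [2]))    # <B;g9>,I2,<O;g3>   -> 3
```

For the first pattern, only sid 300 is excluded: its entrance-to-exit interval is 54 − 8 = 46, which falls in I5 rather than I6.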
Definition 5. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2),
t2>, …, <(Bn; zn), tn>) and a LIT sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq) ($q \le n$),
β is a LIT prefix of α if and only if (1) $D_i = (B_i; z_i)$ for $1 \le i \le m$; (2) $t_i - t_{i-1}$ satisfies
the condition of $\varepsilon_{i-1}$ for $1 < i \le m-1$.
Definition 6. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2),
t2>, …, <(Bn; zn), tn>) and a LIT sequence β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq) ($q \le n$)
such that β is a subsequence of α. Let $i_1 < i_2 < \dots < i_q$ be the indexes of the
large-transaction patterns in α that match the large-transaction patterns of β. A
subsequence $\alpha' = (\langle (B'_1; z'_1), t'_1 \rangle, \langle (B'_2; z'_2), t'_2 \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ of sequence α,
where $p = q + n - i_q$, is called a projection of α with respect to β if and only if (1) β is a LIT
prefix of $\alpha'$ and (2) the last $n - i_q$ large-transaction patterns of $\alpha'$ are the same as
the last $n - i_q$ large-transaction patterns of α.
Definition 7. Let $\alpha' = (\langle (B'_1; z'_1), t'_1 \rangle, \langle (B'_2; z'_2), t'_2 \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ be the projection
of α with respect to a LIT prefix β = (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq). Then
$\theta = (\langle (B'_{q+1}; z'_{q+1}), t'_{q+1} \rangle, \langle (B'_{q+2}; z'_{q+2}), t'_{q+2} \rangle, \dots, \langle (B'_p; z'_p), t'_p \rangle)$ is the postfix of α with respect to
prefix β.
The pseudo-code of the proposed LIT-PrefixSpan algorithm is illustrated in Fig.
7. The α-projected database, defined as the collection of postfixes of maximal
large-transaction sequences in TRD with respect to α, is denoted as TRD|α. The major
difference between LIT-PrefixSpan and I-PrefixSpan is that LIT-PrefixSpan
includes both cells and items in a transaction pattern. Therefore, a table LIT_Table is
used to store this type of relation, where a column corresponds to a large-transaction
pattern and a row corresponds to a time interval in TI = {I1, I2, …, Ir}. Each cell
LIT_Table(Ii, γ’i) in the table records the number of transactions in TRD|α that
contain transaction pattern γ’i and for which the time difference between this transaction
pattern and the last transaction pattern of α lies within Ii. Processing every transaction in
TRD|α sequentially enables LIT_Table to be formed and the frequent cells to be
identified. If the cell LIT_Table(Ii, γ’i) is a frequent cell, (Ii, γ’i) can be appended to α
to yield a LIT sequential pattern α’, and the α’-projected database TRD|α’ is
constructed. Recursively discovering the LIT sequential patterns in TRD|α’ finally yields all
LIT sequential patterns in the TRD.
Subroutine LIT-PrefixSpan(α, l, TRD|α)
Input:
  Γ’ = {γ’1, γ’2, …, γ’n}   The set of large 1-sequential patterns (1-LIT sequential patterns)
  TRD                        The set of maximal large-transaction sequences and their paths (the transformed route database)
  TI = {I1, I2, …, Ir}       Time-interval
  LIT_Table                  The table stores the relation between the set of large 1-sequential patterns and time-interval.
  min_sup_count              Minimum support count
Output:
  The LIT sequential patterns
Method:
(1)  if l = 0 then
(2)    for each γ’i in Γ’
(3)      append γ’i to α as α’;
(4)      output α’;
(5)      construct α’-projected database TRD|α’,
(6)      and call LIT-PrefixSpan(α’, l+1, TRD|α’);
(7)    end for
(8)  else
(9)    set LIT_Table to empty;
(10)   for each sid in TRD|α
(11)     construct α-time;
(12)     for each Gi of the sid in TRD|α
(13)       calculate the time-interval between Gi and α, and classify to Ii;
(14)       for each g of Gi
(15)         increase the count of (Ii, γ’i) in LIT_Table by 1;
(16)       end for
(17)     end for
(18)   end for
(19)   for every frequent cell (Ii, γ’i) in LIT_Table
(20)     append (Ii, γ’i) to α as α’;
(21)     output α’;
(22)     construct α’-projected database TRD|α’,
(23)     and call LIT-PrefixSpan(α’, l+1, TRD|α’);
(24)   end for
(25) end if
Fig. 7. Pseudo-code of the LIT-PrefixSpan algorithm.
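The recursion of Fig. 7 can be approximated by the following Python sketch, which is one possible reading of the pseudo-code and not the authors' implementation. Instead of materializing each projected database TRD|α’, it keeps, for every pattern, the set of (sid, position) pairs at which the pattern can end; this index-based projection is an assumption of the sketch. It also counts each sid at most once per LIT_Table cell and represents an interval I_j by its index j. All function and variable names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

def lit_prefixspan(trd, thresholds, min_sup_count):
    """Return {pattern: support}, a pattern being a tuple (D1, e1, D2, ..., Dq)."""
    def interval(dt):
        return bisect_left(thresholds, dt) + 1 if 0 < dt <= thresholds[-1] else None

    patterns = {}

    def grow(prefix, occurrences):
        # occurrences: set of (sid, position of the last matched transaction)
        table = defaultdict(set)                 # plays the role of LIT_Table
        for sid, pos in occurrences:
            seq = trd[sid]
            for nxt in range(pos + 1, len(seq)):
                region, symbols, t = seq[nxt]
                i = interval(t - seq[pos][2])
                if i is None:
                    continue
                for g in symbols:
                    table[(i, (region, g))].add((sid, nxt))
        for (i, d), occ in table.items():
            support = len({sid for sid, _ in occ})
            if support >= min_sup_count:         # frequent cell: extend and recurse
                new_prefix = prefix + (i, d)
                patterns[new_prefix] = support
                grow(new_prefix, occ)

    first = defaultdict(set)                     # level l = 0: large 1-sequential patterns
    for sid, seq in trd.items():
        for pos, (region, symbols, _) in enumerate(seq):
            for g in symbols:
                first[(region, g)].add((sid, pos))
    for d, occ in first.items():
        support = len({sid for sid, _ in occ})
        if support >= min_sup_count:
            patterns[(d,)] = support
            grow((d,), occ)
    return patterns

TRD = {   # Table 3, paths omitted
    100: [("B", ("g9",), 4), ("G", ("g1",), 5), ("O", ("g2", "g3", "g11"), 14), ("L", ("g4",), 20),
          ("Q", ("g5",), 22), ("M", ("g6",), 38), ("U", ("g7",), 52), ("B", ("g10",), 60)],
    200: [("B", ("g9",), 1), ("G", ("g1",), 20), ("O", ("g3",), 39), ("L", ("g4",), 46),
          ("Q", ("g5",), 47), ("M", ("g6",), 50), ("B", ("g10",), 60)],
    300: [("B", ("g9",), 8), ("G", ("g1",), 9), ("O", ("g3",), 25), ("L", ("g4",), 37),
          ("Q", ("g5",), 39), ("B", ("g10",), 54)],
    400: [("B", ("g9",), 2), ("G", ("g1",), 7), ("O", ("g3",), 17), ("L", ("g4",), 27),
          ("R", ("g8",), 46), ("U", ("g7",), 53), ("O", ("g2",), 56), ("B", ("g10",), 60)],
    500: [("B", ("g9",), 1), ("G", ("g1",), 2), ("O", ("g3",), 14), ("L", ("g4",), 19),
          ("R", ("g8",), 40), ("B", ("g10",), 60)],
    600: [("B", ("g9",), 7), ("G", ("g1",), 30), ("O", ("g2", "g3", "g11"), 44), ("B", ("g10",), 58)],
}
patterns = lit_prefixspan(TRD, [10, 20, 30, 40, 50, 60], min_sup_count=2)
print(patterns[(("B", "g9"), 6, ("B", "g10"))])
```

Keeping every end position, rather than only the first match in each sequence, matters here because an earlier match may fail a later time-interval condition that a later match satisfies.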
Example IV
Suppose TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3:
20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, I6: 50 < Δt ≤ 60. Consider the TRD
shown in Table 3 with min_sup_count set to 2. At the beginning, α is empty and
the frequent transaction patterns <B;g9>, <G;g1>, <O;g2>, <O;g3>, <O;g11>, <L;g4>,
<Q;g5>, <M;g6>, <U;g7>, <R;g8>, and <B;g10> are discovered. Appending each of these
frequent transaction patterns to the empty α yields 11 different α’. Table 4
summarizes the LIT sequential pattern mining result. The total number of LIT
sequential patterns is 68 (=11+25+23+8+1) since there are 11 1-LIT sequential
patterns, 25 2-LIT sequential patterns, and so on.
Table 4 LIT sequential pattern mining result.
k  Number of patterns  k-LIT sequential patterns                              Sup_Count
1  11                  <B;g9>                                                 6
                       <G;g1>                                                 6
                       …                                                      …
                       <B;g10>                                                6
2  25                  <B;g9>,I6,<B;g10>                                      5
                       <B;g9>,I2,<O;g3>                                       3
                       …                                                      …
                       <R;g8>,I2,<B;g10>                                      2
3  23                  <B;g9>,I1,<G;g1>,I6,<B;g10>                            3
                       <B;g9>,I1,<G;g1>,I2,<O;g3>                             2
                       …                                                      …
                       <L;g4>,I1,<Q;g5>,I2,<B;g10>                            2
4  8                   <B;g9>,I1,<G;g1>,I4,<R;g8>,I2,<B;g10>                  2
                       <B;g9>,I1,<G;g1>,I1,<O;g3>,I4,<U;g7>                   2
                       …                                                      …
                       <G;g1>,I3,<L;g4>,I1,<Q;g5>,I2,<B;g10>                  2
5  1                   <B;g9>,I1,<G;g1>,I1,<O;g3>,I4,<U;g7>,I1,<B;g10>        2
3.4. Route recommendation procedure
When a visitor requires a route suggestion, he/she is requested to enter personal preferences into the route recommendation system at the kiosk. The visitor's preference can be represented as a VP vector:
VP = <ITVT, <FR1, FItems1, IRVT1>, <FR2, FItems2, IRVT2>, …> (2)
where ITVT is the intended total visiting time, FRi is the favorite region i, FItemsi is the set of favorite facilities in FRi, and IRVTi is the intended visiting time in FRi. For example, VP = <420, <G, {k1}, 90>, <O, {k2, k3}, 120>> indicates that a visitor intends to spend 420 minutes in the theme park. In addition, he/she would like to spend 90 minutes in region G and take recreation facility k1 there, and 120 minutes in region O and take recreation facilities k2 and k3 there. Note that the more information a visitor enters, the more satisfactory the suggestion the visitor can obtain.
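The VP vector of Equation (2) maps naturally onto a small record type. The sketch below is one possible Python representation, given only for illustration; the class and field names (VisitorPreference, RegionPreference, and so on) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class RegionPreference:
    region: str            # FR_i, the favorite region
    items: FrozenSet[str]  # FItems_i, favorite facilities in that region
    minutes: int           # IRVT_i, intended visiting time in that region


@dataclass(frozen=True)
class VisitorPreference:
    total_minutes: int                     # ITVT
    regions: Tuple[RegionPreference, ...]  # the <FR_i, FItems_i, IRVT_i> units

    def requested_region_minutes(self) -> int:
        """Sum of the intended visiting times over all favorite regions."""
        return sum(unit.minutes for unit in self.regions)


# The example VP = <420, <G, {k1}, 90>, <O, {k2, k3}, 120>> from the text.
vp = VisitorPreference(
    420,
    (
        RegionPreference("G", frozenset({"k1"}), 90),
        RegionPreference("O", frozenset({"k2", "k3"}), 120),
    ),
)
```

Keeping the region units in a tuple preserves the order in which the visitor entered them, and the itemsets are stored as sets so that the intersection used later for itemset similarity is a direct operation.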
3.4.1. Time constraint
The number of LIT sequential patterns generated by the LIT-PrefixSpan algorithm might be large. A LIT sequential pattern is therefore treated as a candidate LIT route only if it satisfies the following two rules. First, the pattern should include the entrance and the exit. Second, the pattern should satisfy the time constraint provided by the visitor. As mentioned in Section 3.2, a time interval Δt can be mapped to one of the elements of the set of discrete time intervals TI = {I1, I2, …, Ir} according to Equation (1). Therefore, the lower bound and the upper bound of a time interval Ij are derived using Equations (3) and (4), respectively.
$$ f_{LB}(I_j) = \begin{cases} 0, & \text{if } j = 1 \\ T_{j-1}, & \text{if } 1 < j \le r \end{cases} \tag{3} $$
$$ f_{UB}(I_j) = \begin{cases} T_1, & \text{if } j = 1 \\ T_j, & \text{if } 1 < j \le r \end{cases} \tag{4} $$
Let a LIT sequential pattern β be represented as (D1, ε1, D2, ε2, …, Dq-1, εq-1, Dq). The total visiting time of β can be represented as $VT^{\beta} = (VT^{\beta}_{LB}, VT^{\beta}_{UB}]$, where the lower bound of $VT^{\beta}$ is derived as:
$$ VT^{\beta}_{LB} = \sum_{s=1}^{q-1} f_{LB}(\varepsilon_s) \tag{5} $$
and the upper bound of $VT^{\beta}$ is defined as:
$$ VT^{\beta}_{UB} = \sum_{s=1}^{q-1} f_{UB}(\varepsilon_s) \tag{6} $$
If $VT^{\beta}_{LB} \le ITVT$ and $VT^{\beta}_{UB} \ge ITVT$, we say that LIT sequential pattern β satisfies the visitor's time constraint, where ITVT is the visitor's intended total visiting time in Equation (2).
Example V
Suppose TI = {I1, I2, I3, I4}, where I1: 0 < Δt ≤ 30, I2: 30 < Δt ≤ 60, I3: 60 < Δt ≤ 90, and I4: 90 < Δt ≤ 120, and five LIT sequential patterns are given as shown in the first three columns of Table 5. According to Equations (3) to (6), the total visiting time of each pattern is derived in the last column of Table 5. If a visitor's intended total visiting time ITVT is 320 minutes, LIT sequential patterns 1, 2, and 3 are considered candidate routes, whereas LIT sequential patterns 4 and 5 do not satisfy the visitor's time constraint because their upper bounds (300 and 210 minutes) are below 320 minutes.
Table 5 Five LIT sequential patterns.
No | LIT sequential pattern | Path | Total visiting time
1 | <B;k11>,I1,<O;k3>,I2,<L;k4>,I4,<Q;k6>,I1,<M;k7>,I4,<B;k12> | BAKOKLQMHCB | (210,360]
2 | <B;k11>,I1,<G;k1>,I3,<O;k2,k3>,I4,<L;k4>,I4,<B;k12> | BGKOTPLHCB | (240,360]
3 | <B;k11>,I1,<G;k1>,I3,<L;k4>,I4,<Q;k6>,I1,<O;k2,k3>,I2,<B;k12> | BGLQPOKFB | (180,330]
4 | <B;k11>,I1,<G;k1>,I1,<O;k3>,I4,<U;k8>,I4,<B;k12> | BGKOTUQLHCB | (180,300]
5 | <B;k11>,I1,<G;k1>,I3,<L;k4>,I1,<Q;k5>,I2,<B;k12> | BGLQMHCB | (90,210]
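The time-constraint check of Equations (3) to (6) can be reproduced for Example V with a few lines of Python. This is a sketch that assumes equal-width intervals, so that f_LB(I_j) = (j-1)·width and f_UB(I_j) = j·width; each pattern is reduced to the list of its interval ranks, and the function names are hypothetical.

```python
def interval_bounds(j, width):
    """Return (f_LB(I_j), f_UB(I_j)) for equal-width intervals ((j-1)*width, j*width]."""
    return (j - 1) * width, j * width


def total_visiting_time(interval_ranks, width):
    """Equations (5) and (6): sum the lower and upper bounds over all intervals."""
    lower = sum(interval_bounds(j, width)[0] for j in interval_ranks)
    upper = sum(interval_bounds(j, width)[1] for j in interval_ranks)
    return lower, upper


def satisfies_time_constraint(interval_ranks, width, itvt):
    """A pattern qualifies when VT_LB <= ITVT and VT_UB >= ITVT."""
    lower, upper = total_visiting_time(interval_ranks, width)
    return lower <= itvt <= upper


# Interval ranks of the five patterns in Table 5 (I1 -> 1, ..., I4 -> 4).
PATTERNS = {
    1: [1, 2, 4, 1, 4],
    2: [1, 3, 4, 4],
    3: [1, 3, 4, 1, 2],
    4: [1, 1, 4, 4],
    5: [1, 3, 1, 2],
}
bounds = {no: total_visiting_time(ranks, 30) for no, ranks in PATTERNS.items()}
candidates = [no for no, ranks in PATTERNS.items()
              if satisfies_time_constraint(ranks, 30, 320)]
```

With 30-minute intervals and ITVT = 320, the computed bounds match the last column of Table 5 and the candidate list contains patterns 1, 2, and 3.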
3.4.2. Similarity measurement
The similarity between VP = <ITVT, <FR1, FItems1, IRVT1>, <FR2, FItems2, IRVT2>, …> and a candidate LIT route β = ((B1; z1), ε1, (B2; z2), ε2, …, (Bq-1; zq-1), εq-1, (Bq; zq)) is evaluated based on the following concepts. First, the intended visiting time for region i, IRVTi, in VP is mapped to one of the elements in TI = {I1, I2, …, Ir} according to Equation (1) for all i. Second, when conducting the similarity evaluation, <FRi, FItemsi, IRVTi> in VP and <(Bj; zj), εj> in β are considered as comparison units. Third, the similarity evaluation between <FItemsi, IRVTi> and <zj, εj> is carried out only if FRi and Bj are the same region. Based on the above concepts, the similarity between the ith unit in VP and the jth unit in β is defined as:
$$ Sim_{i,j} = \begin{cases} w_1 \times 1 + w_2 \times ISim(i,j) + w_3 \times TSim(i,j), & \text{if } FR_i = B_j \\ 0, & \text{if } FR_i \ne B_j \end{cases} \tag{7} $$
where w1, w2, and w3 are the important degrees for the region, facility, and time-interval considerations respectively, and w1 + w2 + w3 = 1. ISim(i, j) is the itemset similarity between FItemsi and zj, which is defined as:
$$ ISim(i,j) = |FItems_i \cap z_j| \, / \, |FItems_i| \tag{8} $$
where ∩ is the set intersection operator and | | is the cardinality of the set. In addition, TSim(i, j) is the time-interval similarity between IRVTi and εj, which is defined as:
$$ TSim(i,j) = 1 - |f(IRVT_i) - f(\varepsilon_j)| \, / \, f(I_r) \tag{9} $$
where |·| is the absolute value operator and f(Ib) is the rank of the time-interval Ib in TI, defined as f(Ib) = b where b = 1, …, r. With Equation (7), the similarity between VP and β is defined as:
$$ Sim(VP, \beta) = \frac{\sum_{i=1}^{|VP|} \sum_{j=1}^{|\beta|} Sim_{i,j}}{|VP|} \tag{10} $$
where |·| is the length of the sequence (for VP, the number of <FRi, FItemsi, IRVTi> units). After the similarities between VP and all candidate routes are derived, the routes are sorted in decreasing order of similarity and returned to the kiosk machine as suggested routes. If more than one candidate route has the same similarity value, the route with the larger total number of facilities is ranked higher.
Example VI
Assume LIT sequential patterns 1, 2, and 3 in Example V are candidate LIT routes and the visitor preference is VP = <300, <O, {k3}, 70>, <Q, {k5, k6}, 100>>. According to the discrete time-interval definition in Example V, VP is transformed to <300, <O, {k3}, I3>, <Q, {k5, k6}, I4>>. For candidate LIT route #1, with w1 = w2 = w3 = 1/3 and the route units for regions O, L, Q, and M indexed as j = 1, …, 4, we have Sim1,1 = 1/3 + 1/3*(1/1) + 1/3*(1-|f(I3)-f(I2)|/f(I4)) = 11/12; Sim1,2 = 0; Sim1,3 = 0; Sim1,4 = 0; Sim2,1 = 0; Sim2,2 = 0; Sim2,3 = 1/3 + 1/3*(1/2) + 1/3*(1-|f(I4)-f(I1)|/f(I4)) = 7/12; Sim2,4 = 0. Hence, the total similarity between VP and candidate LIT route #1 is ((11/12 + 0 + 0 + 0) + (0 + 0 + 7/12 + 0))/2 = 0.75. With a similar process, we have Sim(VP, #1) = 0.75, Sim(VP, #2) = 0.458, and Sim(VP, #3) = 0.75. Candidate LIT routes #1 and #3 have the same total similarity score. When the total similarity scores are the same, the total numbers of facilities are compared. The total numbers of facilities for candidate LIT routes #1 and #3 are 4 and 5 respectively. Therefore, candidate LIT route #3 is ranked first. Table 6 shows the final ranking result for the three candidate LIT routes. Based on candidate LIT route #3 and its path, the route recommendation system suggests that the visitor enter through entrance k11 in region B. After time-interval I1, the visitor is suggested to move to region G and take k1. Then, after time-interval I3, he/she is suggested to take k4 in region L, and so on.
Table 6 Three LIT candidate routes and their rankings.
No. | Candidate LIT route | Path | Total visiting time | Total similarity score | Total facility number | Final rank
3 | <B;k11>,I1,<G;k1>,I3,<L;k4>,I4,<Q;k6>,I1,<O;k2,k3>,I2,<B;k12> | BGLQPOKFB | (180,330] | 0.75 | 5 | 1
1 | <B;k11>,I1,<O;k3>,I2,<L;k4>,I4,<Q;k6>,I1,<M;k7>,I4,<B;k12> | BAKOKLQMHCB | (210,360] | 0.75 | 4 | 2
2 | <B;k11>,I1,<G;k1>,I3,<O;k2,k3>,I4,<L;k4>,I4,<B;k12> | BGKOTPLHCB | (240,360] | 0.458 | 4 | 3
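The scores and ranks of Example VI can be recomputed with a short Python sketch of Equations (7) to (10). Two conventions are assumptions inferred from the worked numbers rather than stated definitions: each route unit pairs a region's transaction with the time interval that follows it, and the entrance and exit transactions are left out of the comparison and of the facility count. All function and variable names are hypothetical.

```python
def unit_similarity(pref, unit, r, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (7) for one preference unit and one route unit."""
    region, items, irvt_rank = pref
    route_region, route_items, eps_rank = unit
    if region != route_region:
        return 0.0
    w1, w2, w3 = weights
    isim = len(items & route_items) / len(items)  # Equation (8)
    tsim = 1 - abs(irvt_rank - eps_rank) / r      # Equation (9), f(I_r) = r
    return w1 + w2 * isim + w3 * tsim


def route_similarity(prefs, route, r, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (10): sum over all unit pairs divided by the number of preference units."""
    total = sum(unit_similarity(p, u, r, weights) for p in prefs for u in route)
    return total / len(prefs)


# VP = <300, <O, {k3}, I3>, <Q, {k5, k6}, I4>> after interval mapping.
PREFS = [("O", {"k3"}, 3), ("Q", {"k5", "k6"}, 4)]

# Interior units of candidate routes 1 to 3: (region, itemset, rank of following interval).
ROUTES = {
    1: [("O", {"k3"}, 2), ("L", {"k4"}, 4), ("Q", {"k6"}, 1), ("M", {"k7"}, 4)],
    2: [("G", {"k1"}, 3), ("O", {"k2", "k3"}, 4), ("L", {"k4"}, 4)],
    3: [("G", {"k1"}, 3), ("L", {"k4"}, 4), ("Q", {"k6"}, 1), ("O", {"k2", "k3"}, 2)],
}

scores = {no: route_similarity(PREFS, units, r=4) for no, units in ROUTES.items()}
facilities = {no: sum(len(z) for _, z, _ in units) for no, units in ROUTES.items()}

# Sort by similarity (descending), breaking ties by facility count (descending).
ranking = sorted(ROUTES, key=lambda no: (-round(scores[no], 9), -facilities[no]))
```

Under these assumptions the sketch gives 0.75, about 0.458, and 0.75 for routes 1, 2, and 3, and the tie between routes 1 and 3 is resolved in favor of route 3 because it contains five facilities instead of four.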
4. Implementation and experiment results
The proposed route recommendation system is implemented using C# and tested
on a PC with Core i5 2.80GHz CPU and 4GB memory.
4.1. Case description and route generator
In this study, a simplified theme park is used as an example to illustrate the
feasibility of the proposed system. As shown in Fig. 8, there are seven thematic
regions and thirty-four recreation facilities (k1 to k34). For example, thematic region B
contains facilities k1, k2, and k3, while thematic region H contains facilities k31, k32, k33,
and k34. To simulate visiting behavior, a route generator is developed. In the generator, visitors start their visit from the entrance (k35) and finish at the exit (k36). The regions that visitors pass through consecutively must be adjacent. The total visiting time of a route sequence is randomly drawn from a uniform distribution bounded by 780 minutes since the operation time of the park is from 9:00 a.m. to 10:00 p.m. The time a visitor takes to move to the next region is randomly generated from a uniform distribution between 15 and 30 minutes. In addition, the time a visitor spends taking a recreation facility is randomly generated from a uniform distribution between 30 and 90 minutes.
Fig. 8. Layout of the implementation example.
According to the tourism reports, the five must-visit recreation facilities are k4, k12, k13, k25, and k26, and the seven popular facilities are k2, k6, k17, k22, k23, k24, and k32. Therefore, if a generated route sequence does not contain one of the five must-visit recreation facilities, the route is discarded. Likewise, if a generated route sequence does not contain one of the seven popular recreation facilities, this route sequence has an 80% probability of being discarded. In addition, the average number of visitors in the theme park is 26,000 per day. Therefore, 26,000 route sequences are generated to simulate the visiting behaviors of visitors in one day.
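The discard rules of the route generator can be illustrated with the following Python sketch. It reads each rule as "the route contains none of the listed facilities", which is one possible interpretation of the description above and therefore an assumption; the function name keep_route and the injected random number generator are likewise hypothetical.

```python
import random

MUST_VISIT = {"k4", "k12", "k13", "k25", "k26"}
POPULAR = {"k2", "k6", "k17", "k22", "k23", "k24", "k32"}


def keep_route(facilities, rng, discard_prob=0.8):
    """Return True if a generated route survives both filtering rules."""
    visited = set(facilities)
    if not visited & MUST_VISIT:
        return False  # no must-visit facility: always discarded
    if not visited & POPULAR:
        return rng.random() >= discard_prob  # kept with probability 1 - discard_prob
    return True


rng = random.Random(0)
always_dropped = keep_route(["k1", "k3"], rng)
always_kept = keep_route(["k4", "k2"], rng)
# Share of routes kept when a must-visit facility is present but no popular one.
kept_share = sum(keep_route(["k4", "k1"], rng) for _ in range(10000)) / 10000
```

Passing the generator in as an argument keeps the filter reproducible, so the same seed yields the same simulated route database.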
4.2. Route recommendation
Before executing the proposed LIT-PrefixSpan mining procedure, the minimum support and the set of discrete time intervals should be determined. For simplicity, the time intervals in this study are set to an equal length of 40 minutes and the minimum support is set as 0.02%. That is, the set of discrete time intervals is TI = {I1, I2, I3, …, I20}, where I1: 0 < t ≤ 40, I2: 40 < t ≤ 80, I3: 80 < t ≤ 120, …, I20: 760 < t ≤ 800. Based on these settings, 380,735 LIT sequential patterns are discovered by the LIT-PrefixSpan mining procedure.
Assume a new visitor intends to spend 420 minutes (7 hours) in the park and wishes to take recreation facility {k12} in region D and recreation facility {k22} in region F. In addition, he/she wishes to spend 150 minutes in region D and 120 minutes in region F. Thus, the visitor preference, VP, is <420, <D, {k12}, 150>, <F, {k22}, 120>>. The important degrees for region w1, facility w2, and time-interval w3 in Equation (7) are all set to 1/3. Based on the set of discrete time-intervals TI, the total visiting time of each LIT sequential pattern can be calculated. After deleting the sequential patterns that do not contain the entrance and the exit as well as the patterns that do not satisfy the time constraint, 5,471 candidate LIT routes remain. Table 7 shows the ranking information of the candidate LIT routes derived by the route recommendation generation module.
Table 7 Ranking information of the candidate LIT routes.
Ranking | Candidate LIT route | Total visiting time (min) | Total similarity score | Total facility number | Sup. | Path
1 | <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> | (360,520] | 0.991667 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
2 | <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36> | (360,520] | 0.991667 | 3 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
3 | <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> | (360,520] | 0.983333 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFGDBA, ABDFCBA, ABDFCBA
4 | <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> | (360,480] | 0.983333 | 3 | 8 | ABDFCBA, ABDFGDBA, …, ABDFCBA
… | … | … | … | … | … | …
5,471 | <A;k35>,I1,<B;k1>,I11,<A;k36> | (400,480] | 0 | 1 | 522 | ABDEBA, ABDCBA, …, ABDEHEBA
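As a check on the first row of Table 7, the visiting-time bounds and the similarity score can be recomputed from Equations (3) to (10). The derivation below is a sketch under two assumptions: the intervals are 40 minutes wide with r = 20 (which is what the bound (360,520] and the upper limit of I20 at 800 minutes imply), so that IRVT = 150 maps to I4 and IRVT = 120 maps to I3; and each region is paired with the interval that follows it, as in Example VI.

```latex
\begin{align*}
VT_{LB} &= 0 + 120 + 120 + 120 = 360, \qquad VT_{UB} = 40 + 160 + 160 + 160 = 520,\\
Sim_{D} &= \tfrac{1}{3} + \tfrac{1}{3}\cdot\tfrac{1}{1}
           + \tfrac{1}{3}\left(1 - \tfrac{|4 - 4|}{20}\right) = 1,\\
Sim_{F} &= \tfrac{1}{3} + \tfrac{1}{3}\cdot\tfrac{1}{1}
           + \tfrac{1}{3}\left(1 - \tfrac{|3 - 4|}{20}\right) = \tfrac{59}{60},\\
Sim(VP,\beta) &= \tfrac{1}{2}\left(1 + \tfrac{59}{60}\right) = \tfrac{119}{120} \approx 0.991667 .
\end{align*}
```

Since 360 ≤ 420 ≤ 520, the route satisfies the time constraint, and the score agrees with the value reported in the table.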
Fig. 9 shows the top-ranked visiting route. The recommendation system suggests that the visitor start the trip from the entrance in region A. Within 40 minutes (time-interval I1), the visitor is suggested to take recreation facility k2 in region B. After 120 to 160 minutes (time-interval I4), the system suggests that the visitor take k12 and k13 in region D. Again, after 120 to 160 minutes (time-interval I4), the visitor is suggested to take k22 in region F. Finally, after 120 to 160 minutes (time-interval I4), the visitor is suggested to leave the theme park from the exit in region A by passing through regions C, B, and A sequentially.
Fig. 9. Visiting sequence recommendation based on visitor’s preference.
To validate the proposed route recommendation module, the different visitor preferences shown in Table 8 are tested. Case I is the case previously introduced and is used as the benchmark. For Case II, a shorter intended total visiting time (300 minutes) is input; as expected, fewer recreation facilities are suggested. Fig. 10(b) shows the suggested route <A;k35>,I1,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> with the path ABDFCBA. For Case III, the visitor only inputs the constraints of taking k12 in region D and spending 150 minutes in region D. Since fewer constraints are provided in Case III, the similarity between the visitor's preference and many candidate routes is 1. Fig. 10(c) shows one of the candidate routes, <A;k35>,I1,<B;k1,k2,k3>,I6,<D;k12,k13>,I4,<A;k36>, suggested by the system. For Case IV, the intended total visiting time is the same as in Case I, but the other preferences are different. Fig. 10(d) shows that the route recommendation system suggests 3 recreation facilities (k3, k12, k32) among 3 regions (B, D, H) for Case IV.
Table 8 Different visitors' preference settings.
Case | ITVT | <FRi, {Fav-itemseti}, VTi>
I | 420 (min.) | <D, {k12}, 150>, <F, {k22}, 120>
II | 300 (min.) | <D, {k12}, 150>, <F, {k22}, 120>
III | 420 (min.) | <D, {k12}, 150>
IV | 420 (min.) | <H, {k32}, 120>, <B, {k1}, 90>
(a) Case I (b) Case II
(c) Case III (d) Case IV
Fig. 10. Route recommendation results.
4.3. Experimental designs
In the proposed route recommendation system, different parameter settings might affect the final suggestion results. Therefore, a set of experiments is conducted to observe the effects of these parameters. Unless otherwise stated, the parameter settings and the visitor preference are the same as in Case I in Section 4.2.
4.3.1. Discussion of data size
As discussed in Section 3.3, the LIT-PrefixSpan mining procedure module consists of three major phases: the large-transaction generation phase (Phase I), the large-transaction transformation phase (Phase II), and the Location-Item-Time sequential pattern generation phase (Phase III). To observe how the number of route sequences (data size) affects the LIT-PrefixSpan mining procedure module, the data size is varied from 10,000 to 26,000. Table 9 summarizes the execution time of each phase in the LIT-PrefixSpan mining procedure module. Overall, the execution time grows roughly in proportion to the number of route sequences, from about 8,242 seconds for 10,000 sequences to about 17,289 seconds for 26,000 sequences, although the 13,000-sequence run (about 7,243 seconds) is faster than the 10,000-sequence run. In addition, the execution time of Phase III is significantly longer than that of the other two phases. Although the LIT-PrefixSpan mining procedure module takes a long time to execute, it is typically run daily or weekly rather than for every request.
Table 9 Execution time (in seconds) of each phase in the LIT-PrefixSpan mining procedure module.
Number of route sequences | 10,000 | 13,000 | 16,000 | 19,000 | 22,000 | 26,000
Phase I | 0.58 | 0.74 | 0.89 | 1.16 | 1.22 | 1.44
Phase II | 0.96 | 1.09 | 1.30 | 1.66 | 1.82 | 2.08
Phase III | 8240.01 | 7241.19 | 10563.63 | 11088.18 | 15220.72 | 17285.32
Total | 8241.55 | 7243.01 | 10565.82 | 11090.99 | 15223.77 | 17288.83
4.3.2. Discussion of minimum support
To examine how the minimum support in the LIT-PrefixSpan mining procedure module affects the results, minimum support values ranging from 0.02% to 0.2% are tested. Fig. 11(a) presents the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module, and Fig. 11(b) presents the number of candidate LIT routes in the route recommendation generation module under different minimum supports. When the minimum support increases, both the number of LIT sequential patterns and the number of candidate LIT routes decrease. If the minimum support value is set as 0.02%, the first module generates 380,735 LIT sequential patterns and the second module generates 5,471 candidate LIT routes. However, if the minimum support is 0.2%, only 14,831 LIT sequential patterns are generated by the first module and 108 candidate LIT routes by the second module. Therefore, based on the observation from Fig. 11, a minimum support value of 0.02% is suggested in this case.
(a) (b)
Fig. 11. (a) Number of LIT sequential patterns and (b) Number of candidate LIT
routes under different minimum supports.
[Fig. 11 plot data, in plotted order. (a) Number of LIT sequential patterns: 380735, 139981, 72955, 49653, 37833, 21287, 14831. (b) Number of candidate LIT routes: 5471, 2478, 1313, 870, 563, 239, 108. Horizontal axis: minimum support.]
Fig. 12 shows the execution time of the LIT-PrefixSpan mining procedure module and the route recommendation generation module under different minimum supports. When the minimum support increases, the execution time of both modules decreases. Note that the execution time of the route recommendation generation module is 1.27 seconds if the minimum support is 0.2%, which should be acceptable for visitors making an on-line recommendation request.
Fig. 12. Execution time of the two modules under different minimum supports.
[Fig. 12 plot data, execution time in seconds at minimum supports of 0.02%, 0.04%, 0.06%, 0.08%, 0.10%, 0.15%, and 0.20%. LIT-PrefixSpan mining procedure module: 17288.83, 11678.17, 8781.02, 7625.68, 7164.62, 5715.35, 5386.71. Route recommendation generation module: 751.46, 209.04, 70.85, 32.47, 16.18, 3.79, 1.27.]
4.3.3. Discussion of time-interval range
To observe how the time-interval range affects the proposed route recommendation system, time-interval ranges from 10 minutes to 120 minutes are tested. Fig. 13(a) summarizes the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module and Fig. 13(b) summarizes the number of candidate LIT routes generated by the route recommendation generation module. As shown in Fig. 13, when the time-interval range increases, both the number of LIT sequential patterns and the number of candidate LIT routes increase. The reason is that, with a larger range, the times between more pairs of events fall into the same time interval, so it is easier for a pattern to satisfy the minimum support threshold. For example, assume that there are ten route sequences of <(D,28,{k12}),(A,45,{k35})> and ten route sequences of <(D,40,{k12}),(A,85,{k35})>. If the time-interval range is set as 30 minutes, two different LIT sequential patterns <(D;k12), I1, (A;k35)> and <(D;k12), I2, (A;k35)> are found, where I1: 0~30 and I2: 30~60. However, if the time-interval range is set as 60 minutes, both groups yield the same LIT sequential pattern <(D;k12), I1, (A;k35)> since I1: 0~60, and its support is the sum of the two groups. Based on the observation from Fig. 13, a time-interval range between 40 and 60 minutes is suggested to ensure the quality of the suggested routes.
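The merging effect in this example can be reproduced with a few lines of Python. The sketch assumes equal-width intervals of the form ((j-1)·width, j·width] and represents each route sequence only by its two transactions and their times; the names are hypothetical.

```python
from collections import Counter


def interval_index(dt, width):
    """Return j such that (j-1)*width < dt <= j*width (ceiling division)."""
    return -(-dt // width)


# (first transaction, its time, second transaction, its time), times in minutes.
ROUTES = ([(("D", "k12"), 28, ("A", "k35"), 45)] * 10
          + [(("D", "k12"), 40, ("A", "k35"), 85)] * 10)


def count_patterns(routes, width):
    """Count the 2-LIT patterns obtained for a given time-interval width."""
    counts = Counter()
    for first, t1, second, t2 in routes:
        counts[(first, interval_index(t2 - t1, width), second)] += 1
    return counts


by_30 = count_patterns(ROUTES, 30)  # gaps of 17 and 45 minutes fall in I1 and I2
by_60 = count_patterns(ROUTES, 60)  # both gaps fall in I1
```

With a 30-minute width the twenty sequences split into two patterns of support 10 each, while a 60-minute width merges them into one pattern of support 20, so a minimum support count between 11 and 20 would be met only in the second setting.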
(a) (b)
Fig. 13. (a) Number of LIT sequential patterns and (b) Number of candidate LIT routes under different time-interval ranges.
[Fig. 13 plot data at time-interval ranges of 10, 20, 40, 60, 80, 100, and 120 minutes. (a) Number of LIT sequential patterns: 193507, 261202, 380735, 462142, 525464, 582069, 637042. (b) Number of candidate LIT routes: 386, 2169, 5471, 6105, 19277, 34647, 37849.]
4.3.4. Discussion of w1, w2, and w3
In Equation (7), w1, w2, and w3 are the important degrees for the region, facility, and time-interval considerations respectively. To observe how the important degree values affect the route ranking, three more experiments are conducted. As shown in Table 10, no matter how the important degree values are changed, the top four candidate LIT routes are the same. The reason is that the region comparison is conducted first according to the third rule of the similarity measurement design in Section 3.4.2. That is, if the region in the VP vector is not the same as the region in the candidate LIT route, the similarity between the facilities and time-intervals of the two regions is not counted. This design makes the important degree values have little effect on the proposed system. Based on the observation from Table 10, the important degrees for region, facility, and time-interval are set as 1/3, 1/3, and 1/3 respectively in this study.
Table 10 Route ranking using different important degrees.
w1 w2 w3 Ranking Candidate LIT route Total similarity score
1/3 1/3 1/3
1 <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> 0.9917 2 <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36> 0.9917 3 <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.9833 4 <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.9833 5 <A;k35>,I1,<B;k3>,I4,<D;k12>,I3,<F;k22>,I4,<A;k36> 0.9833 6 <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36> 0.9833
0.8 0.1 0.1
1 <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> 0.9975 2 <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36> 0.9975 3 <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.995 4 <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.995 5 <A;k35>,I1,<B;k3>,I4,<D;k12>,I3,<F;k22>,I4,<A;k36> 0.995 6 <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36> 0.995
0.1 0.8 0.1
1 <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> 0.9975 2 <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36> 0.9975 3 <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.995 4 <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.995 5 <A;k35>,I1,<B;k3>,I4,<D;k12>,I3,<F;k22>,I4,<A;k36> 0.995 6 <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36> 0.995
0.1 0.1 0.8
1 <A;k35>,I1,<B;k2>,I4,<D;k12,k13>,I4,<F;k22>,I4,<A;k36> 0.98 2 <A;k35>,I1,<B;k2>,I4,<D;k12>,I4,<F;k22>,I4,<A;k36> 0.98 3 <A;k35>,I1,<B;k2>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.96 4 <A;k35>,I3,<D;k12,k13>,I4,<F;k22>,I5,<A;k36> 0.96 5 <A;k35>,I1,<B;k2>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36> 0.96 6 <A;k35>,I3,<D;k12>,I4,<F;k22>,I5,<A;k36> 0.96
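One way to see why the order of these routes is stable is to rewrite Equation (7) using w1 + w2 + w3 = 1. The following derivation is a sketch added for illustration and is not part of the original analysis; the numeric check assumes r = 20 and that, for the top-ranked route, the region-D unit matches the preferred interval exactly while the region-F unit differs from it by one rank.

```latex
\begin{align*}
Sim_{i,j} &= w_1 + w_2\,ISim(i,j) + w_3\,TSim(i,j) \\
          &= 1 - w_2\bigl(1 - ISim(i,j)\bigr) - w_3\bigl(1 - TSim(i,j)\bigr),
             \qquad FR_i = B_j, \\
Sim(VP,\beta)\big|_{w_3 = 0.8}
          &= \tfrac{1}{2}\Bigl(1 + \bigl(1 - 0.8\cdot\tfrac{1}{20}\bigr)\Bigr) = 0.98 .
\end{align*}
```

Every route listed in Table 10 contains the preferred facilities k12 and k22, so ISim = 1 for both matched units and the w2 term vanishes. The score then depends only on w3 and on the rank gaps between the preferred and the mined intervals, which means that changing the weights rescales the scores but leaves the order of these routes, apart from ties, unchanged.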
5. Conclusions and further study
In the past decade, recommendation has been regarded as a popular technique for providing a variety of products, services, and items to potential visitors in the tourism industry. Many recommendation systems have proven to be efficient tools by designing user interfaces that interact smoothly with the environment, providing convenient information query tools, or suggesting a set of associated products (or services). However, three major problems remain. First, these systems simply return a set of suggested facilities (items) in sequential order, but fail to illustrate the complete visiting path for visitors. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes might be trivial and hard to follow. Third, previous studies seldom take the time interval between items into consideration. To solve the above problems, this research defines a Location-Item-Time (LIT) sequence to describe a visitor's spatial and temporal behavior. To the best of our knowledge, this study is the first work to include location (region), item, and time-interval information in a sequence at the same time. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Next, the route suggestion procedure is developed to retrieve suitable LIT sequential patterns. The experimental results show that managers can understand their visitors clearly in terms of the proposed location-item-time sequential patterns.
Although the case of a theme park is illustrated in this paper, the proposed three-phase methodology can be applied to any field in which records of location, item, and time are available. For example, in a mobile commerce environment, a customer moves among cellular grids and makes transactions in the corresponding cell through a mobile device. Through the proposed recommendation system, a customer can obtain real-time store/shopping suggestions on the mobile device before he/she moves to the next cellular grid. Similarly, in a grocery store, a customer moves around store aisles and picks up his/her target products. The recommendation system can provide the customer with an efficient moving path and prompt him/her with other popular products to increase cross-selling opportunities.
Some potential extensions for this research are as follows. First, in some cases, the entrance and exit of a facility might belong to different regions. It would be worthwhile to discuss such irregular layouts in the future. Second, the minimum support, the time-interval range, and the important degrees currently must be decided by users. Further studies can explore how to automate the parameter settings by adopting optimization techniques. Third, when visitors are visiting a theme park, they might prefer to visit some facilities over others. As such, a further study can ask visitors to input facility priorities and rearrange the route according to the priorities. Finally, the proposed system assumes that a visitor makes a recommendation request at the time he/she enters the park. It is possible, however, that a visitor wants to make a recommendation request at any time and anywhere in the park. A future study might record the visitor's requested location and time so that the system can provide more flexible suggestions.
Acknowledgement
This work was partially supported by the National Science Council of Taiwan, R.O.C.,
No. 102-2221-E-155-041-MY3.
References
1. Y.L. Chen, M.C. Chiang, M.T. Ko, Discovering time-interval sequential patterns
in sequence database, Expert Systems with Applications 25 (3) (2003) 343-354.
2. Y.B. Cho, Y.-H. Cho, S.H. Kim, Mining changes in customer buying behavior for
collaborative recommendations, Expert Systems with Applications 28 (2) (2005)
359-369.
3. A. García-Crespo, J. Chamizo, I. Rivera, M. Mencke, R. Colomo-Palacios, J.M.
Gómez-Berbís, SPETA: social pervasive e-tourism advisor, Telematics and
Informatics 26 (3) (2009) 306–315.
4. A. Guerbas, O. Addam, O. Zaarour, M. Nagi, A. Elhajj, M. Ridley, R. Alhajj,
38
Effective web log mining and online navigational pattern prediction,
Knowledge-Based Systems 49 (2013) 50–62.
5. C.Y. Heo, S. Lee, Application of revenue management practices to the theme
park industry, International Journal of Hospitality Management 28 (3) (2009)
446–453.
6. C.C. Hung, W.C. Peng, A regression-based approach for mining user movement
patterns from random sample data, Data and Knowledge Engineering 70 (1)
(2011) 1-20.
7. K. Kabassi, Personalizing recommendation for tourists, Telemetric and
Informatics 27 (1) (2010) 51-66.
8. L.H. Li, F.M. Lee, Y.C. Chen, C.Y. Cheng, A multi-stage collaborative filtering
approach for mobile recommendation, Proceedings of the 3rd International
Conference on Ubiquitous Information Management and Communication, 2009,
pp. 88-97.
9. D. Liu, M. Chang, Recommend touring routes to travelers according to their
sequential wandering behaviors, Proceedings of the 10th International
Symposium on Pervasive Systems, Algorithms, and Networks, 2009, pp.
350-355.
10. D.R. Liu, C.H. Lai, W.J. Lee, A hybrid of sequential rules and collaborative
filtering for product recommendation, Information Sciences 179 (20) (2009)
3505-3519.
11. J. Lu, Q. Shambour, Y. Xu, Q. Lin, G. Zhang, BizSeeker: a hybrid semantic
recommendation system for personalized government-to-business e-services,
Internet Research 20 (3) (2010) 342-365.
12. A.S. Niaraki, K. Kim, Ontology based personalized route planning system using
a multi-criteria decision making approach, Expert Systems with Applications 36
(2) (2009) 2250–2259.
13. M. Salehi, I.N. Kamalabadi, Hybrid recommendation approach for learning
material based on sequential pattern of the accessed material and the learner’s
preference tree, Knowledge-Based Systems 48 (2013) 57–69.
14. S. Schiaffino, A. Amandi, Building an expert travel agent as a software agent,
Expert Systems with Applications 36 (2) (2009) 1291–1299.
15. X. Tan, M. Yao, M. Xu, An effective technique for personalization
recommendation based on access sequential patterns, Proceedings of 2006 IEEE
Asia-Pacific Conference on Services Computing, 2006, pp. 42-46.
16. C.Y. Tsai, S.H. Chung, A personalized route recommendation service for theme
parks using RFID information and tourist behavior, Decision Support Systems 52
(2) (2012) 514-527.
39
17. C.Y. Tsai, P.H. Lo, A sequential pattern based route suggestion system,
International Journal of Innovative Computing, Information and Control 6 (10)
(2010) 4389-4408.
18. V.S. Tseng, K.W. Lin, Efficient mining and prediction of user behavior patterns
in mobile web systems, Information and Software Technology 48 (6) (2006)
357-369.
19. Y. Wang, N. Stash, L. Aroyo, P. Gorgels, L. Rutledged, G. Schreiberb,
Recommendations based on semantically enriched museum collections, Web
Semantics: Science, Services and Agents on the World Wide Web 6 (4) (2008)
283-290.
20. G. Yavas, D. Katsaros, O. Ulusoy, Y. Manolopoulos, A data mining approach for
location prediction in mobile environments, Data and knowledge Engineering 54
(2) (2005) 121-146.
21. C.H. Yun, M.S. Chen, Mining mobile sequential patterns in a mobile commerce
environment, IEEE Transactions On Systems, Man and Cybernetics Part C:
Application and Reviews 37 (2) (2007) 278-295.
22. Z. Zhang, H. Lin, K. Liu, D. Wu, G. Zhang, J. Lu, A hybrid fuzzy-based
personalized recommender system for telecom products/services, Information
Sciences 235 (2013)117-129.