cosolorec: joint factor model with content, social ...zhaoxi35/paper/ksem2016.pdf3 iflytek research,...

15
CoSoLoRec: Joint Factor Model with Content, Social, Location for Heterogeneous Point-of-Interest Recommendation Hao Guo 1 , Xin Li 3 , Ming He 1 , Xiangyu Zhao 1 , Guiquan Liu 1(B ) , and Guandong Xu 2 1 University of Science and Technology of China, Hefei, China {guoh916,zxy1105}@mail.ustc.edu.cn, [email protected], [email protected] 2 University of Technology Sydney, Sydney, Australia [email protected] 3 IFLYTEK Research, Hefei, China [email protected] Abstract. The pervasive use of Location-based Social Networks calls for more precise Point-of-Interest recommendation. The probability of a user’s visit to a target place is influenced by multiple factors. Though there are several fusion models in such fields, heterogeneous informa- tion are not considered comprehensively. To this end, we propose a novel probabilistic latent factor model by jointly considering the social cor- relation, geographical influence and users’ preference. To be specific, a variant of Latent Dirichlet Allocation is leveraged to extract the topics of both user and POI from reviews which is denoted as explicit inter- est. Then, Probabilistic Latent Factor Model is introduced to depict the implicit interest. Moreover, Kernel Density Estimation and friend-based Collaborative Filtering are leveraged to model user’s geographic alloca- tion and social correlation respectively. Thus, we propose CoSoLoRec, a fusion framework, to ameliorate the recommendation. Experiments on two real-word datasets show the superiority of our approach over the state-of-the-art methods. Keywords: Location-based Social Network · Point-of-Interest recom- mendation · Topic model · Probabilistic latent factor model · Heteroge- neous information 1 Introduction In recent years, with rapidly development of Location-based Social Net- works (LBSNs), boundary between the physical world and virtual networks is broken. As an interlink between these two worlds, Point-of-Interest (POI) refers to a place, such as restaurant that users may find useful or tend to visit and plays an essential role in LBSN thereby leading to an application - POI recom- mendation. This application can not only benefit merchants by increasing their c Springer International Publishing AG 2016 F. Lehner and N. Fteimi (Eds.): KSEM 2016, LNAI 9983, pp. 613–627, 2016. DOI: 10.1007/978-3-319-47650-6 48

Upload: others

Post on 01-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content,Social, Location for HeterogeneousPoint-of-Interest Recommendation

Hao Guo1, Xin Li3, Ming He1, Xiangyu Zhao1, Guiquan Liu1(B),and Guandong Xu2

1 University of Science and Technology of China, Hefei, China{guoh916,zxy1105}@mail.ustc.edu.cn, [email protected], [email protected]

2 University of Technology Sydney, Sydney, [email protected]

3 IFLYTEK Research, Hefei, [email protected]

Abstract. The pervasive use of Location-based Social Networks callsfor more precise Point-of-Interest recommendation. The probability of auser’s visit to a target place is influenced by multiple factors. Thoughthere are several fusion models in such fields, heterogeneous informa-tion are not considered comprehensively. To this end, we propose a novelprobabilistic latent factor model by jointly considering the social cor-relation, geographical influence and users’ preference. To be specific, avariant of Latent Dirichlet Allocation is leveraged to extract the topicsof both user and POI from reviews which is denoted as explicit inter-est. Then, Probabilistic Latent Factor Model is introduced to depict theimplicit interest. Moreover, Kernel Density Estimation and friend-basedCollaborative Filtering are leveraged to model user’s geographic alloca-tion and social correlation respectively. Thus, we propose CoSoLoRec,a fusion framework, to ameliorate the recommendation. Experiments ontwo real-word datasets show the superiority of our approach over thestate-of-the-art methods.

Keywords: Location-based Social Network · Point-of-Interest recom-mendation · Topic model · Probabilistic latent factor model · Heteroge-neous information

1 Introduction

In recent years, with rapidly development of Location-based Social Net-works (LBSNs), boundary between the physical world and virtual networks isbroken. As an interlink between these two worlds, Point-of-Interest (POI) refersto a place, such as restaurant that users may find useful or tend to visit andplays an essential role in LBSN thereby leading to an application - POI recom-mendation. This application can not only benefit merchants by increasing their

c© Springer International Publishing AG 2016F. Lehner and N. Fteimi (Eds.): KSEM 2016, LNAI 9983, pp. 613–627, 2016.DOI: 10.1007/978-3-319-47650-6 48

Page 2: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

614 H. Guo et al.

revenue through virtual marketing but also benefit customers accrelating theirdecision-making by filtering out uninteresting places thus makes them satisfied.

Traditional recommender systems can be seamlessly applied by treating POIas an ordinary item, however there are several characteristics of POI recommen-dation that make it different from conventional recommendations thus if wellconsidered, the performance would be improved in a significant margin.

– Tobler’s Law of Geographical Influence. As Tobler [1] indicates, the aggrega-tion of a user’s check-ins depicts that users’ check-in probability is inverselyproportional to the geographical distance. Geographical influence can bedenoted as a physical metric between the user and the POI.

– Homophily of Social Correlation. Homophily is one of the most importanttheories in sociology and also works in social network [2]. The depiction ofhomophily suggests that people tend to trust and have similar favorite astheir friends psychologically, thus making social correlation a psychologicalmetric between the user and the POI.

– Heterogeneous Information. A LBSN contains heterogeneous information,such as geographical location, social network, rating data and text reviews.Undoubtedly, utilizing heterogeneous data indubitably results in more preciseuser profiling and more personalized recommendations.

Despite the successes and improvements of the existing studies, the hetero-geneous information is not comprehensively considered in one model. Generallyspeaking, users’ behavior in choosing POIs can be influenced by multiple factors.So our recommendation will show more accurate and efficient if we consider morefactors which will influence users’ bahaviors. However, under some circumstancein reality, one or several pieces of information are not available, so real life appli-cations demand for robust modeling. To this end, we propose a novel probabilisticlatent factor model by considering the geographical location, social correlationand textual reviews simultaneously. Specifically, our model consists of four-foldmeasurements, i.e., physical distance, psychological distance, explicit interestand implicit interest. Firstly, physical distance denotes the distance between theuser and the POI in real world. We apply Kernel Density Estimation (KDE) toestimate the visiting probability of a user at a target POI based on his or herhistorical visiting records in adherence to Tobler’s Law. Secondly, for calculatingpsychological distance, friend-based Collaborative Filtering (CF) is simultane-ously utilized to predict the visiting probability under the assumption called “thephenomenon of homophily” which means a user’s visiting behavior is influencedby his or her friends. Thirdly, in order to leverage a user’s explicit interest,we aggregate all the reviews written by him or her as a document and applyLatent Dirichlet Allocation (LDA) on it to derive the topic distribution of suchuser afterwards. The matching on the corresponding topics reveals preferencefor a user to a POI. Finally, under the belief that there are several factors thatimplicitly affect a user’s decision-making, we apply the Latent Factor Model todepict the implicit interest thereby augmenting it to the explicit interest to formthe entire interest in the model. The detail architecture of this model named asCoSoLoRec can be seen in Fig. 1.

Page 3: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 615

POI latent factor

user latent factor

user topics

POI topics

explicit interest

implicit interest

interest

psychologicaldistance

social correlation

physical distance

geographical influence

distancex

Probability

Fig. 1. The architecture framework of CoSoLoRec model

In summary, we have made several contributions in this paper:

– We propose a novel probabilistic latent factor model for heterogeneous Point-of-Interest recommendation by simultaneously considering the geographicallocation, social correlation and the user preference.

– Our model can keep its robustness due to its modularization. It means everyheterogeneous information is embedded into the model as a module thusremoving anyone of them won’t affect the praticality but only decline theperformance somewhat.

– We conduct extensive experiments on two real-world large scale datasets toevaluate both the efficiency and the effectiveness of our model. The resultsdemonstrate that our approach outperforms other state-of-the-art methods.

The rest of the paper is organized as follows. We review the recent studies inSect. 2. Problem is defined in Sect. 3.1 along with the model formation in Sect. 3.2and the model description in Sect. 4. The experiment is demonstrated in Sect. 5.We conclude our work in Sect. 6.

2 Related Work

In recent years, POI recommendation has grown in popularity with the increas-ing demand of LBSNs. In reality, geographical information plays critical rolesin influencing user’s behaviors [4,5] since physical interactions are required inLBSNs which differs from other non-spatial situations totally. To exploit influ-ence of geographical information in improving the quality of location recommen-dations, Some techniques model the distance between two locations visited bya user as a distribution for all users. In [4], Ye et al. employed power-law dis-tribution (PD) to model user’s checkin behaviors using naive bayesian method.Instead of making PD assumption, Cheng et al. [5] proposed the Multi-centerGaussian Model (MGM) to capture features of user’s checkin behaviors. How-ever, the geographical influence on individual user’s visiting behaviors should bepersonalized rather than appearing as a common distribution. So Zhang et al.[17] used kernel density estimation (KDE) to model geographical influence as

Page 4: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

616 H. Guo et al.

personalized distance distribution for each user. In our work, we adopt KDE tomodel geographical factor since its superior in personalized modeling.

According to homophily of social correlation, friends tend to share commoninterests which will make recommendation more accurate and efficient. H. Maproposed social trust concept [12] summarized as friend-based Collaborative Fil-tering (CF) to explain social influence. In [13], H. Tong proposed Random Walkwith Restart(RWR) to capture social relations. In this paper, we adopt friend-based CF since its lower computation cost and more accuracy.

Exploring text information can also better understand patterns in LBSNs andimprove LBSNs services. Ye et al. [24] explored explicit patterns of individualplaces and implicit revelance among similar places through semantic annotation.Liu et al. [9] proposed TL-PMF model to consider both the extent to which a userinterest matches the POI in terms of topic distribution and the word-of-mouthopinion of the POI. Kurashima et al. [19] proposed Geo-Topic Model whichjointly estimates user’s interests and activity areas. Yin [20] proposed LCA-LDAmodel by giving consideration to both personal interest and local preference.Zheng et al. [22] proposed a cross-region collaborative filtering method basedon hidden topics about check-in records to recommend new POIs. Zhang et al.[23] distinguished the user preferences on the content of POIs from the POIsthemselves and combined the predicted rating on content and location of POI.

Previous studies mainly focused on just one or two aspects which affectsuser’s visiting behaviors. However, overall consideration on the joint effects ofheterogeneous information can show great superiority in POI recommendation.Ye et al. [4] incorporated social and geographical influences into user-based CFframework by using linear interpolation. In [5], Cheng et al. considered geo-graphical influence, check-in patterns, frequency and social networks in POI rec-ommendation. Hu and Ester [6] considered spatial and textual aspects of postspublished by mobile users and predicted user’s willing locations. However, thiswork is more similar to a location prediction problem rather than POI recom-mendation. B. Liu [10] proposed the Geo-BNMF model to embrace geographicalinfluence, popularity, text information and latent factor model. Based on this,They also proposed Geo-PFM [11] to combine geographical influence, latentfactor model with latent region. Lian et al. [21] proposed a weighted matrixfactorization method incorporating the modeling of the spatial clustering phe-nomenon. In this paper, we jointly fused heterogeneous information with latentfactor model to describe users’ behaviors.

3 Fusion Model with Heterogeneous Information

In this section, we define the problem of POI recommendation and introduce afused probabilistic latent factor framework for heterogeneous information.

3.1 Problem Definition

The major task for POI recommendation is to recommend POIs which auser has not visited using heterogeneous information in LBSNs. Let U =

Page 5: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 617

{u1, u2, . . . , ui, . . . , um} be the set of users and V = {v1, v2, . . . , vj , . . . , vn} bethe set of POIs. Each user ui visited some POIs Li historically and rated onthese POIs.

User’s visiting behavior may be not only influenced by geographical distancebetween his destination and visited POIs but also influenced by ratings madeby user’s friends Fi. We regard these factors as geographical influence and socialinfluence which can also be regarded as physical distance and psychological dis-tance respectively. Text information such as reviews may also reflects explicitinterest of user ui with user-topic distribution θi in topic model.

The problem under investigation is essentially how to effectively and accu-rately estimate user’s probability in visiting new POIs by employing informationcontaining above three aspects. To attack this problem, in Sect. 3.2, We formu-late a fused probabilistic latent factor model by incorporating these factors.

Table 1. Mathematical notations

Symbol Size Meaning

U m Set of all users in one LBSN

V n Set of all POIs in one LBSN

Li |Li| Set of locations visited by a user

Xi

(|Li|2

)Sample of distances between Li

Fi |Fi| Friends of user i

ui d Preference of user i

vj d Affinity of POI j

θi K User i’s review topic distribution

πj K POI j’s review topic distribution

As for POIs, each POI vj has its own location lj labeled as vector <lonj , latj > in representing latitude and longitude respectively. Also each POIhas its own textual profiles with its topic distribution πj in topic model. Forconvenience, we term i and j as user ui and location lj respectively.

3.2 Fused Probabilistic Latent Factor Model

To make CoSoLoRec model concrete, we assume the follow factor representa-tion: (1) each user i is associated with his or her interest η (i, j) with respect toPOI j. (2) each user has an intended visiting probability pf (i, j) with respectto POI j on the basis of friend-based CF. (3) geographical influence impels useri to estimate the probability he or she will visits POI j denoted as pl (i, j). Weintegrally consider user’s interest, physical distance and psychological distance.Finally we got a joint model with these three factors:

p (i, j) ∝ η (i, j) ((1 − λ) pl (i, j) + λpf (i, j)) (1)

Page 6: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

618 H. Guo et al.

The recommend process of user i for POI j can be represented in a generativeway. For user’s preference, η (i, j) can be represented as a linear combinationof latent factor uT

i vj and function of user’s and POI’s observable propertieswhich can be expressed as topic distribution of user i and POI j named asθi,πj . We denote these two parts as implicit interests and explicit interestsof a user respectively. We use η1 (i, j) and η2 (i, j) to notate them. Also, user’srating y (i, j) can be influenced by his visiting probability. Here we adopt Possiondistribution to describe this relation. So fused probabilistic latent factor modelcan be expressed as follows:

1. Draw a user interest(a) Generator user latent factor uiw ∼ Gamma(αU , βU )(b) Generator item latent factor vjw ∼ Gamma(αV , βV )(c) user’s explicit interest η1 (i, j) = θT

i πj , implicit interest η2 (i, j) = uTi vj

(d) user’s interest η(i, j) = η1(i, j) + η2(i, j)2. y (i, j) ∼ P (p (i, j)) where

p (i, j) = (η1 (i, j) + η2 (i, j)) ((1 − λ) pl (i, j) + λpf (i, j))

4 Model Specification

In this section, we introduce detailed model specifications and present our fusionmodel called CoSoLoRec.

4.1 Geographical Influence

We aim to exploit geographical Influence by measuring the distance from auser’s visited POIs to an unvisited POI. Thus we employ Kernel Density Esti-mation (KDE) to model the geographical influence of POIs on users’ visitingbehaviors.

Like MGM, KDE is also a widely-adopted method to estimate geographicalinfluence. What’s more, it shows superior to other methods which model geo-graphical influence in considering visited POIs. We can evaluate general influenceof all POIs using the following method:

pl (i, j) = P

⎛⎝

|Li|⋃t=1

(ct → c0)

⎞⎠ = 1 − P

⎛⎝

|Li|⋂t=1

ct → c0

⎞⎠ = 1 −

|Li|∏t=1

(1 − P (ct → c0)) (2)

From the Eq. (2), we can discover that before fetching geographical influenceof locations pl (i, j), our task is to learn the probability that event ct → c0 occurs.Here we use the same algorithm proposed in [17] to learn it.

P (ct → c0) =1

|Xi|∑

x∈Xi

K(zt − x

δ

)=

1√2π|Xi|

x∈Xi

e− (zt−x)2

2δ2 (3)

Page 7: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 619

Here, zt is the distance between user visiting POI c0 and each of user’s his-torical POIs, which can be used to derive the probability of c0. K(·) representskernal function. Also δ is a smoothing parameter which is called the bandwidth.We use optimal bandwidth [3] δ ≈ 1.06δ|Xi|−1/5. However, the computationalcomplexity grows rapidly with the increment of Li. So we use efficient approxi-mation algorithm [17] to measure pl (i, j).

Eventually, by combining Eqs. (2) and (3), we can exploit geographical influ-ence of locations by using pl (i, j). However, only the geographical informationis not sufficient. Thus social correlation is introduced.

4.2 Friend-Based Collaborative Filtering

With the exponential growth of online social network, social relationship playsan important role in influencing users’ behaviors. Friends usually have similarbehaviors due to the phenomenon that sociologists call homophily [4].

Aiming to predict the probability of user i to a POI j, we adopt the user-based collaborative filtering(CF) by regarding all of i’s friends as neighbors. Inorder to determine the probability in interval [0, 1], We devise the calculation as:

pf (i, j) =

∑i′∈Fi

sim (i, i′) ri′j∑

i′∈Fisim (i, i′)

· 1

rmax(4)

Here, rmax can ensure pf (i, j) is normalized. sim (i, j) refers to similaritybetween user i and user j. In our study, we choose cosine similarity.

4.3 Probabilistic Latent Factor Model

Probabilistic Factor Model (PFM) [14] is a generative probabilistic model.The notations involved in PFM are defined in Table 1. Here, fij is assumedto follow Possion Distribution, the mean is yij . Also, ui and vj are givencertain distributions as priors. Here, ui = (ui1, ui2, ..., uiw, ..., uid) and vj =(vj1, vj2, ..., vjw, ..., vjd).

Therefore, the process of Probabilistic Factor Model is as follows:

1. for all w, generate uiw ∼ p (uiw|Φuiw)

2. for all w, generate vjw ∼ p(vjw|Φvjw

)

3. generate fij from user i to location j with equation fij =d∑

w=1uiwvjw = uivj

4. generate yij ∼ P(fij

)

Here, Φuiwand Φvjw

are hyperparameter lists respect to uiw and vjw. Weassume latent factors are non-negative in real situations. So uiw and vjw aregiven Gamma distributions as empirical priors [14,15]. The gamma distributionof U and V can be represented as functions:

p (U |αU , βU ) =

m∏

i=1

d∏

w=1

uiwαU −1 exp (−uiw/βU )

βUαU Γ (αU )

(5)

Page 8: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

620 H. Guo et al.

p (V |αV , βV ) =

n∏

j=1

d∏

w=1

vjwαV −1 exp (−vjw/βV )

βVαV Γ (αV )

(6)

where uiw, vjw, αU , βU , αV , βV >0, Γ (�) is the Gamma function. Given user latentfactor ui and item latent factor vj , the The possion distribution of yij given fijis

P(yij |fij

)= (uivj )

ˆyijexp (−uivj )

yij !(7)

Expressed in matrix of the above equation with F = UV T . With the method ofmaximum a posterior(MAP), posterior distribution of Y can be modeled as

p (U, V |Y, αU , βU , αV , βV ) ∝ P (Y |F ) p (U |αU , βU ) p (V |αV , βV ) (8)

Finally, we can infer parameters with stochastic gradient descent (SGD) method.

4.4 Textual Analysis

In order to extract users’ explicit interest, we use an aggregated LDA model. Themodel has two latent variables with corresponding super parameters as priors:(1) document-topic distributions Θ. (2) topic-word distributions Φ. In order tolearn users’ interests, we aggregate all the reviews written by each user intoa document. Thus, user and document are interchangable in reflecting user’sinterest.

In this way, we build an aggregated Topic Model. Each user in LBSN is asso-ciated with topics following a multinomial distribution, denoted as θ. Also, eachtopic is associated with textual items according to a multinomial distribution.As we have sampled Θ and Φ of Topic Model of users. Obviously, the dimensionfor document-topic distribution of both Topic Model should be the same. Usinggibbs sampling, the topic distribution for POI j and document-topic distributionθ of users is:

πjs =n(s)j + α

K∑

s=1

n(s)j + Kα

θis =n(s)i + α

K∑

s=1

n(s)i + Kα

(9)

Here, n(s)j is the topic observation count for POI j. n

(s)i is the topic observa-

tion counts for user i(document d). V and K are the number of unique wordsand topics. α and β are hyperparameters in corresponding to topic model.

4.5 Learning and Inference

Parameter Estimation. Let us denote all parameters as Λ = {U, V } and letΩ = {αU , βU , αV , βV } be the hyperparameters. Hyperparameters are all apriorigiven. Given the observed data collection D = {p (i, j)}Iij where p (i, j) is theuser visiting probability, and Iij = 1 when user i visited POI j, and Iij = 0otherwise.

Page 9: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 621

To estimate the parameters Λ, we use maximum likelihood estimation (MLE)method and sampling algorithm to learn all the parameters. So the postpriorprobability can be expressed as the following:

P (Λ|D, Ω) ∝∏

DP (y (i, j) |U, V, Ω)Iij × P (U |αU , βU ) P (V |αV , βV ) (10)

For simplicity, we use logarithmic form of posterior distribution instead. Weexpress this as follows:

L (U, V ;D) =

M∑i=1

N∑j=1

Iij (y (i, j) log p (i, j) − p (i, j))+

M∑i=1

d∑w=1

((αU − 1) log uiw − uiw

βU

)

+

N∑j=1

d∑w=1

((αV − 1) log vjw − vjw

βV

)(11)

Here, p (i, j) =(uTi vj + θT

i πj

)((1 − λ) pl (i, j) + λpf (i, j)) In order to approxi-

mate actual value of ui and vj , we use stochastic gradient descent (SGD) methodto optimize them and update parameters iteratively using all training samples,ξi and ξj are learning rates.

uiw ← uiw + ξi × ∂L∂uiw

vjw ← vjw + ξj × ∂L∂vjw

(12)

4.6 Recommendation

After we learn the parameters Λ, CoSoLoRec model predicts the ratings of a userfor a given POI using E (y (i, j)) =

(uTi vj + θT

i πj

)((1 − λ) pl (i, j) + λpf (i, j)).

We adjust hyperparameters in training process and adjust parameter λ to bal-ance physical and psychological influence in making decisions. Our model learnsthe latent factors by SGD effectively. Therefore we make POI recommendationvia our trained model.

5 Experiment

In this section, we conduct several experiments based on CoSoLoRec model andbaselines to evaluate the performance of our proposed approach empirically. Allexperiments are conducted on two real-world datasets in LBSNs, collected fromYelp and Foursquare.

5.1 Dataset

Yelp dataset. Yelp is a famous website which provides reviews and ratings forrestaurants and other business places [16]. To make the experimental results moreconvinced, we manually choose two cities named “Phoenix” and “Las Vegas”which consist of 80 % of original datasets to evaluate our approaches. We filter

Page 10: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

622 H. Guo et al.

out users who have more than 300 friends to avoid spam generated by brushedfrom robots and less than 20 friends to make our datasets dense. With the samereason, we filter out users who have more than 300 reviews and less than 20reviews. Finally, we obtain a dataset consists of 3059 users, 26446 business alongwith 180755 review records.

Foursquare dataset. Besides the Yelp dataset, we also evaluate our models onFoursquare (4sq). The dataset includes POIs distributed in the United States.With the same reason in handling Yelp dataset, we filter out users with morethan 500 friends and less than 18 friends. Similarly, we filter our ratings givenby more than 500 users and less than 20 users. We finalize a dataset of 6895users for 13208 POIs with 166989 ratings. Table 2 indicates the data statisticsfor Yelp and Foursquare.

Table 2. Data description

Yelp Foursquare

Number of users 366715 571700

Number of locations 61184 8318919

Review items 1569265 5550203

User-location matrix density 6.99 × 10−5 1.17 × 10−6

Number of cities 10 50

To unify ratings with probability to visit a POI, we normalize the discrete rat-ing using f (x) = x

max{x} , where max{x} represents the largest rating value [7].

5.2 Evaluation Metrics

We present each user with N POIs sorted by the predicted probability andevaluated based on which of these POIs were actually visited by users.

rPrecision. To unify evaluation of a universal baseline and baseline with region-based attached, we introduce relative precision for evaluation. We assume Cas the candidate POIs. The precision in a top-K list of a random recommen-dation system is |Svisited|

|C| . Then, the relative precision [10,25] is defined as:

rPrecision@N = |SN,rec∩Svisited|·|C||Svisited|·N

RMSE. We also use Root Mean Square Error (RMSE) for evaluation. RMSE =√1N

∑(u,i)∈E (rui − rui)

2. where E is the test dataset. rui and rui represent theobserved and predicted performance used for user i on business j respectively.Smaller value of RMSE implies better performance in our recommendation.

Page 11: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 623

5.3 Baseline Comparison

In this section, we introduce baselines and parameters involved in experimentsand evaluate our model with baselines in a number of experiments.

Comparative Approaches. In this section, in order to show the effectivenessof our CoSoLoRec model, we compare our model with the following baselines:

1© Probabilistic Matrix Factorization (PMF). PMF [7] is a recommen-dation method widely used for different recommendation tasks.

2© Non-negative Matrix Factorization (NMF). NMF [8] is a methodused in recommender system which constrains the factors to be non-negative.

3© Bayesian Non-Negative Matrix Factorization (BNMF). BNMF[18] is an unsupervised learning method using Markov chain Monte Carlo sam-pling method based on Non-negative Matrix Factorization.

4© Geographical-Topical Bayesian Nonnegative Matrix Factoriza-tion (GT-BNMF) [9]. This is a new method combining geographical informa-tion, textual information with other aspects based on BNMF.

5© Geographical Probabilistic Factor Model (Geo-PFM) [11]. This isa fusion method using latent factor model considering geographical informationwith no textual information and social correlation.

To further understand the benefits using different forms of implicit interestand distinction between implicit interest and explicit interest, we implementthree modifications of CoSoLoRec model.

6© CoSoLo-PMF (C-PMF) is a modification of CoSoLoRec model usingPMF to depict implicit interest.

7© CoSoLo-NMF (C-NMF) is a modification of CoSoLoRec model usingNMF to depict implicit interest.

8© CoSoLo-BNMF (C-BNMF) is a modification of CoSoLoRec modelusing BNMF to depict implicit interest.

We divide data as training set and test set on the ratio of 7:3 with thereview time order. For CoSoLoRec model, we set αU = 5 and βU = 0.2 ashyper parameters of prior of U . Also, we set αV = 20 and βV = 0.2 as hyperparameters of prior of V . We initialize λ = 0.5 for comparing our model withbaselines. For PMF, we set hyper parameters σU = 0.05 and σV = 0.05 as priorsof U and V . Also, we set σ = 0.2 as prior of y (i, j). For textual aspects, weinitialize the topic number K = 30, α = 40/K, β = 0.3. We set the number ofregions |R| = 50 for foursquare which is the number of regions according to allthe states in USA(expect Alaska). For yelp, we set |R| = 2 for data from twocities.

Expermental Results. To reflect how the models actually outperform a uni-versal baseline and baseline with region-based attached, we use relative per-formance. Figure 2(a) and (b) indicates CoSoLoRec model outperforms all thebasslines including classical baseline models (PMF,NMF,BNMF) as well asrecent proposed model (GT-BNMF,Geo-PFM) in both datasets. Furthermore,

Page 12: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

624 H. Guo et al.

10,5 10,10 20,5 20,100

10

20

30

40

50

r-Precision

Dimension,@N

(a) rPrecision@N on Yelp

10,5 10,10 20,5 20,100

10

20

30

40

50

r-Precision

Dimension,@N

PMF NMF BNMF CoSoLo-PMF CoSoLo-NMF CoSoLo-BNMF GT-BNMF Geo-PFM CoSoLoRec

(b) rPrecision@N on Foursquare

Fig. 2. r-Precision with different latent factor dimension and @N

CoSoLo-PMF, CoSoLo-NMF and CoSoLo-BNMF show almost equivalent per-formance in precision. However, they performs better than classical beselinesbut no obvious better performance than recent proposed model. This phenom-enon indicates heterogeneous information can reflect user’s interests accurately.Also, probabilistic latent factor model can reflect user’s implicit interest bet-ter. GT-BNMF model considers heterogeneous information such as geographicalinformation, popularity and user’s interests. However, our model considers fur-ther more about these factors. Geo-PFM model use Possion factor model whichcan guarantees a rigorous probabilistic generative process. However, our modelperforms better since heterogeneous information fused.

Table 3. Performance comparison in different dimensions

D Metrics PMF NMF BNMF C-PMF C-NMF C-BNMF GT-BNMF Geo-PFM CoSoLoRec

Yelp 10 RMSE 0.8225 0.7644 0.766 0.7639 0.7824 0.7769 0.7241 0.7076 0.6692

Improve 18.64% 12.45% 12.64% 12.40% 14.47% 13.86% 7.58% 5.43%

20 RMSE 0.8455 0.7502 0.7564 0.7365 0.7716 0.7672 0.7573 0.6881 0.6693

Improve 20.84% 10.78% 11.52% 9.12% 13.26% 12.76% 11.62% 2.73%

4sq 10 RMSE 0.8792 0.8515 0.8624 0.8335 0.8612 0.8454 0.8282 0.7815 0.7476

Improve 14.97% 12.20% 13.31% 10.31% 13.19% 11.57% 9.73% 4.34%

20 RMSE 0.8763 0.8498 0.85 0.8019 0.8509 0.8334 0.8132 0.7739 0.7319

Improve 16.48% 13.87% 13.89% 8.73% 13.99% 12.18% 10.00% 5.43%

In order to measure the difference between estimated rating values and realrating values in test datasets, RMSE is introduced. We conduct experiments ondifferent latent factor dimensions. From Table 3, we can conclude that our modelachieves less mean square error than baselines with different data divisions. So itis obvious that our model can achieve more accurate recommendation for user’sinterestd places. Also, three modified model based on CoSoLoRec achieves lessmean square error than classical baselines and performs almost equivalently withrecently proposed models. This phenomenon indicates heterogeneous can ensuremore precise prediction in recommendation system.

Page 13: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 625

5.4 Parameters Sensitivity

As mentioned, both geographical influence and social influence play importantroles in estimating user’s interest on unvisited POIs. So in the following part, werespectively set λ as 0.2, 0.4, 0.6, 0.8 to detect importance of geographical andsocial factors.

Fig. 3. Experimantal results in parameter sensitivity and model robust

From Fig. 3(a) and (b), we can find that rPrecision rises first and falls later whileRMSE shows a reverse trend. Since PMF,NMF,BNMF,GT-BNMF and Geo-PFM do not consider relation between geographical distance and psychologicaldistance, results show no change with different parameter λ. We can find bothgeographical and social influence play comparative roles. Our model outperformsbaselines in every value of λ while three modifications show almost equivalentperformance with Geo-PMF and GT-BNMF.

5.5 Impact of Geographical, Social and Textual Information

In some situations, it is ideal that all aspects are covered in our model. Soit is necessary to evaluate our model if not all the data is present. In thispart, we choose three models which are based on CoSoLoRec model: (1) CoL-oRec: social correlation removed. (2) CoSoRec: geographical factor removed.(3) SoLoRec: textual information removed. All the experiments are conductedin K = 20, N = 10.

Figure 3(c) shows results comparing the above three models with CoSoLoRecmodel. We conclude that above three models perform worse than CoSoLoRecmodel since user preference can not be completely described if one of factorsin CoSoLoRec removed. However, relevant indicators show not much declinecomparing with CoSoLoRec model. In particular, CoLoRec and CoSoRec modelsshow not much decline compared to SoLoRec model which shows users’ textinformation contributes greater than geographical factor and social correlation.

Page 14: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

626 H. Guo et al.

6 Conclusion

In this paper, we proposed CoSoLoRec model fusing heterogeneous informationlike geographical factor, social correlation and text information. We incorporateuser preference with geographical factor realized by KDE and social correla-tion realized by friend-based CF. Further more, we devide user interest intoexplicit interest implemented by Topic Model and implicit interest implementedby probabilistic latent factor model. Experimental result conducted with Yelpand foursquare dataset demonstrated that CoSoLoRec model is superior to allother approaches evaluated, such as PMF, NMF, GT-BNMF and Geo-PFM andthree different forms of CoSoLoRec model. Also we can conclude text infor-mation is more important than geographical factor and social correlation. Ourmodel performs better in any different combinations between geographical andsocial influence.

References

1. Tobler, W.R.: A computer movie simulating urban growth in the Detroit region.Econ. Geogr. 46, 234–240 (1970)

2. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily insocial networks. Ann. Revi. Sociol. 27, 415–444 (2001)

3. Dehnad, K.: Density estimation for statistics and data analysis. Technometrics29(4), 495–495 (1987)

4. Ye, M., et al.: Exploiting geographical influence for collaborative point-of-interestrecommendation. In: Proceedings of ACM SIGIR. ACM (2011)

5. Cheng, C., Yang, H., et al.: Fused matrix factorization with geographical and socialinfluence in location-based social networks. In: AAAI (2012)

6. Hu, B., Ester, M.: Spatial topic modeling in online social media for location rec-ommendation. In: Proceedings of the 7th ACM Recsys. ACM (2013)

7. Mnih, A., Salakhutdinov, R.: Probabilistic matrix factorization. In: NIPS (2007)8. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix fac-

torization. Nature 401(6755), 788–791 (1999)9. Liu, B., Xiong, H.: Point-of-interest recommendation in location based social net-

works with topic and location awareness. In: SDM, vol. 13 (2013)10. Liu, B., et al.: Learning geographical preferences for point-of-interest recommen-

dation. In: Proceedings of ACM SIGKDD. ACM (2013)11. Liu, B., et al.: A general geographical probabilistic factor model for point of interest

recommendation. TKDE 27, 1167–1179 (2015)12. Ma, H., Lyu, M.R., King, I.: Learning to recommend with trust and distrust rela-

tionships. In: Proceedings of ACM Recsys. ACM (2009)13. Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its appli-

cations. In: Proceedings of IEEE ICDM. IEEE Computer Society (2006)14. Ma, H., et al.: Probabilistic factor models for web site recommendation. In: Pro-

ceedings of ACM SIGIR. ACM (2011)15. Chen, Y., et al.: Factor modeling for advertisement targeting. In: NIPS (2009)16. Anderson, M., et al.: Learning from the crowd: regression discontinuity estimates

of the effects of an online review database. Econ. J. 122(563), 957–989 (2012)

Page 15: CoSoLoRec: Joint Factor Model with Content, Social ...zhaoxi35/paper/ksem2016.pdf3 IFLYTEK Research, Hefei, China xinli2@iflytek.com Abstract. The pervasive use of Location-based Social

CoSoLoRec: Joint Factor Model with Content, Social, Location 627

17. Zhang, J., Chow, C.-Y., et al.: iGeoRec: a personalized and efficient geographi-cal location recommendation framework. IEEE Trans. Serv. Comput. 8, 701–714(2015)

18. Schmidt, M.N., Winther, O., Hansen, L.K.: Bayesian non-negative matrix fac-torization. In: Adali, T., Jutten, C., Romano, J.M.T., Barros, A.K. (eds.) ICA2009. LNCS, vol. 5441, pp. 540–547. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00599-2 68

19. Kurashima, T., Iwata, T., Hoshide, T., Takaya, N., Fujimura, K.: Geo topic model:joint modeling of user’s activity area and interests for location recommendation.In: Proceedings of ACM WSDM. ACM (2013)

20. Yin, H., Sun, Y., Cui, B., Hu, Z., Chen, L.: Lcars: a location-content-aware rec-ommender system. In: Proceedings of ACM SIGKDD. ACM (2013)

21. Lian, D., Zhao, C., Xie, X., Sun, G., Chen, E., Rui, Y.: GeoMF: joint geograph-ical modeling and matrix factorization for point-of-interest recommendation. In:Proceedings of ACM SIGKDD. ACM (2014)

22. Zheng, N., Jin, X., Li, L.: Cross-region collaborative filtering for new point-of-interest recommendation. In: Proceedings of WWW Companion (2013)

23. Zhang, C., Wang, K.: POI recommendation through cross-region collaborative fil-tering. KIS 46, 369–387 (2016)

24. Ye, M., Shou, D., Lee, W.-C., et al.: On the semantic annotation of places inlocation-based social networks. In: Proceedings of ACM SIGKDD. ACM (2011)

25. Yin, P., Luo, P., Lee, W.-C., Wang, M.: App recommendation: a contest betweensatisfaction and temptation. In: Proceedings of ACM WSDM. ACM (2013)