Content-Based Cross-Domain Recommendations Using Segmented Models

Shaghayegh Sahebi
Intelligent Systems Program
University of Pittsburgh
Pittsburgh, PA
[email protected]

Trevor Walker
LinkedIn
Mountain View, CA
[email protected]

ABSTRACT
Cross-domain recommendation is a new field of study in the area of recommender systems. The goal of this type of recommender system is to use information from other source domains to provide recommendations in target domains. In this work, we provide a generic framework for content-based cross-domain recommendations that can be used with various classifiers. In this framework, we propose an efficient method of feature augmentation to implement adaptation of domains. Instead of defining the notion of domain based on item descriptions, we introduce user-based domains. We define meta-data features as a set of features that characterize the fields the domains come from, and we introduce indicator features to segment users into different domains based on the values of the meta-data features. We study an implementation of our framework based on logistic regression and perform experiments on a dataset from LinkedIn for job recommendations. Our results show promising performance in certain domains of the data.

1. INTRODUCTION
Recommender systems can help users address the information overload problem by providing related items based on the user's interests. So far, most recommender systems have focused on specific domains, recommending one type of item (such as books) to all categories of users. Recent research introduced cross-domain recommender systems, which aim to take advantage of shared information among various domains [10]. In cross-domain recommendation, the goal is to use information from various source domains to recommend items in target domains. Previous studies on collaborative-filtering cross-domain recommender systems have shown an improvement in recommendation accuracy, especially in the cold-start case [14]. Most of the work on cross-domain recommender systems, and the definition of domains in them, has been based on cross-domain collaborative filtering methods and has ignored the domains that can be defined based on user specifications. Li [10] has categorized the domains in cross-domain recommendation into system, data, and temporal domains. These domains are related to, respectively, the different datasets that a recommender system is built upon, the various representations of user preferences (explicit or implicit), and the various time points at which the data is gathered. Although this is a good classification of possible domains in recommender systems, it focuses on the types of domains that are defined based on items. In other words, the notion of domain is usually selected as a constant characteristic of items, systems, etc. For example, the type of items (e.g., books, movies, etc.) [14], the genre of items (e.g., for movies) [1], or indicators of the various systems that the data is gathered from [6] are some of the features that have been used as domain indicators. Joshi et al. [5] have chosen domains based on meta-data features. These meta-data features can be selected from all unique subsets of features by experimentation (e.g., selecting the best performing features on a validation set), which is a time-consuming task. Choosing the proper domain indicator among features is still an open field of research. In this paper, we propose a framework for performing content-based cross-domain recommendation. We propose that in content-based and hybrid recommender systems, the domain notion can be extended to the type or domain of users. Here, the definition of domain can be determined based on the recommendation task and users' information, such as users' demographic data. For example, in a movie recommendation task, the age of the user can be an effective factor in deciding which movies match her best. As another example, in job recommendation, we expect the model parameters to be different for different job functions of users. For a designer, it is important to have skills matching the job description, while for a network engineer, his certificates might have more importance.

We choose the job recommendation application for our experiments in this paper, although the framework is applicable to other recommender system domains. With many different jobs listed online, across various industries and with different job descriptions, it is essential for people to find the job that best matches their abilities and specifications. Searching for the right job is a time-consuming task for a user: it requires spending a lot of effort on defining the criteria the user is looking for. Job recommender systems can address this problem by actively finding good job matches for the user, utilizing her profile information, search keywords, etc. Based on previous results in the job recommendation literature [8, 9] and our field experience, we believe that job recommendations can benefit from cross-domain information.


Our proposed framework can be utilized by various algorithms defined on any notion of domain drawn from data attributes. Our work also differs from the existing literature in defining domains on the user profile side instead of the item side; it can, of course, also be used for domains defined on item-set features. We experiment with LinkedIn data for job recommendations. Our experiments lead to promising results for content-based cross-domain recommendations based on user job functions.

In Section 2, we briefly discuss the related literature. In Section 3, we introduce our approach to content-based cross-domain recommendation. We present our dataset and experimental setup in Section 4 and discuss the results in Section 5.

2. RELATED WORK

2.1 Segmented Regression Model
Segmented regression, or piecewise regression [13], can be used as a classification method in which the data features are partitioned into intervals using some breakpoints. In the final model, a separate model is fit to each of the segments. This model is useful for approximating higher-degree models with multiple lower-degree models over smaller ranges. In this paper, we adapt this method to our problem of cross-domain recommendation.
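To make the segmentation idea concrete, the following is a minimal sketch of piecewise regression over a single one-dimensional feature with externally supplied breakpoints; the function name and toy data are illustrative and not part of the original paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_segmented_regression(x, y, breakpoints):
    """Piecewise (segmented) regression sketch: one linear model per interval.

    x, y        : 1-D arrays of inputs and targets.
    breakpoints : sorted interior breakpoints splitting the x-range into segments.
    """
    edges = [-np.inf] + list(breakpoints) + [np.inf]
    models = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x < hi)          # rows falling into this segment
        model = LinearRegression().fit(x[mask].reshape(-1, 1), y[mask])
        models.append((lo, hi, model))
    return models

# Toy usage: a piecewise-linear curve approximated by two linear pieces split at x = 0.5.
x = np.linspace(0, 1, 100)
y = np.where(x < 0.5, 2 * x, 1 - x)
for lo, hi, m in fit_segmented_regression(x, y, breakpoints=[0.5]):
    print(f"[{lo}, {hi}): slope={m.coef_[0]:.2f}")
```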

2.2 Feature Augmentation in Domain Adaptation
Feature augmentation was introduced by Daume [3] in the domain-adaptation literature. In his paper, Daume considers a source domain and a target domain separately. He augments each of these domains individually by copying the feature space three times: once for the general version, and once for each of the source and target versions. Eventually, the augmented source data contains only the general and source-specific versions, and the augmented target data contains the general and target-specific versions. Since Daume's paper, this method has been used in the domain-adaptation field, especially in Natural Language Processing (NLP) [2, 4, 5].
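As an illustration of Daume-style augmentation (not the authors' code), here is a minimal sketch for one source and one target domain over dense NumPy features; `augment_daume` and its arguments are our own naming.

```python
import numpy as np

def augment_daume(X, is_target):
    """Daume-style 'frustratingly easy' feature augmentation (a sketch).

    X         : (n, d) array of base features.
    is_target : (n,) boolean array, True for target-domain rows.

    Each row is mapped to [general copy, source copy, target copy]:
    source rows get zeros in the target block and vice versa.
    """
    n, d = X.shape
    out = np.zeros((n, 3 * d))
    out[:, :d] = X                          # general copy, shared by all rows
    src = ~is_target
    out[src, d:2 * d] = X[src]              # source-specific copy
    out[is_target, 2 * d:] = X[is_target]   # target-specific copy
    return out

# Toy usage: 3 source rows and 2 target rows with 4 base features each.
X = np.random.rand(5, 4)
is_target = np.array([False, False, False, True, True])
print(augment_daume(X, is_target).shape)    # (5, 12)
```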

Our approach extends Daume's model by allowing multiple meta-data features to define the domains (instead of a single source/target dimension). In addition, each of the meta-data features can have multiple values and define multi-dimensional domains. Moreover, we can have separate sets of common (overlapping) and uncommon features in the main-effect and domain-specific models. Our model is also extensible to incorporating cross-products of domain indicator features.

2.3 Job Recommendation
Despite the importance of job recommender systems, there has not been much research on this subject. Rafter et al. [12] introduced CASPER, an intelligent online recruitment service. Keim [7] provided a multilayer framework to support the matching of individuals for recruitment processes. In [11], Hutterer used hybrid user profiling to enhance job recommendation results, incorporating the user's explicit and implicit feedback into the user profile. Lee and Brusilovsky [8, 9] implemented and experimented with Proactive, which has multiple interfaces for various types of users. They showed that different users use various information resources to look for the perfect job.

3. OUR FRAMEWORK: SEGMENTED MODEL FOR CROSS-DOMAIN RECOMMENDATION
In cross-domain recommendation, we aim to build a model that is general and flexible enough to transfer the information in multiple related domains, and specific enough to capture particular aspects of each individual domain. This means that we expect a trade-off between the bias and the variance in our model. We want all models to be close to each other in particular dimensions (having less variance) and we want them to be biased towards each domain's specific distribution. For example, if we think of various job functions as different domains in job recommendation, we expect the user profile to have a good match to the job description in all domains (a common feature of the domains). Also, we expect the skills feature to be more important for an artist than for a university professor (a domain-specific feature). If we consider one main model for all of the data present in the various domains, we will have no variance in the model, but we will lose the bias we are looking for. On the other hand, if we treat each domain with a separate model, we will achieve the bias each domain introduces, but we will have too much variance in the resulting models. In other words, we will lose the ability to transfer common information among different models. Our framework consists of two parts: the main-effect model and the domain-specific models. The main-effect model captures the shared statistics among all domains. We have one domain-specific model per domain to capture the domain-specific characteristics. A general formulation of the model can be seen in Equation 1.

\text{Final Model} = \text{Main-Effect Model} + \sum_{i \in \text{Domains}} \text{Domain-Specific Model}_i \quad (1)

3.1 Cross-Domain Augmentation and Segmentation
As said before, we characterize the fields that the domains come from by a set of features called meta-data features. Each dataset has a set of features, such as user-related features, item-related features, and features that represent similarities between users and items, which we call "base features". Meta-data features are the subset of base features that specify the aspects on which we want to define the domains. Each domain is constructed based on the values of these meta-data features and their combinations. For example, if we want to recommend movies to users, the base features are user features, such as age, education, language, etc., item features, such as movie genre, actors, director, etc., or the relationship between users and items, such as the similarity between each movie's genre and the genres that a user likes. We can choose some of these base features as meta-data features to define the notion of domain. For example, we can choose the genre base feature as the meta-data feature; in this case, the defined domains will be action movies, drama movies, action-drama movies, etc. If we choose two meta-data features, the domains can be a combination of values for those meta-data features. For example, if we choose the genre of the movie and the age of the user as meta-data features, the domains will be like: action-middle-age, drama-young, etc. In the case of user-based domains for job recommendation, we can choose some base features of users, such as job function or job seniority, as meta-data features. For example, if we want to define the domains based on job function, people who have an IT job function form one domain and people who have a medical job function form another domain.

3.1.1 Augmentation
Each of the domain-specific models in Equation 1 works on one domain's data. As a result, we should split the dataset for each domain and provide each domain-specific model with the section of the dataset related to that domain. To address this splitting, we expand on the idea of the segmented regression model. We propose to augment the feature space based on the domain definitions and copy each data point into the related domain's sub-space. The main space is then used for the main-effect model and each copy of the space is used for the related domain-specific model.

However, this augmentation has a problem: if we have k different domains (e.g., k different job functions), we need to partition the data space into $2^k$ separate segments (copies) to capture all of the different settings of the dataset, which takes too much space. Each one of these copies corresponds to one subset of the possible combinations of meta-data feature values (domains). For example, if we want to partition users based on the job function represented in their resume, and we have three different possible job functions (e.g., operations, education, and sales), we will have to partition the data into $2^3 = 8$ segments: people whose job function is in operations, the ones who have the sales function, the ones who have the education job function, the ones who have both sales and operations functions, and so on. In addition to the space problem, segmenting the data into combinations of domains may lead to very sparse copies of the original dataset. Additionally, looking at all the various combinations of the domains and their interactions might not be necessary for our purpose.

3.1.2 Indicator Features for Segmentation
To alleviate the problem indicated in Section 3.1.1, we expand on the idea of the segmented regression model and introduce indicator features for each domain. These features allow us to augment the feature space in a polynomial order, while keeping the main-effect model and controlling the granularity of combinations among different domains. Suppose that each meta-data feature can take k different values, segmenting our data into k domains, and suppose that we are choosing only one meta-data feature. For each of the k possible values, i.e., for each of the domains, we define a binary indicator feature representing whether a data point falls into that specific domain or not. As a result, we end up with k binary indicator features for the selected meta-data feature. Eventually, we augment the feature space based on the indicator features in the following way: we keep the original feature space for the common features used in the main-effect model, and for each of the features falling into the domain-specific models, we augment the feature space by copying it k times, once for each of the meta-data feature values. Consequently, we have a polynomial expansion of the space.

Figure 1: Augmented Feature Space with c Common Features in the Main-Effect Model and f Features in Each of the d Domains, Represented by Indicator Features with k Possible Values

Note that we do not consider combinations of meta-data feature values yet.

Now, if we have d meta-data features to choose the domains from, each of which can take k values, and f of the base features appear in each of the domain-specific models, we should replicate this f-dimensional space $dk+1$ times. This number is polynomial in d (the number of meta-data features) and k (the number of values for each meta-data feature), whereas without the indicator features in the segmented model, the f-dimensional feature space would have to be replicated $d^k$ times. Considering the c common features for the main-effect model, we can represent the new feature space by Figure 1.
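A sketch of this indicator-based augmentation for a single meta-data feature with k possible values (the d = 1 case), assuming a data point may belong to several domains at once, as with job functions. The helper name and matrix layout are our assumptions, following the layout of Figure 1.

```python
import numpy as np

def augment_with_indicators(X_common, X_domain, domain_values, k):
    """Augment the feature space with one binary indicator per meta-data value.

    X_common      : (n, c) features kept once for the main-effect model.
    X_domain      : (n, f) features replicated per domain.
    domain_values : list of sets; domain_values[i] holds the meta-data values
                    (0..k-1) that row i belongs to (a row may belong to several).
    k             : number of values the meta-data feature can take.

    Returns an (n, c + f + k*f) matrix: common block, one un-gated copy of the
    domain features, and one indicator-gated copy per meta-data value.
    """
    n, f = X_domain.shape
    indicators = np.zeros((n, k))
    for i, values in enumerate(domain_values):
        for v in values:
            indicators[i, v] = 1.0

    # Each domain-specific block is the domain features multiplied by its indicator.
    gated_blocks = [indicators[:, [v]] * X_domain for v in range(k)]
    return np.hstack([X_common, X_domain] + gated_blocks)

# Toy usage: 4 rows, 2 common features, 3 domain features, k = 3 domains.
Xc = np.random.rand(4, 2)
Xd = np.random.rand(4, 3)
domains = [{0}, {1}, {0, 2}, {2}]
print(augment_with_indicators(Xc, Xd, domains, k=3).shape)  # (4, 2 + 3 + 3*3) = (4, 14)
```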

3.1.3 Challenges and Advantages
An advantage of this framework is its extensibility to higher-order cross-products of values between and within domains. For example, if we want to consider the effect of the interaction between two domains, we can extend the model to consider an indicator feature representing cross-products of feature values in the domains. E.g., if we want to take into account the combination of every two feature values within each of the domains, we end up with a space that has $O(c + f(dk+1) + f(dk^2+1))$ dimensions, or that is expanded $dk + dk^2 + 2$ times. This gives us the ability to control the dimensionality of the feature space while avoiding sparsity in each of the segments.

Another challenge is choosing the features that should be in the main-effect model and the ones that should remain as domain-specific features. In other words, which features should be responsible for transferring the information among domains (controlling the variance) and which ones should provide the domain-specific bias? In our approach, the model can learn which features to use in the main-effect model and which features to use in each of the domains through regularization. Since regularization pushes coefficient values to be as close to zero as possible, the less important coefficients of the model will have very small values and are removed from the model. While the model can choose between these sets of features, we can also initialize the main-effect and domain-specific features using the expert's domain knowledge.

Besides, in some cross-domain recommender problems, each domain has its own subset of a general feature set, which might differ in size or type from other domains' feature sets. We expect our cross-domain solution to account for this and to be extensible to domains with heterogeneous numbers and types of features.

3.2 Implementation Using Logistic Regression
Although the approach presented here can be used with various classification algorithms, we used a straightforward classification algorithm to implement the model.

Suppose that $f_{c_i}$ is the $i$th common feature among the domains; $M$ is the set of meta-data features; $V_j$ is the set of values (domains) of the $j$th meta-data feature; $I_{ij}$ is the binary indicator feature for domain $i$ of meta-data feature $j$; $f_{ijk}$ represents the $k$th base feature specific to the $i$th domain of meta-data feature $j$; and $p$ is the probability of the model's outcome. Equation 2 shows the resulting segmented regression model with indicator features. As we can see, it has a simple representation that can be implemented easily for domain adaptation.

\mathrm{logit}(p) = \sum_i w_{c_i} \times f_{c_i} + \sum_{j \in M} \sum_{i \in V_j} I_{ij} \sum_k w_{f_{ijk}} \times f_{ijk} \quad (2)

Here, $w$ represents the weight (importance) of each feature in the model. In case we want to extend it to two-way interaction effects between values of each meta-data feature (i.e., a data point belonging to two domains), we will have:

\mathrm{logit}(p) = \sum_i w_{c_i} \times f_{c_i} + \sum_{j \in M} \sum_{i \in V_j} I_{ij} \sum_k w_{f_{ijk}} \times f_{ijk} + \sum_{j \in M} \sum_{i \in V_j} \sum_{l \in V_j} I_{i,l,j} \sum_k w_{f_{i,l,j,k}} \times f_{i,l,j,k} \quad (3)

In Equation 3, $I_{i,l,j}$ represents the binary indicator feature for a data point belonging to both domains $i$ and $l$ (or having both values $i$ and $l$) of meta-data feature $j$. To decide which features should be in the common set of features and which should be in each of the domain-specific models, we use L2 regularization.
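One possible realization of Equation 2 is to build the augmented feature matrix (as in the earlier sketch) and fit an off-the-shelf L2-regularized logistic regression. The scikit-learn configuration below is illustrative, not the authors' exact setup, and the random data stands in for the real user-job records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, c, f, k = 200, 4, 6, 3           # records, common feats, domain feats, domains
X_common = rng.random((n, c))
X_domain = rng.random((n, f))
y = rng.integers(0, 2, size=n)      # 1 = user applied for the job

# Binary indicators: indicators[i, v] = 1 if record i belongs to domain v.
indicators = (rng.random((n, k)) < 0.4).astype(float)

# Augmented space following the Figure 1 layout: common block, one copy of the
# domain features, and k indicator-gated copies (one per meta-data value).
gated = [indicators[:, [v]] * X_domain for v in range(k)]
X_aug = np.hstack([X_common, X_domain] + gated)

# L2 regularization shrinks unhelpful coefficients toward zero, which is how the
# model decides which features act through the main-effect block and which
# through a domain-specific block.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_aug, y)
print(model.coef_.shape)            # (1, c + f*(k+1))
```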

4. EXPERIMENTAL SETUP

4.1 Data
The dataset we use in this study is LinkedIn's job application data. It contains records of user and job features and a binary label indicating whether the user has applied for the job or not. Some of the base features are calculated similarities between the job and the user. For example, we use TF-IDF to calculate the similarity of the job description with the user's skills and store it as a feature in the user-job record. Some other features, from which we have chosen the meta-data features, are user-specific: for example, the past and current job functions of a user, the past and current industries the user has worked in, or the geographical location of the user. Meta-data features should have categorical values, so that we can extract binary indicator features from them. In case we want to use features with continuous values, we partition the values into more than one category.

Figure 2: Coverage of User Current Job Functions in Offline Data
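As a hedged illustration of one such base feature, the snippet below computes a TF-IDF cosine similarity between a user's skills text and a job description; the vectorizer settings and example strings are assumptions, not the production feature pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def skills_job_similarity(user_skills, job_description):
    """Cosine similarity between TF-IDF vectors of user skills and a job posting."""
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform([user_skills, job_description])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Toy usage: this scalar would be stored as one base feature of a user-job record.
sim = skills_job_similarity(
    "java spark recommender systems machine learning",
    "We are hiring a machine learning engineer with Spark and Java experience.",
)
print(round(sim, 3))
```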

The offline dataset used in the following experiments consists of three million records from more than 150,000 users. There is a one-to-ten ratio of positive job applications to negative job applications in the dataset. We use 100 user-job features as base features in the model.

We pick one meta-data feature (the user's current job function) from the user-specific features and split domains based on it. This feature is specifically related to what a user does in his/her job; e.g., a user can be an IT (job function) engineer in a bank. We choose this feature based on our experience that people working in various functions (e.g., the arts and legal domains) have different requirements and definitions for a good job recommended to them.

Based on the LinkedIn data, the current job functions of a user can take 26 different values. Each user can have multiple job functions at the same time. The distribution of user job functions is not uniform in the dataset: some functions are well covered and some include fewer users. Figure 2 shows the coverage of current job functions in the data. As we can see in the figure, "Sales", "Operations", and "Information Technology" are among the most covered job functions in the data, while "Community and Social Services", "Real Estate", and "Military and Protective Services" are the job functions with the least coverage.

4.2 Implementation of Models for Job Recommendation
As we discussed in Section 3, we can have two extremes of modeling users as our baselines: a) when there is only one main model for all of the users, and b) when there is a separate model for each segment of users and no shared information among the models. In each of our studies, we compare our model to at least one of these two baselines.

Figure 3: Three Experiment Settings with Different Granularities of Domain Definition

To capture the effect of domain granularity on recommendation results, we experiment with three different settings for domains: segmenting on two domain indicator features versus all other domains (two-vs-all), segmenting on all indicator features of a domain (all-indicators), and segmenting on clusters of indicator features of a domain (domain-clusters). Figure 3 shows a graphical demonstration of user segmentation in each case for job functions.

For the two-vs-all model, we choose the two most covered values of the selected meta-data feature and define three indicator features based on them: the indicator feature that selects users with the most covered value, the indicator feature that selects users with the second most covered value, and the indicator feature for the rest of the users. For example, for the user's job function meta-data feature, we will have the following indicator features: $I_S$ for users with the "Sales" job function, $I_O$ for users with the "Operations" job function, and $I_{Other}$ for all other users. The final model is presented in Equation 4. Here, $f_{c_i}$ denotes the $i$th common feature among domains; $f_{S_i}$, $f_{O_i}$, and $f_{Other_i}$ are features used in the "Sales", "Operations", and "Other" domains, respectively; $w_j$ is the weight for feature $j$; and $p$ is the probability that the target user applies for the target recommended job. Note that since we have chosen only one meta-data feature, we do not need to represent it in the model (e.g., $j \in M$ in Eq. 2). Also, note that by including $I_{Other}$ as an indicator feature, we are capturing the effect of the interaction, or cross-product, of the "Sales" and "Operations" domains.

\mathrm{logit}(p) = \sum_i w_{c_i} \times f_{c_i} + I_S \sum_i w_{f_{S_i}} \times f_{S_i} + I_O \sum_i w_{f_{O_i}} \times f_{O_i} + I_{Other} \sum_i w_{f_{Other_i}} \times f_{Other_i} \quad (4)
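A small sketch of how the three two-vs-all indicator features could be derived from a user's (possibly multiple) job functions; the function and label strings are illustrative rather than the exact production encoding.

```python
def two_vs_all_indicators(user_job_functions):
    """Indicator features for the two-vs-all setting (sketch).

    user_job_functions : set of job-function strings for one user.
    Returns (I_S, I_O, I_Other) as 0/1 values: Sales, Operations, and
    users who have neither of the two most-covered functions.
    """
    i_s = 1 if "Sales" in user_job_functions else 0
    i_o = 1 if "Operations" in user_job_functions else 0
    i_other = 1 if (i_s == 0 and i_o == 0) else 0
    return i_s, i_o, i_other

print(two_vs_all_indicators({"Sales", "Operations"}))   # (1, 1, 0)
print(two_vs_all_indicators({"Education"}))             # (0, 0, 1)
```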

For the all-indicators model, we segment based on all values of the selected meta-data feature. If the meta-data feature can take k values, we end up with k indicator features to segment all users into k different partitions. For the user's job function meta-data feature, we end up with 26 different indicator features. Our final model is shown in Eq. 5.

\mathrm{logit}(p) = \sum_i w_{c_i} \times f_{c_i} + \sum_{i \in 1..26} I_i \sum_j w_{f_{i,j}} \times f_{i,j} \quad (5)

Here, $I_i$ denotes the indicator feature for domain $i$ (the different values of job function), and $f_{i,j}$ represents the $j$th base feature of domain $i$.

Figure 4: Clusters of User Job Functions

For the last set of experiments (domain-clusters), we cluster the values of the meta-data feature into groups. Each group represents a cluster of domains, and we use one indicator feature per group. We use spectral clustering to group the 26 different job functions into 8 clusters. The clusters are based on user transitions between job functions in the data. Figure 4 shows a tag-cloud representation of these clusters, where each color indicates one cluster. As we can see in the figure, functions like "Sales" and "Marketing" are clustered together, and the "Research" and "Education" functions fall into one cluster. We run the segmented model with one indicator per cluster. Suppose that C is the set of cluster indicators for a meta-data feature. The final model is shown in Equation 6.

\mathrm{logit}(p) = \sum_i w_{c_i} \times f_{c_i} + \sum_{i \in C} I_i \sum_j w_{f_{i,j}} \times f_{i,j} \quad (6)

The final model we have in domain-clusters is similar to the all-indicators model, but the domain indicators represent clusters of job functions.
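The domain-clusters setting relies on spectral clustering of job functions based on user transitions. A sketch with scikit-learn follows, assuming a symmetric transition-count matrix is used as the precomputed affinity; the paper does not specify the exact affinity construction, and the random counts are placeholders.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy affinity: counts[i, j] would be the number of users who moved between
# job function i and job function j.
n_functions = 26
rng = np.random.default_rng(42)
counts = rng.integers(0, 50, size=(n_functions, n_functions))
affinity = (counts + counts.T) / 2.0        # symmetrize transition counts
np.fill_diagonal(affinity, 0)

clustering = SpectralClustering(
    n_clusters=8,                # the paper groups the 26 functions into 8 clusters
    affinity="precomputed",      # treat the matrix itself as the similarity graph
    random_state=0,
)
labels = clustering.fit_predict(affinity)
print(labels)                    # cluster id (0..7) for each of the 26 job functions
```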

5. PERFORMANCE ANALYSIS
We experiment with the two-vs-all and domain-clusters settings for the current job function of users as the meta-data feature. We divide the data into 70% train and 30% test subsets.

We measure the accuracy of the algorithms on the test set. To find the performance of the algorithm in each domain of the data, we partition the test set into domains in the same way that we partitioned the training set and calculate the accuracy in each domain of the dataset. Our model is compared to at least one of the two baseline models: the "one-for-all" and "independent" models. The "one-for-all" model contains only one model for all of the data points, ignoring the domain-specific models. The "independent" model trains a separate model for each of the domains independently; it ignores the common information among the domains and treats them as independent from each other.
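A sketch of this per-domain evaluation: partition the test set by domain membership and compute accuracy inside each partition. Names and toy data are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def per_domain_accuracy(y_true, y_pred, test_domains):
    """Accuracy computed separately inside each domain of the test set.

    y_true, y_pred : arrays of binary labels and predictions.
    test_domains   : list of sets; test_domains[i] holds the domains of row i
                     (a row may belong to several domains).
    """
    results = {}
    all_domains = set().union(*test_domains)
    for d in sorted(all_domains):
        mask = np.array([d in ds for ds in test_domains])   # rows in domain d
        results[d] = accuracy_score(y_true[mask], y_pred[mask])
    return results

# Toy usage
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1])
domains = [{"Sales"}, {"Operations"}, {"Sales", "Operations"},
           {"Other"}, {"Sales"}, {"Operations"}]
print(per_domain_accuracy(y_true, y_pred, domains))
```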

To dig deeper into the offline results, we look at the coefficient values obtained by the algorithm in each of the models.

Table 1: Accuracy of "two-vs-all" vs. "one-for-all" and "independent" models for job functions

Domain                 One-for-All   Two-vs-All (Segmented)   Independent
Sales                  96.28%        96.33%                   95.01%
Operations             96.43%        96.49%                   94.93%
Sales and Operations   96.54%        96.58%                   NA
Other                  96.44%        96.45%                   96.44%

5.1 Two vs. All Model
As explained in Section 4, in the two-vs-all setting the two most covered domains are compared to the rest of the domains. We compare our model with the two base models: the "one-for-all" and "independent" models. The most covered job functions in the data are "Sales" (about 12% coverage) and "Operations" (about 8% coverage). The accuracy results for the "job function" meta-data feature are shown in Table 1. The "Sales and Operations" row represents the domain of users in both the "Sales" and "Operations" domains, and the "Other" row shows the users who are in neither the "Sales" nor the "Operations" domain. The first two rows show the users who are only in the "Sales" or only in the "Operations" domain, respectively.

As we see in Table 1, the segmented model has slightly higher accuracy than the base models. The difference is bigger for the "independent" base model, especially in the two most covered domains. To understand how the models work differently, we look at the coefficients assigned to the base variables of the "two-vs-all" segmented model and the "one-for-all" base model in Figure 5, where we can compare the coefficient values of the two models. The "two-vs-all" segmented model can have more than one coefficient for each variable: the variable might repeat in the main-effect part of the model or in each of the domain-specific parts of the model. To be able to compare the coefficient values, we use the average coefficient value of the main-effect and domain-specific parts of the "two-vs-all" segmented model. The red dots represent this average value and the bars around them represent the variance of these coefficients in the model. The blue dots are the coefficient values in the "one-for-all" base model. As we can see in the figure, some of the coefficients have different values in the two models. For example, the similarity of user skills with the job description has more importance in the "two-vs-all" segmented model. In addition, we can see that some of the base variables existing in the "two-vs-all" segmented model do not exist in the "one-for-all" base model; these variables were removed automatically during the regularization process. For example, the similarity between the user's location and the location of the job is only present in the "two-vs-all" segmented model. Based on Figure 5, the "two-vs-all" model's coefficients have different variances across the main-effect and domain-specific models: some of the coefficients vary more than others. This can indicate that the model is able to capture the difference between different domains.

To understand whether the "two-vs-all" segmented model is capturing the difference between each of the domains, we look at the coefficients of the variables in all four job function domains (Sales, Operations, Sales and Operations, and Other) in Figure 6.

Figure 5: Coefficient Values for the Two-vs-All Segmented Model (Red) Compared to the "One-for-All" Base Model (Blue)

Table 2: Accuracy of domain-clusters segmented model vs. "one-for-all" model for job functions

Domain     One-for-All   Domain Clusters (Segmented)
Cluster 1  96.57%        96.52%
Cluster 2  96.18%        96.26%
Cluster 3  96.62%        96.75%
Cluster 4  96.98%        97.09%
Cluster 5  97.58%        97.61%
Cluster 6  96.68%        96.85%
Cluster 7  96.56%        96.61%
Cluster 8  96.34%        96.36%

The coefficient values are shown stacked over each other in the figure. Since there are many base variables in the model and their names are not easily readable, we removed the names in the figures of this section. As we can see, the coefficient values for some of the variables differ across domains. With a closer look we can find the differences in coefficient values. For example, the similarity of the user's past positions to the job description is more important for the Sales domain than for the Operations domain. The similarity of the user's skills to the job's required skills is more important for users in the Operations domain than in the Sales domain.

5.2 Domain Clusters Model
As explained in Section 4, the user job function meta-data feature values are grouped into 8 clusters. The accuracy results for the clusters of the "job function" meta-data feature are shown in Table 2. As we can see in this table, the accuracies of the models are very close to each other: for some clusters the baseline models work better than the domain-clusters segmented model, and for others it is the reverse.

We compare the coefficient values of the one-for-all baseline and the domain-clusters model to gain more detailed insight into the results. Looking at the differences in coefficient values for each of the domains in the segmented model, we can see how different features are more important for each of the domains. Figure 7 shows the coefficient values of the domain-clusters model for the job function meta-data feature. As we can see in the figure, some of the coefficients are more important in some of the domains and less in others. For example, the similarity of the user's previous searches to the job description is more important for users in cluster 2 (including sales, marketing, and similar job functions).

Figure 6: Coefficient Values for Different Domains in the Two-vs-All Segmented Model

Figure 7: Coefficient Values for the Domain-Clusters Segmented Model for Job Functions

Figure 8: Accuracy of the All-Indicators Segmented Model for Job Functions

Figure 9: Coefficient Values for the All-Indicators Segmented Model (Red) Compared to the "One-for-All" Base Model (Blue)

5.3 All Indicators Model
In this model, we pick all of the possible values of a meta-data feature as domain indicators: each indicator feature represents one of the values the meta-data feature can take. Since we choose job function as our meta-data feature, we end up with 26 domains related to job functions, such as IT, sales, engineering, real estate, and marketing. We can see the accuracy results of the all-indicators segmented model and the two baselines (one-for-all and independent) in Figure 8. As we can see, the all-indicators model is usually slightly better than the two other models. In some of the domains (such as the Product Management (number 19) and Real Estate (number 23) domains) the one-for-all model has higher accuracy than the all-indicators model.

Comparing the coefficients of the one-for-all and all-indicators models in Figure 9, we can see that some of the base features have very different coefficients in the two models. Also, some base features have a large variance across the different domains of the all-indicators model. There are some base features that are removed from the one-for-all model by regularization, while they still play a role in the all-indicators model.

6. CONCLUSION AND FUTURE WORK
In this paper, we propose a framework for content-based cross-domain recommender systems. This framework is flexible enough to be implemented with various classifiers. The model in this framework can transfer common information among different domains while keeping them distinct. We define user-based domains based on users' meta-data features and implement our framework using logistic regression. The regularization in the model allows us to pick the important features of each of the domains automatically, while keeping the model flexible enough to accept expert knowledge in choosing the features. We experiment on job recommendations for LinkedIn users. Our results indicate a slight improvement in recommendation accuracy in the offline setting. Furthermore, the experimental results are promising: i) different features have different coefficient values in each of the domains; and ii) the coefficients differ in the cross-domain model compared to the one-for-all base model. As a result, we are hopeful that this model can be a good fit for our problem in online experiments (A/B testing). We expect several directions for future work: implementing the framework based on various classifier algorithms, expanding the experiments of the model using different meta-data features, and experimenting with the interaction of various possible domains. Automatic selection of meta-data features is another interesting direction of research.

7. REFERENCES
[1] S. Berkovsky, T. Kuflik, and F. Ricci. Mediation of user models for enhanced personalization in recommender systems. User Modeling and User-Adapted Interaction, 18(3):245-286, 2008.
[2] J. H. Clark, A. Lavie, and C. Dyer. One system, many domains: Open-domain statistical machine translation via feature augmentation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, 2012.
[3] H. Daume III. Frustratingly easy domain adaptation. In ACL, volume 1785, page 1787, 2007.
[4] L. Duan, D. Xu, and I. Tsang. Learning with augmented features for heterogeneous domain adaptation. arXiv preprint arXiv:1206.4660, 2012.
[5] M. Joshi, M. Dredze, W. W. Cohen, and C. P. Rose. What's in a domain? Multi-domain learning for multi-attribute data. In Proceedings of NAACL-HLT, pages 685-690, 2013.
[6] M. Kaminskas and F. Ricci. Location-adapted music recommendation using tags. In User Modeling, Adaption and Personalization, pages 183-194. Springer, 2011.
[7] T. Keim. Extending the applicability of recommender systems: A multilayer framework for matching human resources. In 40th Annual Hawaii International Conference on System Sciences (HICSS 2007), pages 169-169, 2007.
[8] D. Lee and P. Brusilovsky. Fighting information overflow with personalized comprehensive information access: A proactive job recommender. In Third International Conference on Autonomic and Autonomous Systems (ICAS 2007), pages 21-21, 2007.
[9] D. Lee and P. Brusilovsky. Proactive: Comprehensive access to job information. Journal of Information Processing Systems, 8(4):721-738, December 2012.
[10] B. Li. Cross-domain collaborative filtering: A brief survey. In 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011), pages 1085-1086. IEEE, 2011.
[11] M. Hutterer. Enhancing a Job Recommender with Implicit User Feedback. PhD thesis, Fakultät für Informatik der Technischen Universität Wien, 2011.
[12] R. Rafter, K. Bradley, and B. Smyth. Personalized retrieval for online recruitment services. In Proceedings of the 22nd Annual Colloquium on Information Retrieval, 2000.
[13] H. Ritzema. Frequency and regression analysis (Chapter 6). Drainage Principles and Applications, 16:175-224, 1994.
[14] S. Sahebi and P. Brusilovsky. Cross-domain collaborative recommendation in a cold-start context: The impact of user profile size on the quality of recommendation. In User Modeling, Adaptation, and Personalization, pages 289-295. Springer, 2013.
