Providing Actionable Recommendations:
Design and Evaluation of a Method for Provision of
Recommendations and Effective Explanations thereof.
Dipl.-Oek. Paul Marx
A dissertation thesis submitted in fulfillment
of the requirements for the degree of
Doctor rerum politicarum
of the
Bauhaus-University of Weimar
Faculty of Media
July 2011
DRAFT - final revision to appear in 2012
Acknowledgements
A paradox of a dissertation project is that it is always accomplished by a single person
but actually represents the result of a joint effort of many individuals. I want to thank all
these people for their invaluable contribution to the success of my work – without you, I
would have been facing hard times.
First of all, a great, big thank you goes to my supervisor Thorsten Hennig-Thurau for
awakening my interest in the project and in academic work, along with his exceptional,
never-ending, and encouraging personal example of how hard work with attention to detail
yields fruits and satisfaction as well as professional advancement. Thank you to Tobias
Bauckhage, the CEO of MoviePilot, for providing invaluable data for my experiments.
Thanks are also due to all my teachers, who taught me the value of constant learning
and inspired my curiosity and respect for the unknown. In particular, I would like to thank the
teachers of the Novosibirsk Aerospace Lyceum, the professors of the Aircraft Faculty of the
Novosibirsk State Technical University, and the professors of the Khristianovich Institute of
Theoretical and Applied Mechanics. I am proud to have studied there. Special mention also
goes to Anne Priller and Denis Rechkin for teaching me languages: Anne for English and
Denis for C#.
And of course, my eternal gratitude belongs to my family. Thank you to my parents for
making me who I am, for your absolute love and continuing support throughout my life and
especially during the writing of this thesis. Thank you to my wife Elena for your patience and
understanding. Thank you to my kids Vera and Michael for reminding me that there are other
important things going on out there in the world. Thank you for all your support. I have no
doubt that this thesis would not have been possible without you.
Langenhagen, July 2011 Paul Marx
Table of Contents
Glossary .................................................................................................................................. viii
List of Tables ............................................................................................................................. x
List of Figures .......................................................................................................................... xi
List of Equations ..................................................................................................................... xii
1 Introduction and Motivation ........................................................................................... 1
1.1 Motivation ................................................................................................................... 1
1.2 Objectives .................................................................................................................... 6
1.3 Outline of the Thesis .................................................................................................... 7
2 Background and Related Work ....................................................................................... 8
2.1 Explanations in Recommender Systems ...................................................................... 8
2.1.1 Relevance and Advantages of Explanation Facilities .......................................... 9
2.1.2 Explanation Styles .............................................................................................. 14
2.1.3 Explanations within Recommendation Process ................................................. 18
2.1.4 Summary ............................................................................................................ 21
2.2 Movie Related Preferences and Relevant Movie Characteristics .............................. 23
2.2.1 Operationalizing Preferences: Multiattribute Utility Model
and Weighted Additive Decision Rule ............................................................... 23
2.2.2 Preference Relevant Attributes of Motion Pictures ............................................ 26
2.2.3 Summary ............................................................................................................ 34
2.3 Key Recommendation Techniques ............................................................................ 35
2.3.1 Collaborative Filtering ....................................................................................... 36
2.3.1.1 User-based Approach .................................................................................. 36
2.3.1.2 Item-based Approach .................................................................................. 42
2.3.1.3 Matrix Factorization and Latent Factor Models ......................................... 44
2.3.2 Content-based Filtering ...................................................................................... 49
2.3.2.1 The Principles of Content-based approaches .............................................. 50
2.3.2.2 Exploiting Content Characteristics in Non-textual Item Domains ............. 54
2.3.3 Trade-offs and Problems of Collaborative and Content-based Approaches ...... 60
2.3.3.1 Data Sparsity ............................................................................... 61
2.3.3.2 “Ramp-up”: New User and New Item Problems ........................................ 62
2.3.3.3 Overspecialization ....................................................................................... 63
2.3.3.4 “Gray Sheep”, “Starvation” and Shilling Attacks. ...................................... 64
2.3.3.5 Stability vs. Plasticity .................................................................................. 65
2.3.4 Hybrid Recommender Systems .......................................................................... 66
2.3.4.1 Principles of Hybrid Methods ..................................................................... 66
2.3.4.2 Explanations in Hybrid Approaches ........................................................... 68
2.4 Summary .................................................................................................................... 70
3 Conceptual Framework of a Hybrid Recommender System
that allows for Effective Explanations of Recommendations ...................................... 72
3.1 Modeling User Preferences ....................................................................................... 73
3.1.1 Motivation of the Approach ............................................................................... 73
3.1.2 Basic Model of User Preferences ....................................................................... 74
3.1.3 Accounting for Static Effects beyond the User-Item Interaction ....................... 76
3.1.4 Accounting for Time .......................................................................................... 78
3.2 Estimating Model Parameters .................................................................................... 82
3.2.1 Step 1: Estimation of Initial Parameter Values .................................................. 84
3.2.1.1 Omitted Variable Bias in OLS Models and a Method to Counteract the
Bias .............................................................................................. 85
3.2.1.2 Estimating User and Item Related Effects .................................................. 89
3.2.1.3 Estimating Attribute Part-worths ................................................................ 91
3.2.2 Step 2: Optimization of the Parameters .............................................................. 95
3.3 Hybridization with Collaborative Approaches ........................................................ 100
3.3.1 Motivation for Hybridization ........................................................................... 100
3.3.2 Methods to Hybridize and the Method of Hybridization ................................. 102
4 Empirical Study ............................................................................................................ 105
4.1 Datasets and their Properties ................................................................................... 107
4.2 Measures of Prediction Accuracy ............................................................................ 112
4.3 Employed Algorithms and Benchmarks .................................................................. 114
4.4 Results ..................................................................................................................... 116
4.4.1 Comparison of Prediction Accuracy ................................................................ 116
4.4.2 Provided Explanation Style .............................................................................. 124
4.5 Summary .................................................................................................................. 126
5 Conclusions and Future Work ..................................................................................... 128
5.1 Research Summary, Findings and Contributions .................................................... 128
5.2 Discussion and Implications .................................................................................... 135
5.3 Future Research ....................................................................................................... 136
Bibliography ......................................................................................................................... 139
Appendix A: Sources of Error in Recommender Systems ............................................... 160
Appendix B: List of Preference Relevant Attributes ........................................................ 165
Appendix C: Technical Details of Prediction Accuracy Tests ......................................... 168
Glossary
ACM Association for Computing Machinery
CB Content-Based filtering
CF Collaborative Filtering
CSCW Computer Supported Cooperative Work
DFG Deutsche ForschungsGemeinschaft (German Research Foundation)
DVD Digital Versatile Disk
EBA Elimination By Aspects
esp. Especially
GB GigaByte
GHz GigaHertz
GPS Global Positioning System
IDF Inverse Document Frequency
IMDb Internet Movie Database
kNN k Nearest Neighbor
MAE Mean Absolute Error
MAU Multiattribute Utility
MDS MultiDimensional Scaling
MF Matrix Factorization
NMAE Normalized Mean Absolute Error
NRMSE Normalized Root Mean Squared Error
OLS Ordinary Least Squares
RAM Random-Access Memory
RecSys Recommender Systems
RMSE Root Mean Squared Error
RS Recommender System; Recommender Systems
SD Standard Deviation
SE Standard Error
SVD Singular Value Decomposition
TF Term Frequency
TF-IDF Term Frequency - Inverse Document Frequency
WADD Weighted ADDitive linear model
w.r.t. with respect to
List of Tables
Table 2.1: Reasons and benefits for provision of explanations ................................................ 13
Table 2.2: Summary of motion picture success factors ........................................................... 28
Table 2.3: Summary of preference relevant movie attributes .................................................. 33
Table 2.4: Ratings database for collaborative filtering ............................................................ 37
Table 2.5: Principle of content-based filtering ......................................................................... 51
Table 2.6: Summary of strengths and weaknesses
of different recommendation approaches .............................................................. 60
Table 4.1: Descriptive statistics of the raw rating datasets .................................................... 109
Table 4.2: Descriptive statistics of the datasets employed in the study ................................. 110
Table 4.3: Comparison of the prediction accuracy of different algorithms
for MoviePilot dataset ......................................................................................... 117
Table 4.4: Comparison of the prediction accuracy of different algorithms
for Netflix dataset ................................................................................................ 118
Table 4.5: Distribution parameters of the absolute prediction error
of the optimization step ....................................................................................... 121
Table 4.6: Accuracy improvement of the hybrid method ...................................................... 123
Table 4.7: Provided explanation style .................................................................................... 124
Table C.1: Overview of the employed source code snippets from Press et al. 2007 ............. 169
List of Figures
Figure 2.1: Comparing three user rating profiles ..................................................................... 39
Figure 2.2: Comparing three movie rating profiles .................................................................. 43
Figure 2.3: A simplified illustration of the latent factor approach ........................................... 45
Figure 2.4: Illustration of the extraction of a features vector from a document ...................... 50
Figure 3.1: Decomposition of a time changing measure in three components:
baseline, long-term trend, and short-term fluctuations .......................................... 79
Figure 3.2: Successive minimization with gradient methods ................................................... 96
Figure 3.3: Flowchart of the optimization step ........................................................................ 98
Figure 4.1: Rating scales in user interfaces of recommender systems ................................... 108
List of Equations
(2.1) .......................................................................................................................................... 23
(2.2) .......................................................................................................................................... 24
(2.3) .......................................................................................................................................... 35
(2.4) .......................................................................................................................................... 38
(2.5) .......................................................................................................................................... 38
(2.6) .......................................................................................................................................... 40
(2.7) .......................................................................................................................................... 40
(2.8) .......................................................................................................................................... 40
(2.9) .......................................................................................................................................... 40
(2.10) ........................................................................................................................................ 43
(2.11) ........................................................................................................................................ 44
(2.12) ........................................................................................................................................ 46
(2.13) ........................................................................................................................................ 47
(2.14) ........................................................................................................................................ 47
(2.15) ........................................................................................................................................ 48
(2.16) ........................................................................................................................................ 53
(2.17) ........................................................................................................................................ 53
(2.18) ........................................................................................................................................ 53
(2.19) ........................................................................................................................................ 56
(2.20) ........................................................................................................................................ 57
(3.1) .......................................................................................................................................... 75
(3.2) .......................................................................................................................................... 75
(3.3) .......................................................................................................................................... 76
(3.4) .......................................................................................................................................... 77
(3.5) .......................................................................................................................................... 79
(3.6) .......................................................................................................................................... 81
(3.7) .......................................................................................................................................... 81
(3.8) .......................................................................................................................................... 83
(3.9) .......................................................................................................................................... 85
(3.10) ........................................................................................................................................ 85
(3.11) ........................................................................................................................................ 87
(3.12) ........................................................................................................................................ 87
(3.13) ........................................................................................................................................ 87
(3.14) ........................................................................................................................................ 87
(3.15) ........................................................................................................................................ 88
(3.16) ........................................................................................................................................ 88
(3.17) ........................................................................................................................................ 88
(3.18) ........................................................................................................................................ 88
(3.19) ........................................................................................................................................ 90
(3.20) ........................................................................................................................................ 93
(3.21) ........................................................................................................................................ 94
(3.22) ........................................................................................................................................ 94
(3.23) ........................................................................................................................................ 99
(4.1) ........................................................................................................................................ 112
(4.2) ........................................................................................................................................ 112
(4.3) ........................................................................................................................................ 113
(4.4) ........................................................................................................................................ 113
Chapter 1
Introduction and Motivation
This chapter describes the motivation behind this thesis. The objectives of the thesis
and the subjects covered in this document are briefly explained. The chapter ends by
describing the structure and contents of the thesis.
1.1 Motivation
Recommendations are a part of everyday life. It is natural for people to seek recommen-
dations whenever they are going to make a decision about a particular item or action. We rely
on recommendations coming from different sources such as other people, bestseller lists, trav-
el guides, test reports, technical reviews, restaurant and movie critics and so forth. Personal-
ized recommender systems (RS) are intended to support and augment this natural social pro-
cess by helping their users find the most interesting and valuable items for them in a quick
and efficient way.
On the internet, where service providers are not bound by shelf space and thus
can carry far more inventory than traditional retailers1, the choice task becomes overwhelming
1 For instance, the internet music shop Rhapsody offers 19 times as many songs as Wal-Mart’s stock of 39,000
tunes. Amazon’s offering includes 2.3 million books, while specialized book retailers can carry up to a maximum of
to the customers, making it nearly impossible to arrive at optimal selection decisions – some-
thing referred to as the information overload problem (Jacoby, Speller, and Berning 1974;
Anderson 2004). In such situations people strive to minimize their search effort, i.e. they are
eager not to be overloaded by a vast amount of irrelevant offerings they are not interested in
and want only those items to be presented which are at least potentially valuable for them
(Herlocker et al. 2004). By recommending relevant items (e.g. product offerings such as
books, CDs, movies, etc.) to their users, RS not only largely mitigate the information overload
problem on the users’ side but also support sales at online stores: RS allow e-commerce
providers to increase their up-selling and cross-selling potential (Schafer, Konstan, and Riedl
2001; Bodapati 2008) and help them to better manage customer relationships, which leads
to higher loyalty and greater competitive barriers (Wei, Shaw, and Easley 2002; Ricci,
Rokach, and Shapira 2011). In other words, RS allow both parties to a business transaction
to benefit considerably from it by solving their tasks more efficiently.
Accordingly, recommender systems have already found their way into many commer-
cial applications and established themselves as an important component of online stores
(Shafer et al. 1999; Ansari et al. 2000). Indeed, most internet users have come across a rec-
ommender system in one way or another. A prominent example of a commercial RS is Ama-
zon‟s2 service of offering personalized book recommendations, which is also widely known as
“Customers Who Bought This Item Also Bought”. An online DVD rental and video streaming
service, Netflix3, recommends its subscribers a movie to watch next in the form “If You Liked
This Movie You Will Also Like”. Last.fm4 and Pandora
5 offer their users to create their own
“personalized radio stations” online, which then play songs in accordance with the user‟s
taste. Mendeley6, a researcher community web site, recommends scientific articles to read. A
pure movie recommendation service, Moviepilot7, offers a series of recommendation systems:
one of them produces forecasts of how good a user will find a particular movie, while the se-
cond one suggest a cinema nearby showing the movie, which s/he will like most of all movies
currently running in the cinemas. The third one ranks in the real-time the TV program con-
130,000 book items. The online DVD rental Netflix offers 25,000 DVDs, whereas an average inventory of a conventional store consists of only 3,000 DVDs (Anderson 2004).
2 http://www.amazon.com
3 http://www.netflix.com
4 http://www.last.fm
5 http://www.pandora.com
6 http://www.mendeley.com
7 http://www.moviepilot.com
sistent with the user’s preferences and then recommends a channel to watch. Besides conven-
tional goods and food, other widespread examples of domains where RS are employed
include recommending restaurants, jokes, news, physicians, lawyers, sightseeing places,
vacation resorts, libraries, web sites, acquaintances, sport centers, and even lifestyles. Finally,
the fact that Netflix has recently awarded a one-million-dollar prize to the team that first
succeeded in substantially improving the performance of its own recommender algorithm
(Koren, Bell, and Volinsky 2009) convincingly indicates the importance of RS for the industry.
At the same time, research interest in recommender systems has dramatically increased.
According to the EBSCO Business Source Premier database, over 300 scientific
papers were published explicitly on this topic in the last fifteen years. Conferences and
workshops on RS have become premier annual events8. Sessions dedicated to RS are
frequently included in the more traditional conferences in the area of information systems9.
Furthermore, several noted academic journals have presented special issues covering
research and developments in the area of RS10. The topic of recommender systems is also
frequently tackled in academic publications in the fields of psychology, e-commerce, and
marketing11.
Providing personalized recommendations, however, requires that the RS know some-
thing about its users. Every RS must obtain and maintain a user profile, i.e. data that allows
conclusions to be drawn about what is relevant for the users. Such data may come, for example,
from the users’ purchase history. In this case each purchase act or purchased item can be seen
as an expression of the user’s preference in the item’s domain, thus providing the RS with
information about what the user likes or in which part of the item’s domain his/her tastes or
interests manifest themselves. Another source of information descriptive of users’ preferences
is the users’ explicit ratings of items. Ratings are potentially more informative to RS as they also allow users to
8 We refer specifically to ACM Recommender Systems (RecSys), founded in 2007 and now taking place annually.
9 Among the conferences that included sessions dedicated to RS, the most prominent ones are: ACM Special Interest Group on Information Retrieval (SIGIR), User Modeling, Adaption and Personalization (UMAP), and ACM Special Interest Group on Management of Data (SIGMOD) (Ricci et al. 2011, p. 3).
10 Among the journals that presented special issues on RS are: AI Communications (2008), IEEE Intelligent Systems (2007), International Journal of Electronic Commerce (2006), International Journal of Computer Science and Applications (2006), ACM Transactions on Computer-Human Interaction (2005), and ACM Transactions on Information Systems (2004) (Ricci et al. 2011, p. 3).
11 For example: Hennig-Thurau, Marchand, and Marx (2011), Hennig-Thurau et al. (2010), Bodapati (2008), Aksoy et al. (2006), Yuanping, Feinberg, and Wedel (2006), Fritzmons and Lehmann (2004), Rutkovsky, Senecal, and Nantel (2004), Fairchild and Rijsman (2004), Gershoff, Mukherjee, and Mukhopadhyay (2003), Cooke et al. (2002), Mild and Natter (2002), Ansari, Essagier, and Kohli (2000).
indicate the amount and the direction of the preference they associate with an item, i.e. the
degree to which the item is liked – or even disliked.
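The two profile types just described can be made concrete with a minimal sketch. The data structures and identifiers below are hypothetical illustrations, not taken from the thesis: an implicit profile records only which items were purchased, while an explicit profile additionally carries the direction and magnitude of preference.

```python
# Hypothetical illustration of the two user-profile types described above.

# Implicit profile: each purchase acts as a unary expression of preference --
# it signals interest but carries no direction or magnitude.
purchase_profile = {"user_42": {"item_a", "item_b", "item_c"}}

# Explicit profile: ratings carry both direction and magnitude of preference,
# e.g. on a 1-5 scale where 1 = strongly disliked and 5 = strongly liked.
rating_profile = {"user_42": {"item_a": 5.0, "item_b": 2.0}}

def mean_rating(profile: dict, user: str) -> float:
    """Average rating of a user -- a simple summary statistic of a profile."""
    ratings = profile[user].values()
    return sum(ratings) / len(ratings)

print(mean_rating(rating_profile, "user_42"))  # 3.5
```

Note how only the explicit profile lets the system distinguish liked from disliked items; the purchase set alone cannot express dislike.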
Once the user profiles are acquired, the RS can begin to produce recommendations. This is
usually done by means of numerical algorithms that exploit the data from the user profiles
and the item catalogue. In accordance with the modern literature on recommender systems,
three state-of-the-art recommendation approaches can be distinguished: content-based, col-
laborative, and hybrid approaches (Balabanovic and Shoham 1997; Adomavicius and Tuzhil-
in 2005). In each given case the choice of a recommendation approach depends heavily on the
type of the user profile data and the characteristics of the item domain it is applied to. These
approaches will be discussed in depth in Chapter 2. At this point, it is worth mention-
ing that personalized recommendations arise from a process which relies greatly on the quali-
ty of the input data, i.e. the user and item profiles, and on the characteristics of the underlying
algorithms.
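As a foretaste of the collaborative approach, the following sketch shows a textbook user-based collaborative filtering predictor: Pearson similarity computed over co-rated items, combined into a mean-centered, similarity-weighted prediction. The toy data and all names are illustrative assumptions; the exact formula variants used in this thesis are developed in Chapter 2.

```python
import math

# Toy rating database: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob":   {"m1": 4.0, "m2": 2.0, "m3": 5.0},
    "carol": {"m1": 1.0, "m2": 5.0},
}

def pearson(u: str, v: str) -> float:
    """Pearson correlation computed over the items both users rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def predict(user: str, item: str) -> float:
    """Predict a rating as the user's mean rating plus a similarity-weighted
    combination of the other users' mean-centered ratings for the item."""
    base = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = pearson(user, other)
        other_mean = sum(profile.values()) / len(profile)
        num += sim * (profile[item] - other_mean)
        den += abs(sim)
    return base + num / den if den else base
```

Mean-centering matters here: it corrects for the fact that some users systematically rate higher than others, so only the deviation from each user's own baseline is transferred.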
The numerical algorithms, in turn, are subject to errors that may result from a num-
ber of factors, such as incompleteness of data, data input and profile extraction errors, algo-
rithmic processing errors, and misspecification of the user decision strategy model (Herlocker,
Konstan, and Riedl 2000; Aksoy et al. 2006). By presenting users with erroneously predicted
recommendations, RS risk compromising their credibility and the users’ trust,
which may result in deterring and losing customers (Sinha and Swearingen 2002; Gershoff,
Mukherjee, and Mukhopadhyay 2003; O’Donovan and Smith 2005; Cramer et al. 2008). This
issue raises two questions:
(i) How can the recommendation algorithms be improved in order to reduce the er-
ror rate and magnitude?
(ii) How can the negative effect of inaccurate recommendations on acceptance and
trust be mitigated?
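Question (i) presupposes a way to quantify error magnitude. The two standard accuracy measures listed in the glossary, MAE and RMSE, can be sketched as below; this is a generic textbook illustration with made-up example numbers, not the thesis's own evaluation setup, which is formalized in Section 4.2.

```python
import math

def mae(predicted: list, actual: list) -> float:
    """Mean Absolute Error: the average magnitude of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted: list, actual: list) -> float:
    """Root Mean Squared Error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))

predicted = [3.5, 4.0, 2.0, 5.0]  # ratings forecast by the RS
actual    = [4.0, 4.0, 1.0, 3.0]  # ratings the users actually gave
print(mae(predicted, actual))   # 0.875
print(rmse(predicted, actual))  # about 1.146
```

Because RMSE squares each error before averaging, a single badly mispredicted item (here the 5.0 vs. 3.0 pair) dominates it far more than it dominates MAE.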
While the first question is directly related to the numerical algorithms, the second one is
typically addressed in the modern RS literature through the issue of explanations. That is, provid-
ing personalized explanations can reduce the negative effects of inaccurate recommendations,
thus improving the credibility of and trust in the RS (Herlocker, Konstan, and Riedl 2000). Alt-
hough both questions have been studied in the extant literature, there is still room for im-
provement in these research directions. An important shortcoming of the current research
could be seen in the fact that the two research streams have evolved mostly separately from
each other. We argue that an integrative approach to these independent research streams may
be beneficial for the reasons explained below.
Stimulated by Netflix's One Million Dollar Prize competition, research was primarily
concentrated on the accuracy of recommender algorithms. Having provided a movie rating
dataset of more than 100 million date-stamped ratings given by about half a million anonymous
Netflix customers on 17,770 movies (Bennett and Lanning 2007), Netflix indirectly
influenced the research by focusing it on this available data. The concentration solely on the
rating data was additionally aggravated by the limited ability of contemporary information
processing algorithms to automatically extract meaningful attributes descriptive of multimedia
content, i.e. movies (Wei, Shaw, and Easley 2002; Pazzani and Billsus 1997; Lops, de Gemmis,
and Semeraro 2011). Consequently, movie characteristics such as stars, budgets, country
of origin, etc. were not handled adequately by recommender research. The fact that movie
research provides evidence that these characteristics significantly influence movie success
as a result of consumers' preferences (Hennig-Thurau, Houston, and Walsh 2006) was largely
ignored in the RS literature.12 We argue that incorporating such characteristics in the recommendation
process can be fruitful for at least the following reasons:
Firstly, capturing the attribute-related movie preferences offers potentially more information
than the rating data alone. This allows the recommender to address user preferences in a more
flexible way and at a finer level of resolution while generating recommendations, thus leading
to potentially more precise predictions of user ratings, i.e. overall preferences, towards particular
items. Secondly, with the attribute-related preference information at hand, it is possible
to align the recommendation process with the users' preference structures and so to reflect the
users' intrinsic attribute weights and decision strategies within the recommendation generation
procedure. According to Aksoy et al. (2006), this leads to higher choice efficiency on
the user's side. Thirdly, knowing the attribute-related weights that lead to a particular recommendation
allows an RS to provide users with the reasons underlying a recommendation, i.e. personalized
explanations. This increases recommender transparency and credibility (Sinha and
Swearingen 2002; Cramer et al. 2008; Herlocker, Konstan, and Riedl 2000) and offers
further benefits for the users, thus reducing the negative effects of inaccurate recommendations.13
Furthermore, addressing preferences at the attribute level increases the degree of
detail at which explanations can be provided. Moreover, because preference-relevant attributes
can be taken into account, such explanations can emphasize those aspects
of the items which users themselves consider important while evaluating the items. Consequently,
such explanations can be better understood by the users, making them potentially more
valuable and, importantly, actionable. Intuitively, the reliability of attribute-based
explanations depends on the ability of the underlying recommendation algorithm to
handle preferences on the attribute level. We therefore conclude that the questions of improving
the accuracy of recommendation algorithms and of handling inaccurate recommendations
are not mutually independent, but rather complementary. Hence, these questions should
be addressed simultaneously within an integrative approach.
12 When preferences towards movie attributes were used in extant work (e.g. Ying, Feinberg, and Wedel 2006), the choice of the attributes was either based on information availability rather than a thorough study of relevant attributes, or the attributes were used for post-processing of recommendation generation (e.g. Symeonidis, Nanopoulos, and Manolopoulos 2009).
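To make the argument concrete, the following sketch illustrates how attribute-level weights can drive both a rating prediction and a personalized explanation. It is a minimal toy model under strong assumptions: the attribute names, the weights, and the additive linear utility are invented for illustration and do not represent the method developed in this thesis.

```python
# Sketch: attribute-based rating prediction plus explanation.
# Illustrative only: attribute names, weights, and the additive
# model are assumptions, not the method proposed in this thesis.

def predict_rating(attr_weights, item_attrs, base=3.0):
    """Predict a 1-5 star rating as a base value plus the summed
    part-worths of the attributes the item possesses."""
    score = base + sum(attr_weights.get(a, 0.0) for a in item_attrs)
    return max(1.0, min(5.0, score))  # clamp to the rating scale

def explain(attr_weights, item_attrs, top_n=2):
    """Return the item attributes with the largest positive weights,
    i.e. the reasons underlying the prediction."""
    relevant = [(a, attr_weights[a]) for a in item_attrs
                if attr_weights.get(a, 0.0) > 0]
    relevant.sort(key=lambda aw: aw[1], reverse=True)
    return [a for a, _ in relevant[:top_n]]

# Hypothetical preference profile learned from a user's past ratings.
weights = {"drama": 0.8, "directed_by_eastwood": 0.6, "horror": -1.2}
movie = ["drama", "directed_by_eastwood"]

print(predict_rating(weights, movie))   # approximately 4.4
print(explain(weights, movie))          # the reasons behind the score
```

Because the same weights produce both the prediction and the explanation, the explanation is guaranteed to reflect the actual reasoning of the algorithm, which is the integrative property argued for above.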
The considerations exposed above motivate the current thesis and form the basis for
the objectives formulated in the next section.
1.2 Objectives
Following the considerations and reasons provided in the previous section, the current
thesis aims to develop a recommendation method that is capable of providing both
accurately predicted recommendations and actionable explanations of the reasoning behind
them, as well as of aligning the recommendation process with the user preferences.
Contrary to the typical RS research approach of building an explanation facility around
pre-calculated recommendations, we aim to incorporate the ability to provide explanations
directly within the basic framework of the recommendation algorithm.
The stated objectives should be accomplished by means of incorporating attribute-based
preferences into the recommendation process. Through an integrated consideration
of the algorithmic and explanatory issues of RS, we aim to combine the advantages of pure
algorithmic accuracy with the benefits offered by explanation facilities while mitigating the
disadvantages of the respective approaches.
13 These positive effects of explanations will be discussed in Chapter 2.1.
Finally, despite the limited ability of algorithms to automatically process multimedia
content, the recommendation method is developed with the domain of motion pictures
in focus. This represents an additional challenge for our research while
enhancing its contribution to the RS literature.
1.3 Outline of the Thesis
This document is structured in five chapters, with a bibliography section and appendices
at the end.
Chapter 2 presents the research related to the objectives of this thesis.
In particular, it encompasses the research on multiattribute utility, movie preferences,
explanations of recommendations, and an overview of contemporary recommendation
algorithms. This chapter thus provides the information indispensable for designing
our proposals.
Chapter 3 describes our proposed conceptual framework for a recommendation algorithm
which incorporates attribute-based preferences of the users, allows aligning
the recommendation process with the users' preference structures, and provides the
information needed for the generation of detailed and actionable explanations. This represents
the core of the current thesis.
In Chapter 4, the proposed algorithm is empirically tested using real-world data from the
commercial recommendation systems MoviePilot and Netflix. The accuracy of the proposed
method is compared against that of state-of-the-art recommendation algorithms.
Additionally, the level of explanation detail is compared across the different algorithms.
Finally, Chapter 5 concludes the thesis, restating its main contributions and listing avenues
for further work.
Chapter 2
Background and Related Work
This chapter sums up the theoretical background that underlies the proposals of the current
thesis and provides an overview of the work related to our objectives.
Specifically, the first section addresses the questions of why explanations of the reasoning
behind recommendations should be provided and how exactly this should be done.
The second section projects these findings into the domain of motion pictures and elaborates
on the operationalization of movie characteristics for their subsequent use in the process
of recommendation generation. The third section provides an overview of the key recommendation
approaches and presents detailed descriptions of the corresponding recommendation
algorithms, knowledge essential for the development of a new recommendation
method. The fourth section recapitulates the main points of the theoretical discussion and
concludes the chapter.
2.1 Explanations in Recommender Systems
A front-page Wall Street Journal article from 2002, titled "If TiVo Thinks You
Are Gay, Here's How to Set It Straight", describes users' frustration with irrelevant
choices made by their digital video recorder "TiVo", which records programs it assumes its
owner will like, based on shows s/he has chosen to record in the past. For instance, Mr.
Iwanyk suspected that his TiVo thought he was gay, since it inexplicably kept recording
programs with gay themes. Another case described in the article concerns the founder of
Amazon.com, Jeff Bezos. "For a live demonstration before an audience of 500 people, Mr. Bezos
once logged onto [amazon.com] to show how it caters to his interests. The top recommendation
it gave him? The DVD for 'Slave Girls From Beyond Infinity'. That popped up because
he had previously ordered 'Barbarella', starring Jane Fonda, a spokesman explains" (Zaslow
2006). While Mr. Bezos could save the situation by providing a reasonable justification for a
risqué recommendation, Mr. Iwanyk, in the absence of explanations, had to figure out how to set
things straight on his own. These examples already convincingly foreshadow the need to integrate
explanation facilities into recommender systems.
More detailed evidence for providing explanations, as well as the foundations of the
criteria for how explanations should be formed, will be elaborated, with respect to our aims,
in the subsequent sections of this chapter.
2.1.1 Relevance and Advantages of Explanation Facilities
The idea of providing explanations to the users of intelligent systems is not new. Explanations
have repeatedly been a subject of the research dedicated to expert systems (e.g. Buchanan
and Shortliffe 1984; Horvitz, Breese, and Henrion 1988; Andersen, Olsen, and Jensen
1990; Johnson and Johnson 1993; Miller and Larson 1992; Sørmo, Cassens, and Aamodt
2005). For example, the most frequently cited expert system MYCIN,14 designed by
Shortliffe and Buchanan (1975) to assist physicians in prescribing antibiotics, incorporated
an explanation facility as an important component. Having a knowledge base of about 600
rules, it would ask the physician a series of simple yes/no questions in order to identify the
bacteria causing a patient's infection. At the end of the query process, the expert system provided
a list of possible bacteria ranked from high to low based on the probability of each diagnosis
and recommended a course of drug treatment. MYCIN also provided the reasoning
behind its recommendations, i.e. a list of the questions and rules which led to a particular diagnosis
and its rank order.
14 The name MYCIN is not an acronym but was rather derived from the suffix "-mycin" typical of the antibiotics the expert system was intended to prescribe.
Despite MYCIN's success as an expert system, its developers claimed that its power
was related less to the details of the underlying numeric model than to its knowledge
representation and reasoning scheme, i.e. explanations that allowed physicians to verify why
a conclusion was arrived at and how much was known about a certain concept. They concluded
that expert systems that act as decision guides need to provide explanations for their advice
(Buchanan and Shortliffe 1984).
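The explanatory principle of such rule-based systems can be sketched as follows. This is a toy forward-chaining example whose rules and findings are invented for illustration; it in no way resembles MYCIN's actual knowledge base or certainty-factor machinery.

```python
# Toy rule-based inference with a "why" trace, loosely in the spirit of
# rule-based expert systems such as MYCIN. Rules and findings are
# invented for illustration only.

RULES = [
    # (rule id, required findings, conclusion)
    ("R1", {"fever", "stiff_neck"}, "suspect_meningitis"),
    ("R2", {"suspect_meningitis", "gram_negative"}, "suspect_e_coli"),
]

def infer(findings):
    """Forward-chain over the rules; return the derived conclusions plus
    the list of fired rules, which serves as the explanation trace."""
    known = set(findings)
    trace = []
    changed = True
    while changed:
        changed = False
        for rid, required, conclusion in RULES:
            if required <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((rid, sorted(required), conclusion))
                changed = True
    return known - set(findings), trace

conclusions, why = infer({"fever", "stiff_neck", "gram_negative"})
for rid, because, concl in why:
    # Each line answers the physician's "why?" for one conclusion.
    print(f"{concl} because rule {rid} fired on {because}")
```

The trace is the crucial part: the system can answer "why?" by replaying which rules fired on which findings, exactly the kind of scrutability MYCIN's developers credited for its acceptance.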
Since MYCIN, the need to provide explanations of the reasoning behind the recommendations
produced by expert systems has been widely recognized. It has been pointed out that
explanation facilities are required for expert systems to be considered useful and acceptable
because they remove the black box from around the recommendation process, thus raising
confidence in recommendations by providing users with transparency, i.e. an understanding
of the model used and the ability to reassess recommended actions (Moore and
Swartout 1988; Horvitz, Breese, and Henrion 1988; Majchrzak and Gasser 1991; Miller and
Larson 1992; Johnson and Johnson 1993; Brézillon and Pomerol 1996; Doyle, Tsymbal, and
Cunningham 2003; Lacave and Díez 2004).
Because RS and expert systems have common roots and strive for similar goals – the provision
of recommendations that help users make their choices more efficiently – RS can be considered
successors of expert systems. Hence, the arguments supporting the provision of explanations
for recommendations in the domain of expert systems remain valid in the
domain of RS as well (Herlocker, Konstan, and Riedl 2000; see also Tintarev and Masthoff 2008;
Cramer et al. 2008).
As in expert systems, explanations play a crucial role in RS. They bring
transparency into the recommendation process and provide users with an instrument for
handling the errors that come along with recommendations (Herlocker, Konstan, and Riedl 2000).
The importance of such an instrument cannot be overestimated:
Firstly, it is natural for humans to ask for reasoning while handling recommendations.
"Consider how we […] handle suggestions as they are given to us by other humans. We recognize
that other humans are imperfect recommenders. In the process of deciding to accept a
recommendation from a friend, we might consider the quality of previous recommendations
by the friend or we may compare how that friend's general interests compare to ours in the
domain of the suggestion. However, if there is any doubt, we will ask "why?" and let the
friend explain their reasoning behind a suggestion. Then we can analyze the logic of the suggestion
and determine for ourselves if the evidence is strong enough." (Herlocker, Konstan,
and Riedl 2000, p. 242).
Secondly, the recommendations generated by RS are inherently prone to errors. Automated
recommender systems are in essence stochastic processes that infer their recommendations
from heuristic approximations of human processes by means of numeric algorithms.
Their computations are done on extremely sparse and incomplete data. These two
conditions result in recommendations that are often correct and reliable, but also occasionally
very wrong, i.e. the suggestions generated by RS are subject to errors. The errors can be
caused, for example, by misspecification of the employed user model or by inadequate data
(see Appendix A for further details). The chance of receiving an erroneous recommendation
impairs the users' acceptance of and trust in RS. Explanations of the reasoning behind recommendations
provide users with indications of when to trust a recommendation and when to doubt
one. By helping users detect or estimate the likelihood of errors in recommendations, explanations
mitigate and may even recover the loss of acceptance and trust caused by erroneous
recommendations (Herlocker, Konstan, and Riedl 2000).
In contrast to expert systems, the effects of transparency on acceptance and
trust have not yet been extensively explored in the area of RS. To our knowledge, there exists only
one study that examines these effects (Cramer et al. 2008). Unfortunately, this study is limited
to the domain of artworks and operates with a rather small sample of 60 persons divided into
three between-subject experimental conditions, so that the findings can hardly be considered
generalizable. Nevertheless, the study by Cramer et al. provides initial support for the above-stated
suitability of transferring the arguments that justify the provision of explanations for
recommendations from the area of expert systems to the area of RS. Their findings
confirmed that explaining to the user why a recommendation was made (i.e. transparency)
significantly increases the acceptance of recommendations. In this study, trust in the RS itself was
not directly influenced by transparency. However, the results showed that RS that provided
explanations of the reasoning behind recommendations were perceived as more understandable
by the users. Perceived understanding in turn correlated with perceived competence,
trust, and acceptance of the system. This indicates that the effects of transparency on trust in
and perceived competence of the RS might either not have surfaced in this study due to
the small sample size, or that these effects are mediated or moderated by
the perceived understanding of the explanations (which unfortunately was not tested by the
authors). Both of these possible interpretations point to the importance of the transparency
implied by explanations for RS, and specifically for users' trust in RS.
Many other arguments that support the provision of explanations and substantiate
their benefits for users and RS providers can be found in the RS literature. However, most
publications cover just a few arguments at a time, without claiming to
provide a systematic overview of the reasons and benefits. To our knowledge, only two
groups of authors have attempted to develop a systematic classification of the reasons for
providing explanations in recent work. However, they approach the derivation of their
classifications from different perspectives, so that the developed taxonomies are neither
mutually exclusive nor complete: while Herlocker, Konstan, and Riedl (2000), in their classification,
consider the benefits of explanations from the user's point of view and constrain their
considerations to the case of automated collaborative filtering systems, Tintarev and Masthoff
develop their taxonomy from the provider's perspective with an emphasis on the aims of
providing explanations for different kinds of RS (Tintarev 2007; Tintarev and Masthoff
2007, 2011). Furthermore, in Herlocker et al.'s classification all user benefits follow from
transparency, whereas in Tintarev and Masthoff's variant transparency is just one of several coequal
aims.
Table 2.1 summarizes the reasons and benefits for the provision of explanations according
to the classifications of Herlocker, Konstan, and Riedl and of Tintarev and Masthoff, supplemented
with the arguments of Chen (2009) that do not fall into either of the aforementioned
classifications.15 With this classification we still do not claim completeness, but
rather aim to expand our understanding of the topic and to emphasize the need for explanations in
RS, and thus the need for recommendation algorithms that allow the generation of comprehensive
explanations.
15 Other authors have also elaborated on the reasons for providing explanations. However, as mentioned above, their arguments are rather fragmented and are either complementary to the points suggested in Table 2.1 (e.g. Sinha and Swearingen 2002; O'Donovan and Smith 2005; Cramer et al. 2008; Symeonidis, Nanopoulos, and Manolopoulos 2008; Jannach et al. 2011) or have served as a basis for the aforementioned publications. For the sake of brevity we do not refer to the latter works here and kindly ask interested readers to consult Herlocker, Konstan, and Riedl (2000) and Tintarev and Masthoff (2011) for the corresponding references.
Table 2.1: Reasons and benefits for provision of explanations

Herlocker, Konstan & Riedl (2000):
- Justification / Validation: user understanding of the reasoning, so that he may decide how much confidence to place in a recommendation.
- User Involvement: allow the user to add his knowledge and inference skills to complete the decision process.
- Education: help users understand the strengths and limitations of the system and better understand the product domain.
- Acceptance: greater acceptance of the RS, because its strengths and limits are fully visible and its suggestions are justified.

Tintarev (2007); Tintarev & Masthoff (2007, 2011):
- Transparency: explain how the system works, why one item was preferred over another.
- Scrutability: allow users to tell the system it is wrong, justify why additional information is needed.
- Trust and Credibility: increase users' confidence in the system, hence reduce the complexity of decision making in uncertain situations.
- Effectiveness: help users make better decisions.
- Efficiency: help users make decisions faster, reduce decision-making effort, i.e. time needed or cognitive effort.
- Persuasiveness: change users' buying behavior, convince users to try or buy.
- Satisfaction: increase ease of use, enjoyment, and customer return rate.

Chen (2009):
- Address contextual needs: help the user determine whether the recommendation is suitable in the user's given context or situation.
- Uncover hidden criteria: help users uncover important choice criteria they did not perceive as relevant before.
- Solve preference conflicts: make the preferable option more evident due to additional preference-relevant information.
At this point it is, however, important to mention that the reasons and benefits of
providing explanations, although identified in Table 2.1 as distinct, are not mutually independent
and thus may interact. For example, providing explanations for justification may
also help uncover hidden preferences, increase decision efficiency and effectiveness,
and increase satisfaction and trust (Herlocker, Konstan, and Riedl 2000; Tintarev and
Masthoff 2007).
Because of the advantages and benefits discussed above, as well as their positive
interactions, it seems sensible to equip RS with explanation facilities. The next section
elaborates on the question of how explanations should be formed, i.e. what explanation
style our recommendation algorithm should allow for.
2.1.2 Explanation Styles
The capability of providing personalized explanations varies across recommendation
approaches: it is very limited for collaborative filtering approaches and most
informative in the case of content-based ones (Tintarev and Masthoff 2007, 2011; Jannach et
al. 2011, p. 165).
Collaborative filtering (CF) approaches predict their recommendations based solely on
holistic preference data, i.e. ratings of items or buying acts. For this reason, the explanation
ability of these approaches is limited, allowing only for two kinds of rather generalized
statements: (i) "customers who bought item X also bought items Y, Z, …" and (ii)
"item Y is recommended to you because you rated item X" (Symeonidis, Nanopoulos, and
Manolopoulos 2008).16 The first kind of explanation statement mimics the human word-of-mouth
recommendation process (Jannach et al. 2011). It connects the user to whom the recommendations
are presented, i.e. the active user, to other users who have rated the recommended
item. Because in this case the underlying process produces recommendations on the
basis of user profile similarities, i.e. considers only the users who revealed preferences that
16 In the context of movie recommendations these statements can be correspondingly paraphrased as "people who liked movie X also like Y" and "you will like movie Y because you liked movie X".
DRAFT -
final
revisi
on to
appe
ar in
2012
Chapter 2: Background and Related Work 15
are similar to those of the active user, this explanation style is referred to as the "nearest neighbor"
style. In contrast, the second kind of statement connects the recommended item to the
items the same user has bought or rated in the past. In doing so, the system isolates the item
X that influenced the recommendation of Y the most. This explanation style is therefore denoted
as the "influence" style in the literature (Tintarev and Masthoff 2007, 2011; Symeonidis,
Nanopoulos, and Manolopoulos 2008).
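Both CF explanation styles can be illustrated with a minimal sketch over a toy rating matrix. The data and the co-rating heuristics below are invented for illustration and do not correspond to any particular production system:

```python
# Sketch of the two CF explanation styles over a toy rating matrix.
# ratings[user][item] = rating on a 1-5 scale (invented data).

ratings = {
    "alice": {"X": 5, "Y": 4},
    "bob":   {"X": 4, "Y": 5, "Z": 4},
    "carol": {"X": 5, "Z": 5},
}

def nearest_neighbor_explanation(item, active_user):
    """'Customers who liked item X also liked ...' style: list items
    that users who liked `item` also rated highly."""
    also = {i for u, r in ratings.items()
            if u != active_user and r.get(item, 0) >= 4
            for i in r if i != item and r[i] >= 4}
    return f"users who liked {item} also liked {', '.join(sorted(also))}"

def influence_explanation(item, active_user):
    """'Item Y is recommended because you rated item X' style: name the
    item rated by the active user that co-occurs with `item` most often."""
    rated = ratings[active_user]
    def cooccurrence(x):
        return sum(1 for r in ratings.values() if item in r and x in r)
    most_influential = max(rated, key=cooccurrence)
    return f"{item} is recommended because you rated {most_influential}"

print(nearest_neighbor_explanation("X", "dave"))
print(influence_explanation("Z", "alice"))
```

Note that neither statement refers to item content: both are derived purely from the rating matrix, which is precisely why CF explanations cannot go below this level of detail.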
In contrast to CF, content-based (CB) filtering systems utilize attribute-level preferences
for the generation of recommendations.17 They are thus able to explain their recommendations
at a finer level of resolution, where the item attributes that are relevant to the formation of
the users' preferences and choices can be individually addressed. Because the
attributes are typically extracted from the content of the recommended items, the explanations are
said to be presented in the "content-based" (Symeonidis, Nanopoulos, and Manolopoulos 2008;
Jannach et al. 2011; Tintarev and Masthoff 2011) or "keyword" (Bilgic and Mooney 2005;
Tintarev 2007; Tintarev and Masthoff 2011) style.18 An example of such an explanation could be
"This story received a high relevance score, because it contains the words f1, f2 and f3"19
(Billsus and Pazzani 1999).
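A keyword-style explanation can be sketched as follows. Here the feature extraction is reduced to hand-made attribute sets, and the overlap heuristic is an assumption for illustration rather than any published CB algorithm:

```python
# Sketch of a keyword (content-based) explanation: name the attributes of
# the recommended item that also occur in the user's highly rated items.
# Attribute sets and item names are invented illustration data.

liked_items = {
    "Unforgiven": {"drama", "eastwood"},
    "Se7en":      {"drama", "freeman"},
}

def keyword_explanation(item, item_attrs, liked_items):
    """Collect the attributes of `item` that appear in at least one item
    the user liked; these become the 'keywords' of the explanation."""
    liked_attrs = set().union(*liked_items.values())
    shared = sorted(set(item_attrs) & liked_attrs)
    return (f"{item} is recommended because it has the features "
            f"{', '.join(shared)}, which occur in movies you rated high")

print(keyword_explanation("Million Dollar Baby",
                          {"drama", "eastwood", "freeman", "boxing"},
                          liked_items))
```

Unlike the CF styles, the statement is phrased in terms of item attributes, i.e. at exactly the resolution at which users form their preferences.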
To date, only three studies involving real users provide an evaluation of explanation
styles for RS. In light of the goals of this thesis, the results of these studies can
be summarized as follows:
The study by Herlocker, Konstan, and Riedl (2000) examined various implementations
of explanation interfaces in the domain of "MovieLens",20 a CF movie
recommender system. Twenty-one variants of explanation presentation were compared to a
base case in which no explanations were provided. The results showed that the integration of an
explanation facility can, in many cases, significantly increase the acceptance of recommendations
by the users, which generally supports the thesis of Section 2.1.1. However,
acceptance can also decrease when the information provided exceeds the cognitive skills of the
17 For a detailed description of CB approaches see the corresponding section below.
18 As the terms "content-based" and "keyword" explanation style are used largely synonymously, in the further narration we adopt the term "keyword explanation style" to avoid ambiguity.
19 For the domain of movie recommendation, this example of a content-based explanation can be altered to "we recommend you to watch this movie, because Bruce Willis acts in it and it was awarded an Oscar".
20 http://www.movielens.com
users, i.e. cannot be easily understood. Particularly in cases when additional information, such
as a complex graph, the percentage of agreement of the closest neighbors, the number of neighbors
with standard deviation, or the average correlation between the neighbors, was presented, the
acceptance of recommendations decreased below the baseline. That is, although such technical
details undoubtedly increase the transparency of the functioning of an RS, users might not
consider them relevant for forming their decisions. Hence, transparency is only beneficial for
the users if they are able to cognitively handle it, i.e. if they can deduce and understand the
details provided about the way the system produces recommendations. We interpret this
as being consistent with the conclusion of Aksoy and colleagues that RS should "think
like the people they are attempting to help" (Aksoy et al. 2006, p. 310) and argue that this
conclusion maintains its validity with regard to explanations. That is, not only should RS
think like the users they support in decision making, they should also explain their
recommendations in the terms in which users evaluate their choices.
Bilgic and Mooney (2005) criticize Herlocker and colleagues for their overly narrow
concentration on acceptance and their inability to demonstrate that any of the explanation variants
actually increased users' satisfaction with the items they eventually chose. They argue
that "the goal of a good explanation should not be to 'sell' the user on a recommendation, but
rather, to enable the user to make a more accurate judgment of the true quality of an item".
The authors therefore conducted a user study in which they evaluated different explanation
approaches according to how well they allow users to accurately predict their true opinion of
an item. The results showed that users who were presented explanations in the nearest
neighbor style tend to overestimate the quality of the recommended items. The authors claim
that such overestimation leads to mistrust and could cause users to stop using the system.
Keyword-style and influence-style explanations were found to be significantly more effective
at enabling accurate assessments, whereby the keyword style dominated the influence
style, though not significantly.
Symeonidis, Nanopoulos, and Manolopoulos (2008) conducted a survey to measure user
satisfaction with three styles of explanation. Based on the results of Bilgic and Mooney,
they omitted the nearest neighbor explanation style from their study and introduced a new
one that combines the keyword and influence styles and has the following form: "Item X is
recommended, because it contains features a, b, …, which are included in items Z, W, … that
you have already rated".21 In a between-subject experimental design they recommended each
user a movie, justified by one of the three explanation styles. The users were then asked to
rate each explanation style separately to explicitly express their actual preference among the
three styles. The survey showed that the combined explanation style dominated both the keyword
and the influence style at a high significance level. In this study, however, the influence
explanation style performed better than the keyword style. Unfortunately, the authors do
not report the significance of the latter outcome, which might indicate consistency with Bilgic
and Mooney's results showing that the difference between the two styles is not significant.
The authors argue, however, that the keyword explanation style provides advantages in
convenience and effectiveness over the influence style, as it taxes users' inference skills less.
To further understand the advantages of the keyword explanation style, consider two
examples of explanations provided by a movie recommender. A keyword-style explanation
could be: "Million Dollar Baby (2004) is recommended because it is a drama directed by
Clint Eastwood and starring Morgan Freeman, which are features contained in the movies
you rated high." In contrast, an influence-style justification would be: "Million Dollar Baby
(2004) is recommended because you gave high ratings to Unforgiven (1992), Se7en (1995)
and Gran Torino (2008)". The latter style burdens the user with making the connection
between the movies mentioned and understanding their commonalities, e.g. that they are all
dramas, that two of the movies were directed by Clint Eastwood, and that two star Morgan
Freeman. Although a heavy movie watcher may find such commonalities easy to deduce, for
less experienced movie consumers such an effort can be rather discouraging. It can thus
be argued that spelling out the common features simplifies the inference process
for both user types.
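The combined style of Symeonidis et al. can then be sketched by joining the two pieces of information, the shared features and the already-rated items that contain them. This is an illustrative toy reconstruction of the style's wording, not the authors' actual implementation:

```python
# Sketch of the combined keyword + influence explanation style:
# "Item X is recommended because it contains features a, b, ..., which
# are included in items Z, W, ... that you have already rated."
# All data is invented for illustration.

rated_items = {
    "Unforgiven":  {"drama", "eastwood"},
    "Se7en":       {"drama", "freeman"},
    "Gran Torino": {"drama", "eastwood"},
}

def combined_explanation(item, item_attrs):
    """Name both the shared features and the rated items containing them."""
    shared = sorted(set(item_attrs) & set().union(*rated_items.values()))
    sources = sorted(name for name, attrs in rated_items.items()
                     if attrs & set(shared))
    return (f"{item} is recommended because it contains features "
            f"{', '.join(shared)}, which are included in "
            f"{', '.join(sources)} that you have already rated")

print(combined_explanation("Million Dollar Baby",
                           {"drama", "eastwood", "freeman", "boxing"}))
```

By stating the common features explicitly, the combined style removes the inference burden that the pure influence style places on the user, while retaining the familiar rated items as anchors.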
From the observations discussed above it follows that explanations are able to increase
the acceptance of RS and user satisfaction, and can help users make better
choices. The keyword and influence explanation styles lead to both higher user satisfaction
and a better ability of users to accurately judge the true quality of recommended items.
While the combination of both explanation styles leads to the best overall satisfaction with
21 The concrete wording employed in their study was "Recommended movie title: Indiana Jones and the last crusade (1989). The reason for recommendation is the participant Harrison Ford, who appears in 5 movies you have rated."
DRAFT -
final
revisi
on to
appe
ar in
2012
Chapter 2: Background and Related Work 18
recommendation, its keyword part seems to be the most important for the users‟ ability to
efficiently judge the quality of recommendations.
2.1.3 Explanations within the Recommendation Process
Another study worth mentioning in the context of explanation effectiveness, and one we
have already referred to above, is that by Aksoy et al. (2006). This study does not directly
concern explanations but can add to our understanding of how effective explanations
should be formed and which aspects a recommender algorithm should account for when
producing recommendations.
The authors examine the effect of similarity between a RS and a consumer on the quality
of consumer choices. Two dimensions of similarity are considered: One dimension is the
degree to which consumer preferences for different product attributes are incorporated in the
process of generating a recommendation.22 The other dimension is the degree to which the
RS employs decision-making strategies that are similar to those used by consumers.23
Aksoy et al. hypothesize that attribute weight similarity and perceived decision strategy
similarity influence decision quality independently of each other. Surprisingly, the results
of a preliminary study showed that using a RS that was similar in either attribute weights
or decision strategy led to consumer decisions of the same quality as using an agent that
was similar on both of these aspects. Notably, the authors verified this finding in their main
study and successfully replicated the results. This means that it is enough for a RS to be
similar to a user on one of the two similarity dimensions in order to produce
recommendations that significantly increase decision quality and reduce search effort. In
addition, Aksoy et al. showed that web site loyalty and satisfaction also increase regardless
of the dimension on which a RS and the users are similar. On the contrary, dissimilarity in
both attribute weight and decision strategy hurts consumer welfare by increasing perceived
costs, reducing choice quality, and lowering web site loyalty. The latter makes consumers
"believe they make better decisions using no [recommendation] agent at all than using a
doubly dissimilar agent" (Aksoy et al. 2006, p. 311). Based on these findings the authors
conclude that the similarity between RS and consumers matters and that recommendation
agents "should think like the people they are attempting to help if the goal is to assist
consumers in making better choices" (Aksoy et al. 2006, p. 310).

22 Recommendation algorithms differ with regard to the extent to which they incorporate user preferences in the recommendation process. Some recommendation agents, like mySimon.com, provide randomly ordered alternative lists that do not incorporate any information about consumer preferences. Other agents, like Amazon.com, indirectly elicit attribute importance information based on the customer's previous choices, which may or may not be concordant with the consumer's own utility function. Finally, there exist recommendation agents, such as activeBuyersGuide.com, which directly elicit the consumer's attribute importance weights and explicitly use them to rank alternatives (Aksoy et al. 2006; Diehl, Kornish, and Lynch 2003).
23 According to decision-making research, consumers may employ a variety of cognitive strategies when choosing among products. These strategies range from compensatory decision strategies, such as the weighted additive model (WADD), to simplifying heuristics, such as lexicographic decision rules or elimination by aspects (EBA). For a comprehensive review see Bettman, Johnson, and Payne (1991).
In a further study, Aksoy, Cooil, and Lurie (2011) extend the outcomes of Aksoy et al.
(2006) by showing that the relative utility and the sum of attribute values of the chosen
alternative capture the majority of variance in objective decision quality. Combined, these findings
support the suggestion by Ansari, Essegaier, and Kohli (2000) that RS that provide recom-
mendations based on preference models used in marketing (i.e. incorporate individual attrib-
ute importance weights) might lead to higher consumer choice effectiveness than RS that rec-
ommend products according to the preferences of other dissimilar consumers (i.e. through
collaborative filtering).
The results of Aksoy and colleagues (2006, 2011) emphasize the importance of individual
attribute preferences for RS as a whole and specifically for the process of recommendation
generation. In light of this, the finding that it is enough to maintain either attribute weight
or decision strategy similarity allows a recommender algorithm to concentrate
on the first type of similarity, while maintaining reasonable decision quality at the user side.
The concentration on preference attributes within the recommendation process allows
generating explanations that address single attributes, i.e. producing explanations in the
keyword explanation style. As was shown above, this allows users to efficiently judge the
quality of recommendations and increases the quality of the choice outcome. Additionally, the
inferred attribute preference weights make it possible to rank-order the keywords within
explanation statements according to their relative importance to individual users, and thus to
potentially simplify the choice task by emphasizing the most relevant keywords.
Taking into account the reduced need to maintain the similarity of decision strategy, the
algorithm may employ a weighted additive compensatory decision rule (WADD) in order to
ensure the highest quality of recommendations with respect to decision effectiveness. WADD
offers a RS at least four advantages:
Firstly, WADD is capable of processing user preferences at the attribute level. Therefore,
this decision strategy can easily be implemented within a numeric algorithm that strives
to increase users' choice efficiency by addressing attribute-related user preferences in the
process of recommendation generation.
Secondly, WADD has been found to lead to the normatively best consumer decisions when
compared to heuristic decision procedures, i.e. simplifications of the choice process (Payne,
Bettman, and Johnson 1988). Hence, the WADD model should produce the most effective
choices given that the consumer's attribute-related preference weights are known or can be
accurately estimated by a RS. The task of producing efficient recommendations is, therefore,
reduced to the task of eliciting users' attribute-related preference weights. Both employing
attribute preferences that are similar to the user's own and using a decision rule that
leads to the best consumer decisions potentially increase the robustness of recommendations,
i.e. make a RS as a whole potentially more tolerant to violations of the premise of attribute
preference weight similarity, which may be caused, for example, by calculation errors.
Thirdly, it is known from psychology that consumers do not have a stable utility function
(Jannach et al. 2011, p. 195). That is, the decision rule consumers use is subject to change
depending on the context of the choice situation at hand, e.g. mood, cognitive effort, time
pressure, product involvement, consumption environment, uncertainty, etc. (Payne, Bettman,
and Johnson 1993; Bettman, Johnson, and Payne 1991). Under these circumstances, if a RS
pursued decision strategy similarity, it would have to infer the decision strategy the
consumer currently employs each time s/he requests a recommendation. However, the process
of deriving the decision rule is likely to be time consuming and often leads to a cognitive
overload of the respondents. This diminishes the advantages of RS and potentially even
eliminates them. Instead, it seems reasonable for a RS to employ a decision rule that works
best overall (i.e. WADD), while maintaining attribute preference weight similarity. Although
consumer preferences may change over time, they are more likely to be persistent over
longer periods than decision strategies are. Furthermore, changes in preferences
can be tracked automatically and without a need to interfere with the user's interaction with a
RS: The recalculation of the attribute preference weights can be triggered automatically after
each implicit user input, such as a buying act or item rating.
Finally, because WADD is compensatory (i.e. it accounts for preference valence, so that
negative attribute-related preferences can offset positive ones), it allows attributes of
recommended items towards which negative preferences exist to be used as negative cues in
explanation statements. This again offers potential for increasing choice efficiency, since
several researchers have found that consumers tend to place more weight on negative
information when making evaluations (Lutz 1975; Wright 1974; Kanouse and Hanson 1972;
Ito, Larsen, and Cacioppo 1998). The keyword explanation style can thus be extended to a
"pros-and-cons" style, leading to an explanation form such as
"Titanic (1997) is recommended to you because it matches your preferences well.
Pros: High budget Hollywood movie directed by James Cameron.
Cons: You don't like the movie's drama genre and its star Leonardo DiCaprio.
Taking these factors into account, we expect that you will rate this movie 8 out of 10."
This explanation style maintains the advantages of the keyword explanation style with
respect to choice effectiveness and strengthens them by taking advantage of negative
cues. It can be argued that because the item features here are derived directly
from the attributes toward which the user preferences exist, a "pros-and-cons" explanation
involves the terms users actually employ in their evaluations. Hence, this style is informative,
understandable and actionable for the users.
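A minimal sketch of how such a statement could be assembled from signed attribute-level preferences; the labels, weights, and predicted rating below are hypothetical illustrations:

```python
# Sketch of assembling a "pros-and-cons" explanation from signed
# attribute-level preferences. The labels, weights, and predicted rating
# are hypothetical illustrations.

def pros_and_cons(title, attribute_prefs, predicted_rating):
    """attribute_prefs maps an attribute label to a signed preference weight."""
    pros = [a for a, w in attribute_prefs.items() if w > 0]
    cons = [a for a, w in attribute_prefs.items() if w < 0]
    lines = [f"{title} is recommended to you because it matches your preferences."]
    if pros:
        lines.append("Pros: " + ", ".join(pros) + ".")
    if cons:
        lines.append("Cons: " + ", ".join(cons) + ".")
    lines.append("Taking these factors into account, we expect that you "
                 f"will rate this movie {predicted_rating} out of 10.")
    return "\n".join(lines)

print(pros_and_cons(
    "Titanic (1997)",
    {"high budget": 0.8, "director James Cameron": 0.6,
     "drama genre": -0.4, "star Leonardo DiCaprio": -0.3},
    8))
```

The sign of each weight decides whether an attribute appears as a positive or a negative cue.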
2.1.4 Summary
Summarizing the discussion of Section 2.1, we can conclude that it seems sensible to
incorporate an explanation facility into RS because it provides a series of benefits for both
users and RS providers (see Table 2.1). Besides increasing transparency as well as user
acceptance of, trust in, and loyalty to RS, explanations of the reasoning behind
recommendations provide users with an instrument to handle errors in recommendations and
hence mitigate the negative effects of the latter. Furthermore, explanations allow users to form their judgments and
evaluate the recommendations more efficiently, which increases choice quality and
effectiveness.
However, in order for the benefits to surface it seems essential that (i) the explanations
provided are understandable to the users and (ii) the recommendation process is concordant
with the way the users evaluate choice alternatives on at least one dimension: attribute preference
weights or decision strategy. A possible (conceptual) solution that fulfills both requirements
simultaneously is to incorporate user attribute preferences into the recommendation genera-
tion algorithm that employs a weighted additive decision rule (WADD).
This approach has several advantages: On the one hand, it reduces the recommendation
task to the task of eliciting attribute preference weights, without having to derive a decision
model for each user in each given recommendation setting, and thus unifies and simplifies the
problem of recommendation calculation. On the other hand, because the attribute preference
weights in this case are directly involved in the calculation of recommendations, the contribu-
tion of each attribute to every recommendation is known. Hence, this information can be used
straightaway for generating an explanation of the reasoning behind the recommendation in a
keyword explanation style, which is known to be understandable and actionable to the users
as well as to reasonably contribute to choice effectiveness. Finally, the keyword explanation
style can be extended to the "pros-and-cons" style, thus offering the merit of negative cues,
which play an important role in making evaluations, and potentially further increasing decision
effectiveness at the user side. Altogether, the proposed approach offers an integrative view of
the explanatory and algorithmic issues of RS within a common framework.
While the rationale for the advantages of explanations was provided above in Section
2.1, the algorithmic part will be elaborated on in the subsequent chapters. In the context of the
objectives of this thesis it is, however, important to clarify which attributes are relevant for the
preference building and choice making in the domain of movies and to provide background
knowledge about the state-of-the-art recommender algorithms. The following Sections 2.2
and 2.3 are dedicated to these questions.
2.2 Movie Related Preferences and Relevant Movie Characteristics
In the previous Section we proposed an integrative approach to the effective generation
of recommendations and explanations of the reasoning behind recommendations. This ap-
proach incorporates attribute preferences into the recommendation process and employs the
weighted additive model (WADD) for the generation of recommendations. At this point, in
order to develop a numeric algorithm that implements this approach in the movie domain
(which is the objective of the current thesis) and to enable the reader to comprehend its
development, further understanding of the involved topics is needed. The next two Subsections
therefore provide a brief overview of (i) the operationalization of preferences and (ii) the
attributes of motion pictures that are relevant for preference formation in this
domain.
2.2.1 Operationalizing Preferences: Multiattribute Utility Model and
Weighted Additive Decision Rule
The concept of multiattribute utility (MAU) has a long history in the research fields of
psychology, decision-making and marketing (e.g. Edwards 1954; Tversky 1967; Fishburn
1967; Green, Wind, and Jain 1972; Luce 1992; Carroll and Green 1995). This concept relies
on two fundamental notions: the principle of utility maximization and the decomposition
hypothesis. The former asserts that people make choices according to some criteria of worth.
Hence, each alternative is associated with a certain amount of utility (u), so that the
alternative that is considered best or preferred by a consumer over other alternatives has the
highest utility (Tversky 1967). In other words, if alternative A is preferred over alternative B,
then:

u(A) > u(B)   (2.1)
The decomposition hypothesis states that the utility of an alternative can be decomposed
into basic independent components. That is, people are assumed to evaluate alternatives on a
set of their components, i.e. attributes (Tversky 1967). In doing so, they assign partial utilities,
i.e. part-worths, to each of the attributes of an alternative, which are thought to reflect the
amount of preference that a consumer associates with the levels of the attributes that occur
within an evaluated alternative (Bettman, Johnson, and Payne 1991).24 Additionally, because
the relative importance of different attributes may vary with regard to the preference formation
of the consumer, the part-worths are weighted by the relative importance of the respective
attributes (w_j). Hence, the utility of a multiattribute alternative (u) equals the sum of the
part-worths (v_jk) of its attributes weighted by their relative importance. Formally, this yields:

u = ∑_j w_j · v_jk   (2.2)

where
u = utility of a multiattribute alternative
w_j = relative importance of the j-th attribute
v_jk = part-worth of the k-th level of the j-th attribute
j = 1, …, J: attributes of an alternative
k: levels of an attribute, embodied in the alternative
Equation (2.2) specifies an additive composition model of multiattribute utility and
thereby represents an operationalization of preferences, because utility reflects preferences.
That is, the MAU model makes it possible to rank-order a set of alternatives (e.g. products,
such as movies) with regard to a consumer's preference, assuming that all the part-worths and
all the corresponding importance weights are known or can be elicited, for example, by means
of a numeric algorithm.

Such a procedure of rank-ordering corresponds to the weighted additive (WADD)
decision rule (Bettman, Johnson, and Payne 1991; Corner and Kirkwood 1991; Weiss, Weiss,
and Edwards 2009). Concordant with the principle of decomposition described above, WADD
suggests a normative procedure of decision making that involves the consideration of all the
relevant information about the problem. That is, the WADD rule considers the values of each
alternative on all the relevant attributes as well as all the relative importance weights of the
attributes to the individual (Bettman, Johnson, and Payne 1991).

24 To further understand the relations between alternatives, attributes, and attribute levels, consider the example of choosing between different models of cellular phones. Each model represents a (choice) alternative, which may be evaluated on its attributes such as brand, display size, battery durability, price, etc. The levels of the attribute brand may be, e.g., Motorola, Samsung, Siemens, HTC, etc.; the levels of the attribute price may be, e.g., €20, €60, €120, etc. While a consumer may consider price more important than brand (i.e. the attribute importance weight of price is higher than that of brand), s/he might prefer cheaper phones over expensive ones (i.e. the part-worth of €20 is higher than those of €60 and €120).
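The computation prescribed by Equation (2.2), together with the rank-ordering that yields a recommendation, can be sketched as follows; all importance weights, part-worths, and movies below are hypothetical illustrations:

```python
# Minimal sketch of Equation (2.2): the utility of an alternative is the
# importance-weighted sum of the part-worths of its attribute levels.
# All weights, part-worths, and movies below are hypothetical.

importance = {"genre": 0.5, "director": 0.3, "star": 0.2}    # w_j
part_worths = {                                              # v_jk
    "genre":    {"Drama": 0.9, "Comedy": 0.2},
    "director": {"Clint Eastwood": 0.8, "Unknown": 0.1},
    "star":     {"Morgan Freeman": 0.7, "Unknown": 0.0},
}

def wadd_utility(alternative):
    """WADD: sum of w_j * v_jk over the levels embodied in the alternative."""
    return sum(importance[attr] * part_worths[attr][level]
               for attr, level in alternative.items())

movies = {
    "Million Dollar Baby": {"genre": "Drama", "director": "Clint Eastwood",
                            "star": "Morgan Freeman"},
    "Some Comedy":         {"genre": "Comedy", "director": "Unknown",
                            "star": "Unknown"},
}

# Rank-order alternatives by utility; the top one is the recommendation.
ranked = sorted(movies, key=lambda m: wadd_utility(movies[m]), reverse=True)
print(ranked[0])  # → Million Dollar Baby
```

Because the rule considers all attributes and all importance weights, every alternative receives a fully compensatory utility score before ranking.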
Although MAU and WADD have been criticized for their restricted ability to describe
how individuals actually make choices (Simon 1982; Edwards 1961; Luce 1992), and a series
of simplifying heuristics has been suggested that describes actual choice behavior better under
certain circumstances, e.g. time pressure, routine choosing, low-involvement products, etc.
(Kahneman and Tversky 1984; Bettman, Johnson, and Payne 1991; Gigerenzer et al. 1999), in
the normative view of decision analysis WADD has proved to lead to the most effective choices
(Tversky 1967; von Winterfeldt and Edwards 1986; Payne, Bettman, and Johnson 1988;
Aksoy, Cooil, and Lurie 2011). Because our aim is not to describe actual consumer behavior,
but rather to provide consumers with a decision aid that helps them achieve better choices, this
property of WADD is advantageous in the context of RS. Further advantages of WADD for
the generation of explanations and the production of recommendations were discussed in Section
2.1.3.
In view of the objectives of the current thesis and taking into account the discussion
of Section 2.1.3, the MAU model and the WADD rule prescribe how the recommender
algorithm should be constructed: For each user, the algorithm should elicit the individual
attribute level preferences, i.e. part-worths, and the importance weights of the attributes which
are relevant for the user's preference formation. The obtained information can then be
aggregated by means of WADD in order to calculate the utilities of the alternatives. Rank-ordering
the latter then leads directly to a recommendation of the most preferred alternative (or a
set of alternatives ranking high along user preferences).
Within the framework of movie RS, the utility (u) from Equation (2.2) can be
thought of as the rating (e.g. number of stars) that a user gives to a particular movie. The
higher the rating a movie receives from the user, the more the user likes the movie.
Accordingly, a movie with the highest rating possesses the highest utility for the user and thus
is the most preferred one. Hence, the operationalization of utility as a movie rating allows
comparisons of different movies in terms of the user's preferences. Moreover, given an
attribute composition model of user utility, this operationalization allows numeric inference of
the attribute part-worths as well as their later composition to holistic utilities for arbitrary
movies, their rank-ordering, and so, the recommendations.
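Under the linear composition model, this numeric inference can be sketched, for example, as an ordinary least-squares fit of importance-weighted part-worths to observed ratings; the movies, attribute coding, and ratings below are hypothetical:

```python
# Sketch of numerically inferring (importance-weighted) part-worths from a
# user's ratings under a linear WADD-style composition model. The design
# matrix X marks which attribute levels each rated movie embodies; the
# movies, attribute coding, and ratings are hypothetical.
import numpy as np

# Columns: [Drama, Comedy, Clint Eastwood, Morgan Freeman]
X = np.array([
    [1.0, 0.0, 1.0, 0.0],   # rated movie 1: Drama, directed by Eastwood
    [1.0, 0.0, 0.0, 1.0],   # rated movie 2: Drama, starring Freeman
    [0.0, 1.0, 0.0, 0.0],   # rated movie 3: Comedy
    [1.0, 0.0, 1.0, 1.0],   # rated movie 4: Drama, Eastwood, Freeman
])
ratings = np.array([8.0, 7.0, 3.0, 9.0])   # the user's ratings (utilities)

# Least-squares estimate of the weighted part-worths w_j * v_jk.
w, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Predicted rating (utility) for an unseen Drama starring Morgan Freeman.
new_movie = np.array([1.0, 0.0, 0.0, 1.0])
print(round(float(new_movie @ w), 1))  # → 7.0
```

The fitted coefficients can then be composed into utilities for arbitrary, unrated movies, which is exactly the inference step described above.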
To complete our conception of the movie recommendation algorithm that involves
attribute-related preferences, we now need a notion of which attributes of motion pictures
should be considered within the algorithm, i.e. which movie attributes should be elicited from
the users or their preference data. The next Section is dedicated to this topic.
2.2.2 Preference Relevant Attributes of Motion Pictures
Understanding which attributes of motion pictures drive consumer preferences and
determine their choices is not as trivial a task as it may seem. The extant research on movie
consumption leads to the recognition that addressing comprehensible preference relevant
movie attributes is challenged by the nature of movies:
Movies are hedonic experience goods. That means that, on the one hand, the main
motive for people to consume a movie consists in receiving hedonic value (e.g. pleasure,
thrill) from experiencing it, rather than in fulfilling a utilitarian need (Cooper-Martin 1991,
1992; Holbrook and Hirschman 1982). The nature and the outcomes of hedonic motives are,
however, much more difficult to understand than utilitarian motives (Hennig-Thurau,
Houston, and Walsh 2007), and thus are hard to formalize. On the other hand, the domain of
motion pictures is dominated by experience qualities. This means that the quality of a movie
can be assessed by consumers only when watching it (De Vany and Walls 1999). The latter
forces consumers to rely on proxies called "quasi-search qualities", i.e. movie traits that a
consumer can comprehend before watching a movie, and on movie-related communication for
forming their quality judgments (Hennig-Thurau, Walsh, and Wruck 2001; Hennig-Thurau,
Houston, and Walsh 2007).
Although research on movie consumption has repeatedly put consumers into the
center of interest in recent years (e.g. Hirschman and Holbrook 1982; Austin 1981, 1982, 1989;
Cooper-Martin 1991, 1992; Moon, Bergey, and Iacobucci 2010), it was mainly driven by the
hedonic nature of movies; hence, it mostly concentrated on the unique aspects of the
consumer behavior for this type of goods, rather than on the search for formalizable movie
attributes that would allow assessing the preferences of individual movie watchers.
Accordingly, little is known about which attributes of movies actually form individual movie
watchers' preferences.
To our knowledge, Austin (1989) is the only author who provides a thorough overview
of the reasons why an individual selects a specific movie. Although the author himself
critically questions the general validity of his assertions, we consider it sensible to provide a
brief excerpt thereof. Austin suggests movie genre to be the most influential attribute
determining the choice of a particular movie by moviegoers, though one movie can be
simultaneously classified in several genres. The genre categorization informs the consumer
about the type of the story and the elements of the film's plot, and so narrows down the set of
hedonic qualities which the consumer can anticipate from the film.
Further attributes influencing movie choice are the onscreen and offscreen production
personnel, whose name recognition can affect the attendance decision, i.e. acting stars,
directors, producers, screenwriters, and production companies that are responsible for visual
effects. While acting stars "no doubt contribute much to the audiences' awareness and
knowledge about the film" (Austin 1989, p. 77), only a few persons from the offscreen
personnel gain public recognition that is strong enough to affect movie attendance decisions
(Austin 1989). Hence, a recommender algorithm does not need to consider every name from
the movie industry as a preference relevant movie attribute: We can narrow down the list of
persons to be considered to those who possess star power, i.e. whose names are popular
enough to influence the consumer's movie preference assessment. Such a list can be obtained,
e.g., from analytic web sites that maintain an up-to-date list of movie stars and offscreen
personnel with star power, such as IMDb25 or InsideKino26.
Other factors that, according to Austin, influence movie choice are advertising, trailers,
critic reviews, and word-of-mouth (Austin 1989). These entities, however, can hardly be
classified as movie attributes; rather, they represent additional sources of information. That is,
they influence the process of preference assessment by providing customers with additional
cues about the qualities of a movie. Although additional information may increase
25 http://pro.imdb.com/people
26 http://insidekino.de/Starpower.htm
choice effectiveness, the information sources themselves are unlikely to possess distinct
characteristics toward which an individual may exhibit more or less stable movie relevant
preferences: The utility of an information source depends on the utility of the information it
transfers. In other words, we argue that it is unlikely that a consumer will like all movies
equally more or equally less just because s/he saw a trailer or a TV ad, or heard of a movie from a
particular friend. Hence, we discard the above mentioned entities from the list of preference
relevant movie attributes and from the further discussion thereof.
Further preference relevant movie attributes can be obtained from the stream of movie
research that concerns the economic success of motion pictures. This research stream also
considers consumer preferences, but approaches them from the perspective of the movie
producing industry rather than from the consumers' side. Specifically, the focus of interest
lies here on economic values such as movie profitability and box-office gross (Hennig-Thurau,
Walsh, and Wruck 2001), which are generated through the fees consumers pay,
e.g. for attending a movie in a cinema, acquiring it on DVD, etc. Thus, the "success factors"
are determined by consumers' reactions to studio actions, non-studio factors, as well
as to the characteristics of the movies themselves (Hennig-Thurau, Houston, and Walsh 2007). A
summary of motion picture success factors is provided in Table 2.2.
Table 2.2: Summary of motion picture success factors based on Hennig-Thurau, Walsh, and Wruck (2001)
and Hennig-Thurau, Houston, and Walsh (2007)

Movie characteristics: Genre, Stars, Directors, Budget, Symbolicity, Certification, Sequel, Language, Country of Origin, Movie Length
Post-filming studio actions: Advertising expenditures, Timing of movie release, Number of screens
Non-studio actions: Critical reviews, Awards, Customer-perceived movie quality, Early box-office information, Word-of-mouth
However, consumers enter the analysis of movie success only indirectly – through the
monetary value they generate at the aggregate level. That is, individual customers are not
considered. This means that the empirical evidence for the influence of success factors on the
decision to consume a movie in general cannot simply be interpreted as proof of the relevance
of the success factors, and specifically that of movie characteristics, for the movie preferences
of individuals. Nevertheless, the fact that the success factors significantly influence
consumption decisions in the aggregate can be interpreted as an indication that those factors
may be valid at the level of individual consumers. Hence, we assume that the motion picture
success factors listed in Table 2.2 potentially possess explanatory power for individual
consumers' movie preferences. For our objectives, this assumption has two consequences:
Firstly, it strengthens the support for the relevance of genre and production personnel for
individual preferences. Secondly, it extends the list of movie attributes that are potentially
relevant for an individual's preference assessment.
New to our list of preference relevant movie characteristics, i.e. attributes, are budget,
symbolicity, certification, sequel, language, country of origin, and movie length. In the
following we briefly describe the meaning of these attributes and the motivation for their
inclusion into the list of preference relevant movie attributes to be considered by a
recommendation algorithm:
Movie budgets serve consumers as an indicator of quality, "since the budget indicates
whether the producer has the resources to turn an idea into convincing reality through
acting, artistry, and technology" (Hennig-Thurau, Walsh, and Wruck 2001, p. 11). Thus, the
budget allows consumers to form their expectation of movie quality prior to watching it. In
fact, if we consider the popularity of high-budget movies of recent years (e.g. Avatar, The Lord
of the Rings, Titanic, Godzilla), we notice the tendency of higher budgets to attract more
movie watchers. Hence, although many consumers may not explicitly consider movie budgets
while making their movie consumption decisions (e.g. because such information is not always
available), its indirect influence should nevertheless be acknowledged. A recommender
algorithm can elicit such hidden preferences from data on past movie consumption and use
them for making predictions.
Certifications are intended to classify movies with regard to their potential offensiveness
for audiences and concern subjects such as suitability for children, violence, sex, abusive
language, etc. Although their impact on movies' box-office results remains disputable,
certifications are considered to influence consumers' interest in movies (Hennig-Thurau, Houston, and
Walsh 2007). Thus, we include certifications in the list of preference relevant movie
attributes.
Some movie producing countries are often associated with a specific style of
narration that may be more or less attractive to consumers. For instance, French movies are
expected to be rather arty and Hollywood ones "merely" entertaining (Hennig-Thurau,
Walsh, and Wruck 2001). Hence, the country of origin can be informative about an individual's
movie preferences.
The language spoken in a movie is closely related to the movie's country of origin and may
also influence the consumer's decision to watch the movie. Conventional wisdom tells us that
consumers who are not able to understand foreign languages are unlikely to watch undubbed
movies, whereas other people like to watch movies in the original language. However, in several
non-English speaking countries the original language of movies is less important, as the majority
of foreign movies is either dubbed (e.g. Germany, Russia, France) or subtitled (e.g.
Netherlands, Sweden, Bulgaria; Hennig-Thurau, Walsh, and Wruck 2001). Hence, the
informativeness of movie language with respect to consumer preferences may depend on the
country a recommender algorithm operates in.
Movie length can also be considered to impact consumer movie choice, since a significant number of consumers are not willing to spend more time watching a movie than what can be regarded as the 'critical length' (Hennig-Thurau, Walsh, and Wruck 2001).
Awards given by prestigious institutions such as the Academy of Motion Picture Arts
and Sciences (AMPAS) can be seen as an independent indicator of the aesthetic quality of a
movie (Hennig-Thurau, Walsh, and Wruck 2001; Hennig-Thurau, Houston, and Walsh 2007).
The relevance of awards for consumer behavior was illustrated in the service sector (Dick and
Basu 1994; Hennig-Thurau and Klee 1997) and is suggested to persist in the domain of mo-
tion pictures (Hennig-Thurau, Houston, and Walsh 2007). Although awards are not inherent attributes of movies, they are closely associated with them. Due to the considerations above, we can regard awards as 'exogenous' movie characteristics, and thus as preference-relevant attributes of motion pictures.
Still, not all motion picture success factors listed in Table 2.2 can be considered relevant movie attributes from the viewpoint of our goals, because some of them are not suitable for algorithmic prediction of consumer preferences.
Consider, for instance, "customer-perceived movie quality", which encompasses the movie's experience traits as well as structural qualities such as the movie's budget and personnel (Hennig-Thurau, Walsh, and Wruck 2001). This factor has three serious drawbacks. Firstly, it is a composite factor that contains several entities whose exact composition rule is not specified by previous research; this makes the factor impossible to operationalize within a numeric process. Further, it comprises the movie's budget and personnel. While the movie budget represents a new piece of information, movie personnel is already included in our list of attributes. Considering the latter a second time is unnecessary and can even harm an algorithm through perfect multicollinearity between multiple instances of the same entity. Lastly, besides the previously mentioned reasons, the concept of customer-perceived movie quality implies that a consumer has already seen a movie and can therefore assess his or her preferences towards its experiential traits. This means that a part of the information cannot be made available to an algorithm prior to the consumer's watching of a movie. Thus, movies unknown to consumers would be impossible to recommend. Accordingly, a recommendation algorithm with such a feature would make no sense.
Similar arguments apply to symbolicity, which refers to a movie's potential to be easily categorized by consumers into existing categories that the consumer is familiar with (Hennig-Thurau, Walsh, and Wruck 2001). This categorization is based on the movie's relationship to prior works (e.g. novels, myths, fairy tales, comics, TV programs, computer games, etc.) or its affiliation with a series of movies (Hennig-Thurau, Walsh, and Wruck 2001; Hennig-Thurau, Houston, and Walsh 2007). Accordingly, the property of being a sequel can also be seen as a dimension of the concept of "symbolicity" (Hennig-Thurau, Houston, and Walsh 2007), since sequels are both part of a series of movies and related to their respective predecessors. Whereas reporting the elements of symbolicity can help customers assess their liking of a movie prior to watching it, and so potentially increases decision effectiveness, we doubt its potential as a single attribute (i.e. whether or not a movie is based on prior work) to increase the quality of predictions by a recommender algorithm: Although a consumer may tend to like some set of source works movies can be based on (e.g. Greek myths),
at the same time s/he may dislike a subset of them (e.g. myths about Heracles). Similarly, from the fact that a movie watcher liked some sequel (e.g. Mission Impossible, Matrix), we cannot conclude that s/he generally likes sequels, since at the same time s/he might dislike other sequels (e.g. Batman, Spiderman). Hence, we regard the movie characteristics "symbolicity" and "sequel" as inappropriate to model within our preference-eliciting recommendation algorithm.
Further, although the number of screens, the timing of the movie release, advertising expenditures, and early box-office information influence movie attendance decisions, it can be argued that their impact is concentrated in the period proximate to the movie's release and diminishes over time. Moreover, these factors operate mostly by increasing awareness of a movie, rather than by directly impacting the consumer's preference for the movie itself. Because the value of a recommendation algorithm lies in recommending movies that above all match the user's preferences irrespective of their release times, and especially movies the user is not aware of, we can omit the above-mentioned movie success factors from further consideration.
Analogously, because word-of-mouth and critical reviews are hard to operationalize and
because they do not necessarily mimic the consumer‟s own preferences, we consider them
irrelevant for describing the individual‟s preferences within a recommendation algorithm.
However, completely discarding factors that are proven to reflect aggregate consumers' movie attendance decisions may be dangerous, as it involves the loss of some preference-relevant information that might not necessarily be captured by the remaining movie attributes. We suggest compensating for this information loss by accounting for the movie's box office and admissions (i.e. the number of people that have attended a movie) in our recommendation
algorithm. We propose two arguments to justify this suggestion: Firstly, because within movie success research these quantities are formed through movie watchers' decisions to consume a particular movie, they reflect to some extent the movie's relative popularity. We argue that the popularity of a particular movie may itself be a separate and independent motive to consume it. Hence, preference towards box office and admissions represents a "quasi-search" quality, since it indicates the movie's popularity as a quality judgment of other consumers. Secondly, because the success factors we suggested to omit have a proven influence on the movie box office (Hennig-Thurau, Houston, and Walsh 2007), the latter captures the variance in the former and so can serve as a proxy to assess the experiential qualities of
the movie as well as other omitted factors that would be difficult to operationalize otherwise (e.g. advertising pressure, word-of-mouth qualities, etc.).
Another attribute we propose to include in our list is the movie's year of production. We suggest that, among other factors, the age of a movie may also determine consumers' intention to watch it. Some consumers may prefer only newly released movies; others may have stronger preferences for older, 'mature' films. Hence we assume the year of production to be relevant for the consumer's preference formation towards watching a particular motion picture.
The discussion above provides an overview of movie attributes that are relevant for consumers' assessment of their preferences for movies prior to seeing them, and that thus may be incorporated into a recommendation algorithm that aims to generate recommendations reflecting individual user preferences as well as to provide comprehensive and actionable explanations behind those recommendations. The final list of preference-relevant movie attributes is summarized in Table 2.3.
Table 2.3: Summary of preference relevant movie attributes
Genre
Acting stars
Directors
Producers
Screenwriters
Production companies
Movie Length
Language
Country of origin
Certification
Budget
Admissions
Box-office
Year of production
2.2.3 Summary
Continuing the discussion of Section 2.1, which stresses the role of movie attributes for the provision of actionable explanations that increase decision effectiveness, Section 2.2 presented some insights into how attribute-related consumer preferences can be operationalized at the individual level, so that a numeric algorithm can produce personalized recommendations. Specifically, we utilize multi-attribute utility (MAU) theory as the basis for decomposing a consumer's preferences into attribute part-worths. In doing so, we propose operationalizing the utility of a movie for a consumer as the rating that the consumer assigns to the movie in order to embody his or her preferences. Provided that the attribute part-worths can be elicited by the algorithm, they can be used to calculate ratings, i.e. utilities, of arbitrary movies by means of the weighted additive (WADD) decision rule, and so to rank-order alternatives, i.e. movies, in accordance with the consumer's preferences. The movies with the highest calculated preference ratings represent the actual recommendations.
The list of movie attributes that are relevant for the individual's preferences was elaborated in Section 2.2.2, though the elaboration was challenged by the lack of research on this subject: while research on movie consumption merely suggests a set of movie attributes without proving its explanatory power, research on movie success operates with attribute preferences at the aggregate level. Arguing that empirical evidence behind movie attributes at the aggregate level also confirms their relevance for individual preferences, we combined the suggestions of both research streams. In addition, the suggested attributes were examined regarding their potential descriptive power in the context of RS and their suitability for operationalization within a recommender algorithm. We also suggested one further attribute (year of production), which was not subject to either of the research streams, to be descriptive of consumer movie preferences. The resulting list of preference-relevant movie attributes is presented in
Table 2.3.
At this point, we have at our disposal all the important concepts needed to construct an algorithm for the provision of personalized recommendations that is capable of effective and actionable explanations. Nevertheless, in order to ensure the novelty and conceptual effectiveness of our approach as well as to substantiate our development process, an overview of the
key algorithms employed in contemporary recommendation systems will be given in the following section.
2.3 Key Recommendation Techniques
The goal of RS is to provide users with recommendations of the items they are not
aware of and that are potentially interesting to them. In other words, RS help users to find
useful items. In doing so, RS try to predict user preferences, i.e. ratings, for the yet unseen
items. These predictions are based on preference data, usually ratings, that RS acquire from
their user base. Once the ratings for the yet unrated items are estimated, the system can rec-
ommend the item(s) with the highest estimated rating to the user (Adomavicius and Tuzhilin
2005).
More formally, the recommendation task can be described as follows: Let $U = \{u_1, u_2, \dots, u_n\}$ be the set of all users and let $I = \{i_1, i_2, \dots, i_m\}$ be the set of all items that can be recommended, such as movies, books, CDs, websites, news articles, etc. Let $R$ denote a matrix of ratings $r_{ui}$, with the indexes $u, i$ denoting a particular user-item combination. Finally, let $p: U \times I \rightarrow R$ be a preference function that measures the preference of user $u$ on item $i$, i.e., $p(u, i) = r_{ui}$. Then the recommendation task is: for each user $u \in U$, choose the item $i_u^* \in I$ that corresponds to the maximum of the user's preference:

$$i_u^* = \arg\max_{i \in I} p(u, i) \qquad (2.3)$$
The central problem of RS is, however, that the preference function $p$ is unknown and its mapping onto the rating space $R$ is not defined on the whole space $U \times I$ but only on some subset of it. This means that if a certain user $u$ has not rated an item $i$, the corresponding matrix entry $r_{ui}$ remains empty. Consequently, $p$ needs to be estimated from the non-empty entries of $R$ and then extrapolated to the whole space $U \times I$ in order to predict the unknown ratings (Adomavicius and Tuzhilin 2005; Jannach et al. 2011). Once the predictions are made, recommendations are produced according to (2.3).
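The task in (2.3) can be sketched as follows. The rating matrix and the deliberately naive preference estimator are illustrative stand-ins for the estimation techniques discussed in the following subsections:

```python
# Sketch of the recommendation task (2.3): recommend the unrated item that
# maximizes the estimated preference. R and the estimator are illustrative.
R = {
    "u1": {"i1": 10, "i2": 5},
    "u2": {"i1": 5, "i2": 8, "i3": 7},
    "u3": {"i1": 8, "i2": 5, "i3": 10},
}
items = ["i1", "i2", "i3"]

def estimate(user, item):
    # Naive stand-in for p(u, i): mean rating of the other users on the item.
    others = [r[item] for u, r in R.items() if u != user and item in r]
    return sum(others) / len(others)

def recommend(user):
    # Eq. (2.3): argmax of the estimated preference over the unrated items.
    unrated = [i for i in items if i not in R[user]]
    return max(unrated, key=lambda i: estimate(user, i))
```

For user "u1" the only unrated item is "i3", so the sketch recommends it once its rating is estimated.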
To estimate the ratings of the not-yet-rated items, contemporary RS employ a number of techniques. Although these techniques vary in their implementation details, they can be classified, based on the underlying principle of how recommendations are produced, into three general categories (Balabanovic and Shoham 1997):
Collaborative filtering,
Content-based filtering,
Hybrid approaches.
The approaches of the three categories differ in the strategies they employ, methods
they use, the data basis they rely on, and their inherent strengths and weaknesses. The follow-
ing subsections describe these approaches in more detail.
2.3.1 Collaborative Filtering
The key concept of collaborative filtering (CF) is that the information about the prefer-
ences of the entire user base of an RS can be exploited in order to produce recommendations.
That is, CF methods utilize all ratings from all users to all items available to the system to
predict which items a particular participant of the RS community will most probably like or
be interested in. The fact that every user potentially contributes to a recommendation gives this group of methods its name, i.e. the users are thought to jointly "collaborate" on the recommendation process. The CF methods family encompasses three approaches that differ in
the ways the rating data is used: user-based CF, item-based CF, and matrix factorization. Be-
low, we provide a brief overview of each of these approaches.
2.3.1.1 User-based Approach
The main idea of the user-based CF approaches (e.g. Shardanand and Maes 1995; Kon-
stan et al. 1997; Breese, Heckerman, and Kadie 1998; Nakamura and Abe 1998; Delgado and
Ishii 1999; Herlocker et al. 1999; Jannach et al. 2011) is that those users, who exhibited pref-
erences similar to the ones of the current user in the past, can serve as predictors of the prefer-
ences of the current user on items s/he has not seen yet. That is, the aggregated ratings of such
users (also referred to as peer users or nearest neighbors) are used as predictors of the ratings
of the current user. Accordingly, the algorithm can be broken down into the following steps:
1. From all users in the user base $U$, find a subset $\hat{U}$ of users that are similar to the current user $u$.
2. Aggregate the ratings of these users for the set $\hat{I}$ of items the current user has not rated yet.
3. Recommend the item from $\hat{I}$ that exhibits the highest aggregated rating.
To gain an intuition for how this algorithm works, let us examine Table 2.4, which shows an example of a rating database. The active user, Daniela, for instance, has rated "Sin City" with "10" on a 1-to-10 scale, which means that she strongly liked this movie. Now, the task of our RS is to predict Daniela's rating of "Thor", which she has not seen or rated yet. The system searches the database for users with tastes similar to Daniela's, i.e. who rated the movies similarly, and uses their ratings to predict her liking of "Thor". If the system can predict that Daniela will like "Thor" strongly, then it should recommend this movie to her.
Table 2.4: Ratings database for collaborative filtering

             Daniela   Thorsten   André   Michael   Paul
Sin City          10          5       8         5      1
Titanic            5          8       5         8     10
Memento            8          3       8         1     10
Avatar             8          5       5        10      3
Thor               ?          7      10         6      3
In our simple example, Thorsten's rating profile is the most similar to Daniela's, whereas Paul's profile is the most dissimilar one (see also Figure 2.1). Thus, Thorsten's rating on "Thor" will be used to predict Daniela's liking of this movie.
Various approaches have been proposed for the computation of the similarity $sim(u, v)$
between users of CF systems (Herlocker et al. 1999; Herlocker, Konstan, and Riedl 2002;
Adomavicius and Tuzhilin 2005). Most of them compute the similarity based on the ratings of
items that the users have rated in common. The two most popular similarity measures are
Pearson’s correlation coefficient and cosine similarity (Adomavicius and Tuzhilin 2005;
Jannach et al. 2011). To introduce them, let $I_{uv} = \{i \in I \mid r_{ui} \neq \emptyset \wedge r_{vi} \neq \emptyset\}$ be the set of items commonly rated by both users $u$ and $v$. Then, Pearson's correlation coefficient is defined as (e.g. Resnick et al. 1994; Shardanand and Maes 1995):

$$sim(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}} \qquad (2.4)$$
The cosine-based approach (e.g. Breese, Heckerman, and Kadie 1998; Sarwar et al. 2001) treats users as vectors in $m$-dimensional space, with $m = |I_{uv}|$, i.e. the number of items the users have rated in common. The similarity between the users is then computed as the cosine of the angle between both vectors27:

$$sim(u, v) = \cos(\mathbf{r}_u, \mathbf{r}_v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\| \, \|\mathbf{r}_v\|} = \frac{\sum_{i \in I_{uv}} r_{ui} \, r_{vi}}{\sqrt{\sum_{i \in I_{uv}} r_{ui}^2} \sqrt{\sum_{i \in I_{uv}} r_{vi}^2}} \qquad (2.5)$$

where $\mathbf{r}_u \cdot \mathbf{r}_v$ denotes the dot product28 between the vectors $\mathbf{r}_u$ and $\mathbf{r}_v$, and $\|\cdot\|$ is the second norm of the vector, i.e. the vector's Euclidean length, defined as the square root of the dot product of the vector with itself.
Other metrics such as Spearman's rank correlation coefficient, normalized Euclidean distance, or the mean squared difference measure have also been proposed to determine the proximity between users (Shardanand and Maes 1995; Herlocker et al. 1999, 2002; Adomavicius and Tuzhilin 2005; Jannach et al. 2011). However, empirical analysis provides evidence that for user-based CF systems Pearson's coefficient outperforms other measures of comparing users (Herlocker et al. 1999). For item-based CF systems, which will be de-
scribed in the next section, it has been reported that cosine similarity consistently outperforms
the Pearson correlation metric (Jannach et al. 2011).
27 Here and in the following we use a bold font face to denote vectors and a regular font face to denote scalars.
28 Recall that the dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ in $m$-dimensional Euclidean space is defined as the sum of the pairwise products of the vectors' coordinates, resulting in a scalar: $\mathbf{a} \cdot \mathbf{b} = \sum_{k=1}^{m} a_k b_k$
Figure 2.1: Comparing three user rating profiles (modified from Jannach et al. 2011, p. 15)
Both metrics, Pearson's correlation coefficient and cosine similarity, vary in the interval between +1 and -1. While +1 corresponds to the case of perfect positive correlation, i.e. the user profiles are identical, -1 corresponds to perfect negative correlation, i.e. the user profiles are the exact opposite of each other. A value of zero shows that the user profiles are absolutely unrelated, i.e. dissimilar. Accordingly, the nearer the similarity measure is to +1, the more similar both users are. This property of similarity measures is usually used for weighting the peer users' ratings within the aggregation process, so that the most similar users are given more weight in the prediction of the active user's ratings. The value of the similarity measure, as will be shown below, is directly adopted in the aggregation function as the weight of a user.
Before the ratings of the active user can be predicted, a set $\hat{U}$ of peer users, whose ratings will be considered in the prediction, needs to be defined, i.e. we have to select the most similar users according to some rule. The set of the most similar peers is also referred to as the "k nearest neighbors". Because these neighbors build the basis for predictions, i.e. recommendations, the collaborative approaches are often called k-Nearest Neighbor or kNN approaches.
The value of $k$ can range anywhere from 1 to the number of all users (Adomavicius and Tuzhilin 2005). The question of determining the exact value of $k$, however, remains open until now. Hence, it is usually set heuristically, either by defining a specific minimum similarity threshold (e.g. Shardanand and Maes 1995; Breese, Heckerman, and Kadie 1998) or by choosing some explicit value of $k$ (Herlocker et al. 1999, 2002; Anand and Mobasher
2005; Jannach et al. 2011). Both techniques are, however, problematic: if $k$ is set too high, too many users with limited similarity bring additional "noise" into the predictions; conversely, low values of $k$ can negatively impact the quality of the predictions. On the other hand, a too high similarity threshold can entail a radical reduction of the neighborhood sizes for the users, so that the ratings for many items cannot be predicted. A too low threshold, in contrast, increases the neighborhood size but also raises the amount of "noise". Jannach et al. suggest that "in most real-world situations, a neighborhood of 20 to 50 neighbors seems reasonable" (Herlocker et al. 2002, cited in Jannach et al. 2011, p. 18)29. A more detailed discussion of the problem of selecting the neighborhood size can be found in Herlocker et al. (2002) as well as in Anand and Mobasher (2005).
Once the neighborhood size or similarity threshold is defined, the ratings for the active
user are predicted by means of an aggregation rule. Different functions have been proposed as
an aggregation rule. Some examples of them are (Adomavicius and Tuzhilin 2005; Herlocker
et al. 2002):
$$r_{ui} = \frac{1}{|\hat{U}|} \sum_{v \in \hat{U}} r_{vi} \qquad (2.6)$$

$$r_{ui} = \bar{r}_u + \frac{1}{|\hat{U}|} \sum_{v \in \hat{U}} (r_{vi} - \bar{r}_v) \qquad (2.7)$$

$$r_{ui} = \kappa \sum_{v \in \hat{U}} sim(u, v) \, r_{vi} \qquad (2.8)$$

$$r_{ui} = \bar{r}_u + \kappa \sum_{v \in \hat{U}} sim(u, v) (r_{vi} - \bar{r}_v) \qquad (2.9)$$

where $\hat{U}$ denotes the set of users that are most similar to the current user $u$ and have rated the item $i$. The multiplier $\kappa = 1 / \sum_{v \in \hat{U}} |sim(u, v)|$ serves as a normalizing factor, and the average rating of user $u$ is defined as $\bar{r}_u = \frac{1}{|I_u|} \sum_{i \in I_u} r_{ui}$, with $I_u = \{i \in I \mid r_{ui} \neq \emptyset\}$.
In the simplest case, the aggregation can be a simple average (Adomavicius and
Tuzhilin 2005), as defined by (2.6). Intuitively, because this function does not account for the
degree of similarity of different peers, its predictions are liable to suffer from "noisy" input
from neighbors with limited similarity. Although the latter issue can be compensated for by setting an appropriate similarity threshold, such a countermeasure, as described above, tends to reduce the coverage of the RS. So if the aim is to accurately predict user ratings, function (2.6) might not be the best choice. However, the simplicity of this function is its biggest advantage: such aggregation requires few resources and can be computed quickly, which may be very useful for RS that must provide ad-hoc and real-time recommendations from considerable catalogues of items. Furthermore, in situations when the system does not know enough about the user to produce a personalized prediction (the so-called "new user problem", which will be discussed below in Section 2.3.3), recommendations according to the average rule might be better than no recommendations at all. In this case, however, the condition under the sum sign in (2.6) must be relaxed to $\hat{U} = \{v \in U \mid r_{vi} \neq \emptyset\}$, i.e. all users who have rated item $i$ should be involved in producing recommendations.
29 The authors quote Herlocker et al. (2002) at this point. However, despite careful reading, we could not find this quotation in the referred publication. Hence, we refer here to Jannach et al. (2011).
Equation (2.7) represents a slight modification of the previously discussed aggregation rule by reformulating it in the "deviation form". That is, the aggregation here happens not over the ratings $r_{vi}$ that the users have given to an item $i$, but over the deviations of these ratings from the average ratings $\bar{r}_v$ of the respective users. The resulting sum is then adjusted by the mean rating $\bar{r}_u$ of the active user. By doing so, the modified rule accounts for the fact that different users may use the rating scale differently. For instance, Michael's rating of "6" may correspond to exactly the same amount of preference as André's "8". Moreover, the mean adjustment corrects for the "gap" between user profiles that expose reasonable correlation but are shifted along the rating scale. An example of such profiles can be seen in Figure 2.1, where Daniela's and Thorsten's ratings expose a strong correlation but are "shifted" vertically, so that Thorsten's ratings lie on average some 5 points below Daniela's. Although Thorsten might not necessarily share Daniela's movie taste, his ratings seem to be reliable predictors of Daniela's. However, in order to predict Daniela's preferences appropriately, Thorsten's ratings should be incremented by approximately 5 points; the mean adjustment performs exactly this correction. Herlocker et al. (2002) found that the mean-adjusted average, as defined in (2.7), significantly outperforms (2.6) with respect to prediction accuracy, specifically in the case of non-personalized recommendations. Similarly to (2.6), the deviation-from-mean average does not account for different degrees of user similarity; thus, the merits and shortcomings of the simple average remain for the most part valid for this rule as well.
The most common aggregation approach is, however, the weighted sum as defined in
equation (2.8) (Adomavicius and Tuzhilin 2005; Jannach et al. 2011). As already noted above,
peer users are assigned weights according to their similarity to the current user. Conventional wisdom tells us that users whose tastes are more similar to the tastes of the active user are more credible recommenders, and thus should contribute more to the recommendation. The weighted aggregation procedure achieves exactly this effect. The normalization factor $\kappa$, as introduced above, ensures that the predicted rating is scaled within the scale's interval and does not exceed or fall below the allowed scale limits.
However, equation (2.8), similarly to equation (2.6), does not take differences in the av-
erage rating between different users into account. The mean-adjusted aggregation rule (2.9)
addresses this shortcoming.
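The four aggregation rules can be sketched side by side. The peer similarities, ratings, and mean ratings below are illustrative:

```python
# Sketch of the aggregation rules (2.6)-(2.9). Each peer contributes a
# similarity to the active user, a rating of the target item, and his or
# her own mean rating; all numbers are illustrative.
peers = [
    # (sim(u, v), r_vi, mean rating of v)
    (0.9, 8, 6.0),
    (0.5, 6, 7.0),
]
active_mean = 7.0  # mean rating of the active user u

def simple_average(peers):
    # Eq. (2.6): plain average of the peer ratings.
    return sum(r for _, r, _ in peers) / len(peers)

def deviation_average(peers, r_bar_u):
    # Eq. (2.7): average of the peers' deviations from their own means,
    # added to the active user's mean rating.
    return r_bar_u + sum(r - m for _, r, m in peers) / len(peers)

def weighted_sum(peers):
    # Eq. (2.8): similarity-weighted ratings, normalized by kappa.
    kappa = 1.0 / sum(abs(s) for s, _, _ in peers)
    return kappa * sum(s * r for s, r, _ in peers)

def adjusted_weighted_sum(peers, r_bar_u):
    # Eq. (2.9): similarity-weighted deviations, mean-adjusted.
    kappa = 1.0 / sum(abs(s) for s, _, _ in peers)
    return r_bar_u + kappa * sum(s * (r - m) for s, r, m in peers)
```

On these toy numbers the four rules produce different predictions for the same item, which illustrates why the choice of aggregation function matters.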
After predicting the ratings for the yet unseen items, the item with the highest rating can be recommended to the active user. Alternatively, a set of $N$ items with the highest ratings can be shown to the user. The latter case is often referred to as "top-N recommendation" (e.g. Sarwar et al. 2000; Seyerlehner, Flexer, and Widmer 2009; Zhang 2009).
2.3.1.2 Item-based Approach
Rather than basing recommendations on the similarity between users, item-based col-
laborative filtering relies on the similarity between items (Sarwar et al. 2001; Rashid et al.
2002; Linden, Smith, and York 2003; Zeigler et al. 2005). The item-based CF algorithm can
be broken down into the following steps:
1. From all items, find a subset $\hat{I}$ of items not rated by the current user that are similar to those the user liked most in the past.
2. For each item from $\hat{I}$, use its similarity to the items the current user has rated to weight those ratings for the prediction of the rating of the active user.
3. Recommend the item from $\hat{I}$ that exhibits the highest predicted rating for the active user.
To gain an intuition of how this algorithm works, examine Table 2.4 again. We can see
that the ratings for “Sin City” and “Thor” are distributed similarly among the users (see also
Figure 2.2). Thus “Sin City” is given a high weight for the prediction of Daniela‟s rating on
“Thor”.
Figure 2.2: Comparing three movie rating profiles
As mentioned previously, empirical analysis shows that for item-based CF approaches
the cosine similarity measure performs best with respect to prediction accuracy (Jannach et al.
2011). Hence, this measure is most often employed in item-based predictions (e.g. Sarwar et
al. 2001; Rashid et al. 2002; Linden, Smith, and York 2003; Zeigler et al. 2005). However,
one fundamental difference between user-based and item-based CF in computing the similarity is that the former approaches compute the similarity along the columns of the rating matrix, whereas the latter compute it along the matrix's rows (see Table 2.4), i.e. each pair of co-rated entries corresponds to different users. Thus, computing the similarity between items with a cosine measure analogous to (2.5) has one important drawback in the item-based case: it does not account for differences in the rating scales of different users. The adjusted cosine similarity measure offsets this drawback by subtracting the average rating of the corresponding user from each co-rated pair (Sarwar et al. 2001). According to this scheme, the similarity between items $i$ and $j$ is given by

$$sim(i, j) = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_u)(r_{uj} - \bar{r}_u)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_u)^2} \sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_u)^2}} \qquad (2.10)$$

where $U_{ij} = \{u \in U \mid r_{ui} \neq \emptyset \wedge r_{uj} \neq \emptyset\}$ is the set of users that have rated both items $i$ and $j$.
After the similarities between the items are determined, the prediction of the rating of the current user $u$ on item $i$ is computed as a weighted sum of the current user's ratings of the items that are similar to the questioned item, or formally:

$$r_{ui} = \frac{\sum_{j \in \hat{I}} sim(i, j) \, r_{uj}}{\sum_{j \in \hat{I}} |sim(i, j)|} \qquad (2.11)$$

where $\hat{I}$ denotes the set of items that are most similar to the questioned item $i$. That is, the size of the considered neighborhood, as in the user-based case, is limited to a specific number of most similar items.
Just like in the user-based approaches, after making predictions, the item(s) with the
highest rating(s) constitute the recommendation(s).
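The item-based computation can be sketched as follows. The item-by-user ratings and the user mean ratings are illustrative toy values:

```python
# Sketch of the adjusted cosine similarity (2.10) and the item-based
# prediction (2.11). All rating values and user means are illustrative.
from math import sqrt

ratings = {                          # item -> {user: rating}
    "i": {"u1": 8, "u2": 3},
    "j": {"u1": 9, "u2": 2, "u3": 7},
}
user_means = {"u1": 6.0, "u2": 4.0, "u3": 5.0}

def adjusted_cosine(a, b):
    # Eq. (2.10): subtract each co-rating user's mean before the cosine,
    # to offset the users' different rating scales.
    common = [u for u in ratings[a] if u in ratings[b]]
    da = [ratings[a][u] - user_means[u] for u in common]
    db = [ratings[b][u] - user_means[u] for u in common]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da)) * sqrt(sum(y * y for y in db))
    return num / den

def predict(neighbor_ratings, sims):
    # Eq. (2.11): similarity-weighted average of the active user's own
    # ratings of the neighboring items.
    return (sum(s * r for s, r in zip(sims, neighbor_ratings))
            / sum(abs(s) for s in sims))
```

Here items "i" and "j" receive a high adjusted-cosine similarity because their mean-centered rating patterns point in the same direction across the co-rating users.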
2.3.1.3 Matrix Factorization and Latent Factor Models
Another approach within the class of collaborative filtering techniques is matrix factorization (Sarwar et al. 2000, 2002; Goldberg et al. 2001; Canny 2002; Koren, Bell, and Volinsky 2009; Koren and Bell 2011; Jannach et al. 2011). The general idea of this approach is
to exploit the data received from all users of a RS to derive a set of latent factors descriptive
of hidden associations between users and items and then to apply this knowledge for the pro-
duction of recommendations. In other words, matrix factorization (MF) techniques map both
users and items to a multidimensional joint factor space, where user-item interactions are
modeled as inner products of the vectors that represent user and item rating profiles. The la-
tent space tries to explain ratings by characterizing both items and users on factors automatically inferred from the ratings gathered from the user community (Koren and Bell 2011). For
instance, in the domain of motion pictures, such automatically identified factors may corre-
spond to obvious movie aspects such as genre, less well-defined movie dimensions, such as
depth of character development or quirkiness, but they can also be completely uninterpretable
(Koren, Bell, and Volinsky 2009).
Figure 2.3: A simplified illustration of the latent factor approach (source: Koren, Bell, and Volinsky 2009, p. 44)
Figure 2.3 depicts a simplified example of how latent factor models work, provided in
Koren, Bell, and Volinsky (2009). The figure shows where several well-known movies and
some fictitious users may fall on two hypothetical dimensions, i.e. factors, characterized as
serious versus escapist and male- versus female-oriented. In a sense, the interpretation of the
graph is similar to the interpretation of perceptual maps within multidimensional scaling
(MDS) procedures, well-known in marketing (Myers 1996): The relative positions of the us-
ers and items in the two dimensional space characterize the degree to which the user‟s taste
matches the movie‟s characteristics in terms of the derived factors. The further from the origin
the user or the movie is located in the factor‟s direction, the more pronounced is the factor in
the user‟s taste or in the movie‟s properties. The nearer the user is to a movie, the more s/he is
supposed to like it. Accordingly, we can describe Gus‟ as having a strong preference for male-
oriented escapist movies and “The Color Purple” as a serious female-oriented movie. Hence,
in our example we would expect Gus to love “Dumb and Dumber”, to hate “The Color Pur-
ple” and to rate “Braveheart” about the average. Note, however, that some movies, e.g.
“Ocean‟s 11”, and some users, e.g. Dave, would be characterized as fairy neutral on these two
dimensions (Koren, Bell, and Volinsky 2009), meaning that the two factors fail to describe
both Dave‟s movie taste and the properties of “Ocean‟s 11” substantively enough for generat-
ing predictions.
The concept underlying the derivation of the factors is singular value decomposition (SVD; Golub and Kahan 1965), an established technique for identifying latent semantic factors in information retrieval (Koren, Bell, and Volinsky 2009; Jannach et al. 2011). SVD is based on a theorem of linear algebra which states that any matrix M can be decomposed into a product of three matrices as follows:

M = U Σ V^T    (2.12)

where the columns of U and V are called the left and right singular vectors and the diagonal elements of Σ are called the singular values (Jannach et al. 2011; Golub and Kahan 1965; Press et al. 2007). The main point of this decomposition is that it enables us to approximate the full matrix by retaining only the most important features, namely those with the largest singular values (Jannach et al. 2011; Press et al. 2007).
Informally, the SVD technique can be described as follows: The singular values correspond to the eigenvalues of the eigenvectors that span the range of M (Press et al. 2007). Thus, the eigenvectors with the largest singular values capture the biggest portion of the variance in M. These eigenvectors build up the basis, i.e. the set of "factors", of the target factor space. If M is the user-item matrix of ratings (e.g., our example rating dataset from Table 2.4), then U corresponds to the users and V to the catalog of items (Jannach et al. 2011); and if k factors were determined to have non-zero singular values, then the product of the first k columns from U, the first k columns from V, and the k × k diagonal matrix of singular values according to (2.12) yields the best rank-k approximation of M in terms of the least-squares error (Press et al. 2007). Thus, the first k columns of U and V describe the users' and the items' coordinates along the k dimensions of the factor space, i.e. user tastes and item properties in terms of the k determined factors.
However, conventional SVD is undefined when the knowledge about the matrix M is incomplete (Koren, Bell, and Volinsky 2009; Press et al. 2007), which is always the case in RS: If each element of the user-item matrix were known, there would be no reason to predict user ratings, as they would all be known already. To overcome this problem, some earlier works
suggested employing imputation techniques to fill in missing ratings and make the ratings
matrix dense (e.g. Sarwar et al. 2000; Kim and Yum 2005; Ying, Feinberg, and Wedel 2006).
However, the imputation approaches have been criticized for being very expensive with re-
spect to computational resources. Moreover, the data may be considerably distorted due to
inaccurate imputation (Koren, Bell and Volinsky 2009; Koren and Bell 2011). Consequently,
recent works suggested performing decomposition of the user-item matrix on the basis of ob-
served ratings only, while counteracting overfitting through adequate regularization (Canny 2002; Funk 2006; Paterek 2007; Bell, Koren, and Volinsky 2007; Salakhutdinov, Mnih, and Hinton 2007; Koren 2008; Koren and Bell 2011).
In this case, the rating r_ui of the user u for the item i is modeled as an inner product of the vector of movie qualities q_i and the vector of the user's preferences p_u, each described in terms of f latent factor dimensions (Koren, Bell, and Volinsky 2009):

r̂_ui = q_i^T p_u    (2.13)
That is, the rating is thought to be a projection of the result of the interaction of the user's preferences and the item's properties onto their common space. The problem is that neither of the two vectors nor their dimensionality is known. The only information the system can rely on is the results of user-item interactions, i.e. the ratings that users have given to items in the past. The task of the system is thus to recover the knowledge about the users and the items from past ratings, so that this knowledge can be used to predict future ratings using (2.13).
Roughly speaking, the system has to iterate through all ratings and infer which part of each rating comes from the user's preferences and which is due to the item's properties, i.e. to decompose ratings into user and item vectors. The decomposition should additionally be performed such that expression (2.13) remains valid for the whole set of the known ratings.
To learn the vectors q_i and p_u, the algorithm minimizes the regularized squared error on the set of the observed ratings (Koren, Bell, and Volinsky 2009):

min_{q*,p*} ∑_{(u,i)∈K} ( r_ui − q_i^T p_u )² + λ ( ‖q_i‖² + ‖p_u‖² )    (2.14)

where K denotes the "training" set, i.e. the set of (u,i) pairs for which r_ui is known. The constant λ controls the extent of regularization, which aims to counteract the overfitting of the learned parameter values to the data by penalizing their magnitude. The value of λ is usually determined by cross-validation (Koren, Bell, and Volinsky 2009).
The learning of the parameters, i.e. the minimization of the sum (2.14), is typically per-
formed either by alternating least squares (ALS) or by the stochastic gradient descent method.
As the name of the method suggests, ALS alternates between fixing the q_i's and fixing the p_u's. Each time all the q_i's are fixed, the algorithm recomputes the p_u's by solving a least-squares
problem, and vice versa. Each step decreases the value of the objective (2.14), and the alternation continues until convergence (Bell and Koren 2007).
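The alternation can be sketched on invented toy data as follows (the ratings, factor count k, and λ are illustrative choices; each per-row solve uses the regularized normal equations of a small ridge regression):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed ratings as (user, item, rating) triples; the matrix is sparse.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
n_users, n_items, k, lam = 4, 3, 2, 0.1

P = 0.1 * rng.standard_normal((n_users, k))   # user vectors p_u
Q = 0.1 * rng.standard_normal((n_items, k))   # item vectors q_i

def objective():
    fit = sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)
    reg = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return fit + reg

def recompute(target, other, by_user):
    # With one side fixed, each vector of the other side is the solution
    # of a small ridge-regression problem on that row's observed ratings.
    for idx in range(target.shape[0]):
        obs = [(i, r) for u, i, r in ratings if u == idx] if by_user else \
              [(u, r) for u, i, r in ratings if i == idx]
        if obs:
            X = np.array([other[j] for j, _ in obs])
            y = np.array([r for _, r in obs])
            # Solve (X^T X + lam I) w = X^T y, the ridge normal equations.
            target[idx] = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

before = objective()
for _ in range(20):
    recompute(P, Q, by_user=True)    # fix the q_i's, re-solve the p_u's
    recompute(Q, P, by_user=False)   # fix the p_u's, re-solve the q_i's

print("objective fell from", round(before, 2), "to", round(objective(), 3))
```

Because every per-row solve minimizes that row's contribution to the objective, each half-step can only decrease it, which is why the alternation converges.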
Another method, stochastic gradient descent, can be traced back to Simon Funk (2006), who popularized it during the one-million-dollar Netflix Prize contest. The simple technique allowed him to reach the top of the contestants list and thus gained extensive attention from RS research (Paterek 2007; Salakhutdinov, Mnih, and Hinton 2007; Takács et al. 2007; Koren 2008; Koren, Bell, and Volinsky 2009; Koren and Bell 2011). Looping through all the ratings in the training set, the algorithm computes for each given rating r_ui its predicted value q_i^T p_u and the associated prediction error e_ui = r_ui − q_i^T p_u. Then it modifies the parameters by a magnitude proportional to the learning rate γ, i.e. the step size, in the opposite direction of the gradient (Koren, Bell, and Volinsky 2009):

q_i ← q_i + γ ( e_ui · p_u − λ · q_i )    (2.15)
p_u ← p_u + γ ( e_ui · q_i − λ · p_u )

The learning is finished when the sum in equation (2.14) cannot be reduced any further, or when the magnitude of its decrease in a given iteration does not exceed some preassigned threshold, say 0.001.
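A minimal sketch of this loop on invented toy ratings (γ, λ, and the stopping threshold are arbitrary choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented observed ratings as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
n_users, n_items, k = 4, 3, 2
gamma, lam = 0.02, 0.02        # learning rate and regularization constant

P = 0.1 * rng.standard_normal((n_users, k))   # user vectors p_u
Q = 0.1 * rng.standard_normal((n_items, k))   # item vectors q_i

def sse():
    # Squared error on the training set (the fit part of (2.14)).
    return sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)

start = prev = sse()
for epoch in range(2000):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]                       # prediction error e_ui
        q_old = Q[i].copy()
        Q[i] += gamma * (e * P[u] - lam * Q[i])   # update rule (2.15)
        P[u] += gamma * (e * q_old - lam * P[u])
    cur = sse()
    if prev - cur < 1e-6:      # stop once the decrease becomes negligible
        break
    prev = cur

print("training SSE fell from", round(start, 2), "to", round(cur, 4))
```

Each pass both vectors are nudged against the gradient of the single rating's error term, so the training error shrinks until the stopping threshold is reached.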
The dimensionality of the factor space can either be set based on some considerations, e.g. system performance, or be determined directly in the process of decomposition. In the latter case, another loop wraps around the algorithm. In each iteration of the outer loop the algorithm learns one factor dimension, i.e. one coordinate of the q_i's and p_u's. As soon as no further iteration of the inner loop can decrease the cost function (2.14), one more factor dimension is added and the learning continues on this dimension. The loop proceeds until the addition of further factors no longer decreases the cost function (Funk 2006). The intuition behind this procedure is that in the first iteration the parameters of the factor with the highest explanatory power are learned, so that the first factor captures as much of the variance in the ratings as possible. The second factor tries to capture the majority of the remaining variance, and so on. Hence, the explanatory power of each successive factor decreases. Here, the direct analogy to the principle of SVD can be seen; therefore, matrix factorization techniques are often collectively called "SVD methods".
A comprehensive overview of the recent advances in matrix factorization for CF can be found in Koren and Bell (2011). The authors tackle topics related to computational issues,
aspects of modeling, and parameter estimation, and show how to utilize temporal models and implicit user feedback to improve the model's accuracy. Additionally, they report on some insights from applying these techniques in the Netflix Prize contest.
2.3.2 Content-based Filtering
Content-based (CB) approaches base their predictions on the similarity between items
and the information about past preferences of the active user. Unlike CF, the calculation of
item similarity is based not on the ratings of other users but solely on the content characteris-
tics of the items. The main advantage of CB approaches over CF is that the former require
neither the existence of a large user community nor a considerable rating history to produce
recommendations. In essence, CB methods do not need any knowledge about users other than the one the recommendations are made for (Jannach et al. 2011). The recommendation
task consists of determining the items that are similar to those the active user has liked in the
past (Balabanovic and Shoham 1997; Mladenic 1999; Herlocker et al. 1999).
Historically, CB approaches have been developed for the recommendation of text-based
items, such as e-mail messages or news (Jannach et al. 2011). Accordingly, CB methods
mainly deal with the recommendation of textual documents. Nevertheless, the general idea of
exploiting the object's content can also be expanded to the domains of non-textual products or items. In this case, however, some modifications to the original CB approach must be made. Hence, the current section is divided into two subsections: The first describes the principles and procedures of the "original" text-based CB approaches, whereas the second addresses the specifics of their application in non-textual domains.
2.3.2.1 The Principles of Content-based Approaches
Having their roots in the field of information retrieval and data mining, content-based
approaches mainly deal with recommendations of textual documents (Jannach et al. 2011).
The standard approach is, therefore, to extract a list of relevant keywords from the content of
a document or from a textual description thereof (Balabanovic and Shoham 1997; Ado-
mavicius and Tuzhilin 2005; Lops, de Gemmis and Semeraro 2011; Jannach et al. 2011).
Consequently, each document is described with a vector of dimensionality equal to the num-
ber of relevant keywords (also often called features) maintained in the system. These vectors
are then used to determine the documents, i.e. items, which are similar to the ones that the
user was interested in in the past. Once such items are determined, they can be recommended
to the user.
To gain an intuitive idea of how this works, examine Figure 2.4 and Table 2.5. The fig-
ure illustrates the principle of how the keywords are extracted from documents and how they
constitute a vector representation thereof. In the given example, the vector's elements correspond to the frequency of appearance of the respective words in the document. Other, more comprehensive techniques for constructing the keyword vector will be discussed below. For the moment, to simplify our example, it is enough to know that the elements of a keyword vector represent the presence of a word in the document.
Figure 2.4: Illustration of the extraction of a features vector from a document

Headline: "Emmerich defends Shakespeare film"
Text: "German film director Roland Emmerich admits courting controversy with his film that questions the authorship of Shakespeare's plays."

Extracted keyword frequency vector: Emmerich: 2, film: 2, Aid: 0, ..., director: 1, E. coli: 0
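The counting step behind such a vector can be sketched as follows (the maintained keyword list is assumed, and the resulting counts need not match the figure exactly, since the figure does not specify which parts of the article are counted):

```python
import re

headline = "Emmerich defends Shakespeare film"
body = ("German film director Roland Emmerich admits courting controversy "
        "with his film that questions the authorship of Shakespeare's plays.")
text = headline + " " + body

# The system maintains a fixed list of relevant keywords (features).
vocabulary = ["Emmerich", "film", "Aid", "director", "E. coli"]

# One frequency entry per maintained keyword; a case-sensitive substring
# match is used so that multi-word keywords such as "E. coli" count too.
vector = [len(re.findall(re.escape(keyword), text)) for keyword in vocabulary]

print(dict(zip(vocabulary, vector)))
```

Keywords absent from the article (here "Aid" and "E. coli") simply receive a zero entry, which is what makes the vectors comparable across documents.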
Consider now Table 2.5, which encompasses five article headlines³⁰ and their binary representations as four-element row-vectors. Each element denotes a specific word, with "y" indicating that the word is present in the article. The last column of the table shows the user Thorsten and his preferences for the first four articles. We see that Thorsten liked articles with the keywords "director", "film" and "aid", but he did not like articles with the keywords "E. coli" and "aid". Hence, the system will assign positive weights for "film" and "director" to Thorsten's user profile. "E. coli" will receive a negative weight, and the weight of "aid" will be neutral, because it appears equally frequently in both liked and disliked articles. Based on these considerations, the system will predict that Thorsten will like the last article, covering Tom Hanks' involvement in a new movie of his own, because the article's content includes the keyword "film". If this article also contained the keyword "E. coli", the system would predict a lower rating for Thorsten's liking of it. The precise magnitude of the rating would depend on the relative weighting of the keywords.
Table 2.5: Principle of content-based filtering

                                      aid | director | E. coli | film | Thorsten
Emmerich defends Shakespeare film         |    y     |         |  y   |    +
EU sets E. coli aid at 150m euros      y  |          |    y    |      |    -
E. coli map: How the outbreak looks       |          |    y    |      |    -
Nadir to receive legal aid             y  |          |         |      |    +
Tom Hanks had a 'personal mission'
  with Larry Crowne                       |          |         |  y   |    ?
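The weighting logic of the example can be sketched as follows (the ±1 encoding of likes and the simple additive profile are simplifying assumptions, not the thesis's own method):

```python
# Articles from Table 2.5 as (keyword set, Thorsten's rating) pairs.
rated = [
    ({"director", "film"}, +1),   # Emmerich defends Shakespeare film
    ({"aid", "E. coli"}, -1),     # EU sets E. coli aid at 150m euros
    ({"E. coli"}, -1),            # E. coli map: How the outbreak looks
    ({"aid"}, +1),                # Nadir to receive legal aid
]
keywords = ["aid", "director", "E. coli", "film"]

# Naive user profile: sum the ratings of the articles a keyword occurs in.
profile = {k: sum(r for kws, r in rated if k in kws) for k in keywords}

def score(article_keywords):
    # Predicted liking = sum of the profile weights of the present keywords.
    return sum(profile[k] for k in article_keywords)

print(profile)            # "aid" comes out neutral, "E. coli" negative
print(score({"film"}))    # the unrated Tom Hanks article scores positive
```

The profile reproduces the verbal reasoning above: "aid" cancels out across one liked and one disliked article, "E. coli" accumulates a negative weight, and the unrated article is predicted as a "like" through its "film" keyword.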
Now let us consider the content-based approach in more detail:
The above-noted binary (as shown in Table 2.5) and frequency-based (as shown in Figure 2.4) encodings of keywords are not the only methods to construct vector representations of documents. The need for more comprehensive techniques emerges because of the following shortcomings of the mentioned methods: The binary representation assumes that all keywords have the same importance for characterizing the content of a document. Conventional wisdom tells us, however, that keywords that occur more often in a document are more descriptive of it. Although the frequency-based encoding compensates for this issue, another seri-
³⁰ The article headlines and annotations in Figure 2.4 and Table 2.5 are taken from http://bbc.com, retrieved on 07.06.2011.
ous drawback remains: Longer documents naturally have higher keyword frequencies and a richer vocabulary, so that both the probability of the keyword vector containing a specific word and the keyword weights rise with the length of a document. Consequently, longer documents have a higher probability of being recommended, because their keyword vectors are more likely to overlap with user profiles and because the relevance weights of the keywords are overestimated (Jannach et al. 2011; Lops, de Gemmis, and Semeraro 2011).
A standard approach to counteract these shortcomings is the term frequency - inverse
document frequency (TF-IDF; Salton, Wong and Yang 1975), an established technique from
the field of information retrieval. The main idea of this approach is that the descriptive power of a keyword for a document depends, on the one hand, on how frequently this word appears within the document itself and, on the other hand, on how often this word occurs within the whole corpus of documents. Accordingly, TF-IDF is composed of two measures:
Term frequency (TF) describes the frequency of the keyword's occurrence in a document, assuming that important words occur more often. To account for document lengths and to prevent longer documents from getting higher relevance weights, the word's frequency is normalized (Jannach et al. 2011), typically by relating it to the maximum frequency of other words in the document³¹ (Adomavicius and Tuzhilin 2005; Lops, de Gemmis, and Semeraro 2011).
Inverse document frequency (IDF), on the contrary, assumes that words that occur seldom in the whole set of documents are more descriptive of a document's contents. In other words, generally frequent words are not considered to be very helpful for discriminating among documents (Jannach et al. 2011). Hence, IDF discounts the weights of words that appear frequently.
The product of TF and IDF yields the TF-IDF measure that accounts for both of the as-
pects described above.
More formally, let f_ij be the frequency of the keyword k_i in the document d_j, and let max_z f_zj denote the maximum frequency among all words z occurring in the document. Further, let N be the number of documents in the corpus and let n_i
³¹ Other normalization schemes, optimized for specific cases, can be found in Chakrabarti (2002), Pazzani and Billsus (2007), and Salton and Buckley (1988).
denote the number of documents from the corpus in which k_i appears. Then, in a given document corpus, the TF-IDF measure and its components are defined as follows:

TF_ij = f_ij / max_z f_zj    (2.16)

IDF_i = log( N / n_i )    (2.17)

TF-IDF_ij = TF_ij · IDF_i    (2.18)
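Formulas (2.16) through (2.18) can be sketched directly (the three-document corpus is invented, and the maximum is taken over all words of the document, one common reading of the normalization):

```python
import math

# Toy corpus: each document is a list of already-extracted keywords.
docs = [
    ["film", "film", "director", "controversy"],
    ["aid", "euros", "aid", "outbreak"],
    ["film", "aid", "mission"],
]
N = len(docs)

def tf(word, doc):
    # (2.16): frequency normalized by the maximum frequency in the document.
    return doc.count(word) / max(doc.count(w) for w in set(doc))

def idf(word):
    # (2.17): discount words that occur in many documents of the corpus.
    n_i = sum(1 for doc in docs if word in doc)
    return math.log(N / n_i)

def tf_idf(word, doc):
    return tf(word, doc) * idf(word)          # (2.18)

# "film" occurs twice in the first document but also elsewhere in the
# corpus; the rarer "director" ends up with the higher weight.
print(round(tf_idf("film", docs[0]), 3), round(tf_idf("director", docs[0]), 3))
```

Note how the corpus-wide IDF discount lets the once-occurring but rare "director" outweigh the locally frequent but common "film", which is exactly the behavior motivated above.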
Once TF-IDF vector representations are computed, the similarity between the documents can be determined by means of a similarity measure of choice. Depending on the problem at hand, various similarity measures are possible (Maimon and Rokach 2005; Baeza-Yates and Ribeiro-Neto 1999; Zanker et al. 2006). In the domain of recommendations of textual documents, however, the most common approach is to use the cosine similarity as defined in (2.5) (Adomavicius and Tuzhilin 2005; Jannach et al. 2011; Lops, de Gemmis, and Semeraro 2011).
In essence, the further procedure of recommendation generation in CB approaches is analogous to the item-based technique, with the difference that here only the document ratings of the active user are employed. That is, the most similar items for which a rating of the active user exists "vote" for yet unrated items (Allan et al. 1998; Jannach et al. 2011). Also analogously to CF approaches, in the CB case the number of "voters" can be set explicitly or determined through setting a minimum similarity threshold (Billsus, Pazzani, and Chen 2000; Billsus and Pazzani 1999). The "votes" are then aggregated to predicted ratings, typically by employing a weighting rule that is based on the degree of similarity between the items, i.e. analogously to the aggregation rule (2.8). Once the predictions are made, the item(s) with the highest rating(s) or with the highest similarity to the previously most liked items can be recommended.
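A sketch of this prediction step (the three-dimensional TF-IDF vectors and the user's ratings are invented; cosine similarity plays the role of the measure defined in (2.5)):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# TF-IDF vectors of documents the active user has already rated.
rated = {                     # doc id -> (vector, the user's rating)
    "d1": ([0.9, 0.1, 0.0], 5.0),
    "d2": ([0.8, 0.3, 0.1], 4.0),
    "d3": ([0.0, 0.2, 0.9], 1.0),
}
target = [0.7, 0.2, 0.1]      # vector of the yet unrated document

# The most similar rated documents "vote" for the target, weighted by
# their similarity, analogously to the item-based aggregation rule.
sims = {d: cosine(vec, target) for d, (vec, _) in rated.items()}
voters = sorted(sims, key=sims.get, reverse=True)[:2]
prediction = (sum(sims[d] * rated[d][1] for d in voters)
              / sum(sims[d] for d in voters))

print(round(prediction, 2))   # lies between the ratings of d1 and d2
```

Here the number of voters is set explicitly to two; replacing the top-k cut with a minimum similarity threshold yields the alternative voter-selection scheme mentioned above.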
2.3.2.2 Exploiting Content Characteristics in Non-textual Item Domains
As noted in the introductory part of Section 2.3, the idea to exploit the content charac-
teristics of items for producing recommendations can also be transferred into the domains of
non-textual objects, such as music or movies. The challenge of this transfer, however, lies in the extraction of qualitative characteristics for the representation of user and item profiles.
This is mainly because of the very limited ability of modern content processing algorithms to automatically extract meaningful features that are descriptive of multimedia content (Wei, Shaw, and Easley 2002; Pazzani and Billsus 1997; Lops, de Gemmis, and Semeraro 2011). Hence, recommender algorithms have to rely on rather "technical" characteristics of the content (such as genre, cast, length, etc.), which are either available from the providers or the manufacturers (Jannach et al. 2011) or can be extracted from external information sources, e.g. catalogs or movie critics' web sites (e.g. Alspector, Kolcz, and Karunanithi 1998). Nevertheless,
these technical content characteristics do not always overlap with the qualitative features that determine the consumer's judgment of items: For example, in domains of quality and taste, the reasons that a consumer likes an item are often based on subjective impressions, e.g. of an item's exterior design, rather than being related to certain product characteristics (Jannach et al. 2011). A manual specification of the items' features by domain experts seems to be the only option to address this limitation (Adomavicius and Tuzhilin 2005; Lops, de Gemmis, and Semeraro 2011; Jannach et al. 2011).
The most prominent and exceptional example of the application of the CB approach to manually coded items is the popular internet radio and music recommendation service Pandora.com. Pandora's services rely on the data of the "Music Genome Project"³², which are manually entered by highly trained analysts³³. A song's description often encompasses up to several hundred features³⁴ – "music genes" – such as instrumentation, influences, measures, key tonality, song structure, vocal harmonies, aesthetics, phrasing, the lyrics' mood and emotions, etc.
³² http://www.pandora.com/mgp.shtml
³³ http://blog.pandora.com/faq/contents/506.html
³⁴ http://blog.pandora.com/faq/contents/19.html
At this point it seems reasonable to briefly interrupt our narration and notice that Pandora's approach mirrors the goals of our thesis (see esp. Sections 1.2 and 2.1.4) and practically fulfills them: The song attributes defined by the experts are chosen to potentially influence the preferences of the users and to be understandable to them. Further, the recommendation algorithm incorporates the preference-relevant attributes directly into the process of recommendation generation. Moreover, the employed CB method tries to match recommendations with the user's preferences, i.e. to align recommended songs with the user's attribute preference weights. Hence, Pandora's recommendation engine is both concordant with the way users evaluate choice alternatives and capable of providing actionable and effective explanations behind recommendations. Nevertheless, due to the reasons explained below, we seek an alternative approach for achieving our goals.
However, in most applications, the effort to manually encode item characteristics is
considered to be impractical due to the limitation of resources (Adomavicius and Tuzhilin
2005; Jannach et al. 2011). As stated by the founder of Pandora, Tim Westergren, the "unlocking" of a track's music genes, i.e. its manual annotation, takes a trained musician from about fifteen minutes for a pop song to about one and a half hours for more sophisticated compositions (Tim Westergren cited in Tran-Le 2010).
The latter issue means that in most cases RS employ only those item characteristics (i.e. attributes) that are available in electronic form (Jannach et al. 2011). Even in domains such as motion pictures, where considerable amounts of "technical" attributes are available, typically only a subset of the available attributes is exploited (e.g. Ansari, Essegaier, and Kohli 2000; Kim and Kim 2001; Burke 2002; Melville, Mooney, and Nagarajan 2002; Ying, Feinberg, and Wedel 2006; Gunawardana and Meek 2009; Park and Chu 2009). This is because of the problem of assigning importance weights to the attributes within the vector representations of items, discussed in the preceding subsection:
In the simplest case, the attributes of movies, i.e. genres, actors, directors, etc., would be coded binary, indicating whether the attribute, e.g. a specific actor, is present in the movie. However, in this case all attributes would be equally important for describing movies. Again, conventional wisdom tells us that some attributes may discriminate more strongly than others. For
instance, the acting of a specific star in a movie may signal more than its categorization into a specific genre or its belonging to a specific production company. Unlike in the case of text-based items, in the movie domain the characterization of attribute importance weights by means of their frequencies is not possible, because each movie can be described only once on each attribute. In other words, we cannot state that there is more Clint Eastwood in "For a Few Dollars More" than in "The Good, the Bad and the Ugly", and we cannot assert that one of the two movies is more of a western than the other; at least not without assigning the attribute weights manually. Consequently, the frequency-based TF-IDF measure is also not available for allocating importance weights to the attributes.
Due to the lack of an instrument to assign attribute importance weights according to their ability to differentiate among movies, the typical approach is to maintain the binary movie vector representations as described above. The issue of the different roles that the attributes play in the formation of the user's movie preferences is addressed solely through the user's profile. The latter is thereby represented as a vector whose number of dimensions equals the number of attributes in the movie vector plus one. Each vector dimension represents both the importance weight of the corresponding attribute for the user's discrimination among different movies and the amount of movie preference that the user associates with this attribute; the last dimension represents the user's rating baseline. The values of the vector's entries are estimated by regressing the user's past ratings on the set of available movie attributes (e.g. Ansari, Essegaier, and Kohli 2000; Kim and Kim 2001; Ying, Feinberg, and Wedel 2006). The regression model is typically formulated as follows:
r_uj = β_u0 + ∑_{a=1}^{A} β_ua · x_ja + ε_uj    (2.19)

where r_uj denotes the rating of user u for movie j, the x_ja are binary dummy variables indicating the presence of the a-th attribute among the movie's characteristics, and the β_ua are the respective regression coefficients, with β_u0 being the constant term and ε_uj denoting the estimation error of the regression model.
Notice that the regression coefficients β_ua correspond to the movie attributes and capture the part of the rating that is due to the presence of the attribute in a movie's characteristics vector, i.e. the attribute part-worths. The values of the betas can be positive, indicating an increase in preference when the attribute is present, and they can be negative, indicating a dislike to-
wards the attribute. The baseline estimate β_u0 shows the amount of preference for movies in general, i.e. when no information about a movie's characteristics is available, and it equals the user's mean rating. The latter is due to the specifics of dummy regressions (see Gujarati 2004 for details).
Once the betas are estimated, the ratings of yet unseen movies can be predicted by means of the regression equation (2.19) or, when reformulated in vector form, as the inner product of the user's profile vector β_u and the vector of the movie's attributes x_j:

r̂_uj = β_u^T x_j    (2.20)

Note, however, that in order for expression (2.20) to hold formally, the movie's vector x_j must be complemented with a unity entry at the position that corresponds to the position of the β_u0 entry in the user's vector, so that the baseline estimate is contained in the final sum.
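A sketch of this regression-and-predict step for a single user (the attribute matrix and ratings are invented; `numpy.linalg.lstsq` stands in for the regression estimator):

```python
import numpy as np

# Binary dummy attributes of five rated movies (e.g. genre/actor flags);
# the data is made up for illustration.
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)
ratings = np.array([5.0, 4.0, 1.0, 2.0, 4.0])   # the user's past ratings

# Append a constant column so the baseline term is estimated too (2.19).
X1 = np.hstack([X, np.ones((len(X), 1))])
betas, *_ = np.linalg.lstsq(X1, ratings, rcond=None)

# (2.20): the predicted rating of an unseen movie is the inner product of
# the user's profile vector and the movie's extended attribute vector.
new_movie = np.array([0.0, 1.0, 1.0, 0.0, 1.0])   # attributes plus unity entry
print("predicted rating:", round(float(new_movie @ betas), 2))
```

The fitted betas reproduce the observed ratings on the training movies, and the unity entry appended to each movie vector carries the baseline term into the inner product, exactly as required for (2.20) to hold.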
Analogously to all the previously discussed methods, after the ratings are predicted, the
item(s) with the highest rating(s) can be presented to the user in order to accomplish the rec-
ommendation task.
Let us now return to the assertion we made above that only a fraction of available at-
tributes is typically used within CB approaches for recommendations of non-textual items, and explain the reasons thereof. Although in many non-textual domains – and specifically in the domain of motion pictures – considerable numbers of attributes are often available in electronic form or can easily be extracted from additional information sources, and despite the natural pursuit to include as many of these attributes in the recommendation process as possible in order to increase the "overlap" of the technical item characteristics with the qualitative ones, this cannot be done due to the restrictions of regression analysis.
The issue is that regression analysis in general requires at least one observation per estimated parameter; otherwise the problem (2.19) cannot be solved due to insufficient data (Gujarati 2004). Additionally, the observations are required to be mutually linearly independent in terms of the parameters to avoid multicollinearity, which again would render (2.19) unsolvable unless the parameters causing multicollinearity are omitted from the model or other countermeasures are undertaken (Gujarati 2004). Besides other requirements of regression analysis that can also be violated, these two considerably limit the number of possible parameters, i.e. attributes, that can be introduced into the regression model. In the best case,
when no multicollinearity is present, the upper limit for the number of attributes that can be
considered per user equals the number of the ratings of that user available to the system.
Taking into account that the majority of users in movie RS datasets have typically rated about twenty movies each, the inclusion of a higher number of attributes in the regression model would harm the RS insofar as it would be able to produce recommendations only for a narrow group of its users. Hence, the number of attributes considered within content-based movie recommenders has so far varied between ten (Kim and Kim 2001) and twelve (Ansari, Essegaier, and Kohli 2000; Ying, Feinberg, and Wedel 2006).
The estimation of more than 300 attribute part-worths per user, as suggested in the current thesis (see Section 2.2.2 and Appendix B: List of Preference Relevant Attributes), would be infeasible within the CB approach described above. Discarding a substantial part of potentially relevant attribute knowledge, however, entails that a considerable portion of the preference-relevant variance in the known ratings would not be captured by the model. This, in turn, would lead to larger errors in the predictions and thus to lower prediction accuracy. The latter fact also explains why the majority of works that incorporate movie attributes into recommendation algorithms (e.g. Baudisch 1999; Burke 2002; Melville, Mooney, and Nagarajan 2002; Park and Chu 2009; Gunawardana and Meek 2009) exploit only a small fraction of the available attributes. Moreover, the knowledge about the attributes is not used directly for rating predictions in the content-based manner but is rather utilized as additional information to improve the CF predictions within hybrid models.
A brief overview of hybrid approaches and the motivation thereof will be given in the
subsequent sections. At this point, to conclude the current section, we consider it reasonable to draw the reader's attention to the following two aspects, which were omitted from the main discussion of this section because they would unnecessarily interrupt its natural flow and possibly interfere with the reader's understanding of the discussed topic:
Note that, contrary to the case of textual items, in the case of non-textual items the CB recommendation procedure omits the step of computing similarity between the items. This is due to two specific properties of the non-textual domain: Firstly, the content of the items is described in terms of binary vectors, which was shown to be the only possible way to automatically describe items, because each attribute can be assigned to an item only once. In contrast, in a textual domain each attribute (i.e. keyword) can additionally be characterized by
DRAFT - final revision to appear in 2012
Chapter 2: Background and Related Work 59
the number of its occurrences in the document, which “grants” descriptive power to the quan-
tity of the feature that is basically absent in the non-textual case. Secondly, because of the
simplified representation of the item, the preferences may be fully attributed to the user pro-
file. Thus, recommendation can be made through the direct matching of item profiles with the
user's profile without the need to search for the items that are similar to the most liked ones. The information that in the case of textual items was contained in the concept of similarity can be thought of as absorbed by the user profile in the non-textual case. Note also that the predictions of item ratings are made simply by computing the inner product of the user and item vectors, rather than through the "voting" of similar items for the item in question.
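As a minimal illustration of this prediction rule (the attribute vocabulary and all numeric values below are hypothetical, not taken from this thesis), the content-based rating prediction reduces to a single inner product:

```python
import numpy as np

# Hypothetical attribute vocabulary for binary item descriptions.
attributes = ["action", "comedy", "drama", "sci-fi"]

# Binary item profile: which attributes apply to the movie.
item_profile = np.array([1, 0, 0, 1])        # an action/sci-fi movie

# User profile: estimated part-worth (preference weight) per attribute.
user_profile = np.array([0.8, 0.1, -0.5, 0.9])

# Content-based rating prediction: inner product of the two vectors.
predicted_rating = float(user_profile @ item_profile)
print(predicted_rating)  # sum of the part-worths of the present attributes
```

Only the attributes that are actually present in the item (the nonzero entries) contribute to the prediction, which is exactly why a binary item description lets the user profile absorb all preference information.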
Notice further the existence of conceptual similarity between the regression model
(2.19) and the multiattribute utility or the WADD rule (2.2) as well as between the vector
form of rating prediction (2.20) and the model of the matrix factorization approach (2.13). If we think of the attribute importance weights in (2.2) in terms of the item characteristics presented in the current section, we notice that the only remaining difference between the WADD rule (2.2) and the regression model (2.19) consists in the absence of the baseline constant term in the former expression. In all other respects, both expressions are essentially the same. Further, recall that expressions (2.19) and (2.20) are also in essence the same rating composition rule, only written in two different forms, i.e. algebraic form and vector form. Taking into account that both the MF decomposition rule (2.13) and the vector form of rating composition (2.20) represent nothing more than an inner product of the vector of item properties and the vector of user preference weights, the conceptual similarity between both expressions becomes apparent. The only difference between the two concepts is that the involved vectors consist of attributes that are defined differently. That is, all the mentioned concepts, i.e. multiattribute utility, the content-based approach, and matrix factorization, are in essence different viewpoints on the same idea: the representation of an object in terms of its attributes and the representation of the users in terms of their attribute-related preferences. The distinction between the three variants is constituted solely by the details of implementation and the disciplines the concepts originate from, i.e. marketing, information retrieval, and recommender systems.
2.3.3 Trade-offs and Problems of Collaborative and Content-based Approaches
All of the recommendation techniques introduced in the preceding sections have their merits and limitations, which entail trade-offs when it comes to the question of which approach a particular RS should employ. Some of them, i.e. those related to the issue of the provision of effective explanations, were discussed in Section 2.1. In the current section, we provide a brief overview of the strengths and weaknesses of the CB and CF approaches that influence the functionality of RS in a technical sense, i.e. that impact the ability of RS to provide recommendations. Table 2.6 summarizes the discussion of strengths and weaknesses.
Table 2.6: Summary of strengths and weaknesses of different recommendation approaches
“+” denotes a tendency to exhibit the problem, “–” indicates nonsusceptibility to it,
“±” symbolizes the presence of the problem in a weakened form

Type of problem            User-based   Item-based   Matrix factorization   Content-based
Sparsity                       +            +                 ±                   –
New User                       +            –                 ±                   +
New Item                       –            +                 ±                   –
Overspecialization             –            –                 –                   +
Gray sheep                     +            –                 ±                   –
Starvation                     –            +                 ±                   –
Shilling Attacks               +            +                 ±                   –
Stability vs. Plasticity       +            +                 +                   +
2.3.3.1 Data Sparsity
Perhaps the most common problem of RS, and one that causes almost all other problematic issues, is the sparsity of the underlying database. That is, RS have to produce their recommendations on the basis of a user-item rating matrix, which is typically very far from being dense (Burke 2002). Consider the example of Amazon, which offers millions of items to millions of users. In such a situation, it is unrealistic to expect even some users to have rated a considerable share of the items in Amazon's catalogue. Quite the contrary, it is more realistic to assume that the majority of Amazon's customers have rated only a vanishingly small subset of the offered items. Such scarce data sets are typical for most RS. For instance, in the Netflix Prize dataset, more than 99% of the possible ratings are missing (Koren and Bell 2011). The same problem applies to the publicly available EachMovie and MovieLens data sets (O'Sullivan, Smyth, and Wilson 2004, p. 230) as well as to the data basis of MoviePilot's recommender system.
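The degree of sparsity can be quantified as the fraction of missing entries in the user-item matrix. The following sketch illustrates this on synthetic data with a density of roughly 1%, which is in the order of magnitude reported for the Netflix Prize data; all numbers are simulated, not real ratings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 1000, 500

# Simulate a rating matrix where roughly 1% of entries are observed
# (0 marks a missing rating, as is typical in RS data sets).
mask = rng.random((n_users, n_items)) < 0.01
ratings = np.where(mask, rng.integers(1, 6, (n_users, n_items)), 0)

observed = np.count_nonzero(ratings)
sparsity = 1.0 - observed / ratings.size
print(f"{sparsity:.1%} of the possible ratings are missing")
```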
Although sparsity is problematic for all kinds of recommendation approaches, it is more
an issue for collaborative techniques, especially for item-based and user-based ones. This is
because they base their predictions on neighborhoods of like-minded users or similar items. To form the latter, however, some level of overlap between the user or item profiles is required (Burke 2000). That is, if two users with identical tastes have rated disjoint segments of items, a user-based CF system will fail to detect their similarity because the two user profiles do not share a sufficient number of items. Thus, the system will not be able to recommend the items liked by one of the users to the other one, although their tastes are identical. Analogously, in the item-based approach, if two item profiles do not overlap sufficiently, they cannot be considered similar, even if both entries are duplicates of the same item. Thus, the information contained in one of the item profiles cannot be used to predict the user ratings for the other item.
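The overlap requirement can be made concrete with a small sketch (the movie titles and ratings are invented): two users with effectively identical tastes but disjoint sets of rated items leave the Pearson similarity undefined, so a user-based system cannot connect them:

```python
import numpy as np

# Ratings as {item: rating}. Both users love sci-fi and dislike romance,
# but happened to rate entirely disjoint sets of movies.
user_a = {"Alien": 5, "Blade Runner": 5, "The Notebook": 1}
user_b = {"Star Wars": 5, "Dune": 5, "Titanic": 1}

co_rated = set(user_a) & set(user_b)
print(co_rated)  # empty: no co-rated items at all

def pearson(u, v):
    """Pearson correlation over co-rated items; None if overlap is too small."""
    items = sorted(set(u) & set(v))
    if len(items) < 2:
        return None  # similarity undefined without sufficient overlap
    x = np.array([u[i] for i in items], dtype=float)
    y = np.array([v[i] for i in items], dtype=float)
    x, y = x - x.mean(), y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

print(pearson(user_a, user_b))  # identical tastes go undetected
```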
For MF approaches, the sparsity problem has not been investigated sufficiently in the literature. However, indications are that the sparsity problem is mitigated in MF approaches because they reduce the dimensionality of the space on which recommendations are made by extracting latent factors from the original data (Burke 2000). Nevertheless, intuitively, in order for
factors to capture the variance in the user and item profiles, the latter should exhibit at least some overlap. However, contrary to the item-based and user-based approaches, in the MF case both user and item profiles are involved in the factor extraction simultaneously. Thus, it is enough that the overlap occurs along either of the two profile dimensions, which is more probable than in the situation where each of the profile dimensions is considered separately. Still, sparsity remains a significant problem in domains where many items are available, unless the user base is very large (Burke 2000).
As described earlier, CB approaches do not utilize the ratings of other users for their predictions but rather base them on the content characteristics of items. Moreover, the content descriptions constitute the data basis of CB approaches and are thus available for each item in the catalogue. That is, the item space of a CB recommender is dense, and the density of the user space is irrelevant. Hence, CB approaches are less likely to suffer from sparsity. Nevertheless, the density of the active user's profile still remains an important issue for CB techniques. This, however, manifests itself as a subclass of the next type of problems, the "ramp-up" problem (Konstan et al. 1998).
2.3.3.2 “Ramp-up”: New User and New Item Problems
The “ramp-up” problem (also often called the “cold-start” problem) refers to situations in which RS do not have enough information to make rating predictions (Konstan et al. 1998). Such a situation may arise when (i) a new user or (ii) a new item is introduced to the system. Accordingly, these types of situations are also often referred to as the “new user” and “new item” problems (Konstan et al. 1998; Burke 2002; Adomavicius and Tuzhilin 2005).
The new user problem is mainly an issue in the user-based and content-based approaches. Here, the system must acquire enough knowledge about the user, i.e. user ratings, to be able to find like-minded users (in user-based CF systems) or to detect items that match the user's profile (in CB systems). In these types of systems, new users have to supply some information
about their tastes and preferences, i.e. ratings, in order to establish the basis for future recom-
mendations.
New items are added frequently to the catalogs maintained by RS. Because in CB approaches the items are described in terms of their content, they are ready to be recommended immediately after their introduction to the system. In CF approaches, by contrast, new items need to receive some ratings before they can be recommended. The new item problem is also called the “early rater” problem, since users who rate new items first receive little benefit from doing so, i.e. the early ratings do not increase the user's ability to match against other users (Avery and Zeckhauser 1997). Hence, CF systems have to provide other incentives in order to encourage users to provide ratings (Burke 2002).
MF approaches, as a subclass of CF approaches, suffer from both the new user and the new item problem in equal measure, since from the viewpoint of matrix decomposition it matters little whether a new rating comes in as a new entry in a row or in a column of the user-item matrix. However, MF approaches rely less on the similarity between users or items and rather factorize the matrix entries, in a sense, “independently” of their row or column affiliations. Hence, ceteris paribus, they are likely to need fewer ratings from a new user or for a new item than their user-based and item-based relatives in order to be able to recommend.
As can be seen from the foregoing discussion, all recommendation approaches suffer from the ramp-up problem in one form or another, which makes it necessary for RS to continuously acquire additional data, i.e. ratings, from users in order to improve their ability to recommend as well as the quality of their recommendations.
2.3.3.3 Overspecialization
CB approaches often suffer from uniform recommendations (Zhang, Callan, and Minka 2002; Jannach et al. 2011), a phenomenon that is also often called the “portfolio effect” (Billsus and Pazzani 2000; Linden, Smith, and York 2003; Burke 2002). By recommending items that score highly against a user's profile, CB systems confine users to being recommended items that are similar to those they have already seen (Adomavicius and Tuzhilin 2005). This im-
plies that the recommendations tend to linger within a particular topic of interest, which, in radical cases, causes the user to receive recommendations of different versions of the same item, e.g. of a book or a news article, even if s/he already owns it (Linden, Smith, and York 2003). That is, a user must exhibit an interest in at least one item of a certain topic in order for this topic to become relevant in the user's profile.
CF approaches, in contrast, allow for more diverse recommendations. Because they do not rely on item properties but rather utilize the ratings that users assign to a wide range of items, CF methods tend to be more capable of identifying cross-genre relationships between items (Adomavicius and Tuzhilin 2005; Jannach et al. 2011). Hence, CF techniques are more helpful in discovering items that the users might not have considered otherwise (Burke 2002).
2.3.3.4 “Gray Sheep”, “Starvation” and Shilling Attacks
Whereas user-based CF methods are not affected by portfolio effects and can identify cross-genre niches, they suffer from the so-called “gray sheep” problem: users with “unusual” tastes are difficult to categorize into a neighborhood of like-minded, i.e. similar, users because their rating profiles do not correlate well with the ratings of other users (Rashid et al. 2002; Claypool et al. 1999). Consequently, the generation of recommendations for such users is problematic.
Similarly, items can be “starved” to the benefit of other items. That is, popular items become easier to find as more users rate them: the number of ratings given to a particular item increases the likelihood of it participating in the process of matching user profiles. Because popular items are typically given higher ratings, the probability of them being recommended increases too. In the item-based approaches, in turn, popular items are more likely to exhibit a high similarity in terms of rating profiles and thus also become recommended more often than unpopular ones. For ambiguous items, i.e. items that provoke polarizing attitudes, it may also be problematic to find a neighborhood of similar items that can serve as a “source” for rating predictions. Thus, unpopular and ambiguous items become more difficult to discover (Rashid et al. 2002; McNee et al. 2003).
The latter also makes CF systems susceptible to malicious attacks (also often called shilling attacks), i.e. the injection of ratings that aim to deflate or inflate the popularity of an item (Lam and Riedl 2004; Sandvig, Mobasher, and Burke 2007; Resnick and Sami 2007; Mobasher et al. 2007; Mehta, Hofmann, and Nejdl 2007).
Although MF approaches do not account for the relationships between user or item profiles explicitly, the amount and the character of the ratings in the rows and columns of the matrix to be decomposed influence the “direction” and the information content of the extracted factors. Although this issue has not been studied in prior research, we can logically presume that a higher number of ratings for a popular item causes a factor to twist towards such an item so that it becomes easier to recommend (starvation problem), whereas an unusual pattern in a user vector causes it to exhibit lower factor loadings, which means that the rating predictions for such users become less reliable (gray sheep problem). However, because the reduced dimensionality of the factor solution does not correspond directly to the user and item dimensions, the “distortion” of a factor may be compensated by other factors, which is likely to reduce the extent of both problems in the case of MF approaches.
CB approaches are immune to the problems considered above, since in CB cases neither
the rating profiles of items nor the ratings of other users are relevant.
2.3.3.5 Stability vs. Plasticity
As noted earlier, the ability of CF and CB approaches to recommend improves over time through continuously gaining additional user input, which mitigates the ramp-up problem. The converse of this problem is the “stability vs. plasticity” problem (Burke 2002). According to the latter, RS may become rigid, i.e. insensitive to changes in users' preferences. In some sense, the problem consists in the established knowledge about the user's prior preferences “dominating” the new user input. Suppose that a devoted sci-fi fan all of a sudden begins to rate dramas highly. In this case, the system might not recognize the change in the user's preferences, especially if the new input conflicts with old negative ratings of dramas. Instead, the system is more likely to treat the new positive drama rating as an outlier
and to continue recommending sci-fi movies. As in the ramp-up case, the user would need to provide the system with a substantial number of positive drama ratings to stabilize the system's knowledge about the changed preferences.
To counteract this development, some approaches suggest discounting past user preferences so that older ratings have less influence; but they do so at the risk of losing information about the user's interests that are long-term but only exercised occasionally (Billsus and Pazzani 2000; Schwab, Kobsa, and Koychev 2001; Burke 2002; Tsymbal 2004). Thus, if our sci-fi fan also likes westerns but watches them sporadically, a temporal discount function might gradually “forget” the user's preference for westerns in the course of time and stop recommending them to the user.
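One common form of such discounting is an exponential decay of rating weights with age; the following sketch uses a half-life parameter that is an arbitrary assumption, not a value prescribed by the cited literature:

```python
def decay_weight(age_days, half_life_days=180.0):
    """Weight of a rating that is `age_days` old (exponential forgetting):
    after each half-life, a rating counts half as much."""
    return 0.5 ** (age_days / half_life_days)

# A long-term but rarely exercised interest: a western rated two years ago.
w_western = decay_weight(730)
# A recent drama rating.
w_drama = decay_weight(7)
print(f"western: {w_western:.3f}, drama: {w_drama:.3f}")
```

The trade-off described above is visible directly: the old western rating is discounted to a small fraction of its original weight, so the system effectively "forgets" this stable but rarely exercised interest.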
2.3.4 Hybrid Recommender Systems
2.3.4.1 Principles of Hybrid Methods
In order to circumvent the trade-offs and problems of the individual CF and CB methods, hybrid systems combine both types of recommendation methods to produce their recommendations. Most of the hybrids combine CB techniques with item-based CF (e.g. Balabanovic and Shoham 1997; Basu, Hirsh, and Cohen 1998; Claypool et al. 1999; Pazzani 1999; Soboroff and Nicholas 1999; Tran and Cohen 2000; Melville, Mooney, and Nagarajan 2002; O'Sullivan et al. 2004; Symeonidis, Nanopoulos, and Manolopoulos 2007; Koren 2008). The goal of this combination is to utilize the invulnerability of CB techniques to data sparsity as well as to the new item and starvation problems while avoiding their proneness to overspecialization through the use of CF. Another benefit of these approaches is that the user can be recommended an item not only if it is rated highly by similar users but also if it scores highly directly against the user's profile (Adomavicius and Tuzhilin 2005).
Hybrid approaches differ with respect to how the different methods are combined for rating predictions and how deeply they are integrated with each other:
One way to build a hybrid recommender is to implement each method, i.e. CF and CB, separately and then to combine the individual predictions (Adomavicius and Tuzhilin 2005). The final rating predictions can either be formed as a linear combination of the individual predictions (e.g. Claypool et al. 1999), possibly employing some kind of weighting scheme for the individual methods (e.g. Pazzani 1999), or the rating of a single method can be chosen as the final prediction based on the confidence intervals of the methods employed (e.g. Billsus and Pazzani 2000) or on their consistency with past ratings of a user (e.g. Tran and Cohen 2000).
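The two combination variants just described can be sketched as follows; the weights and confidence values are placeholders, not values from the cited works, and a real system would tune them:

```python
def combine_linear(r_cf, r_cb, w_cf=0.6, w_cb=0.4):
    """Final prediction as a weighted linear combination of the
    CF and CB predictions (weights are illustrative)."""
    return w_cf * r_cf + w_cb * r_cb

def combine_by_confidence(pred_cf, pred_cb):
    """Choose the prediction of the more confident method.
    Each argument is a (rating, confidence) pair."""
    return max(pred_cf, pred_cb, key=lambda p: p[1])[0]

print(combine_linear(4.0, 3.0))                       # weighted average of 4.0 and 3.0
print(combine_by_confidence((4.0, 0.9), (3.0, 0.5)))  # the more confident CF rating
```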
In addition, some works (e.g. Good, Schafer, and Konstan 1999; Melville, Mooney, and Nagarajan 2002) propose augmenting the user-item rating matrix with artificial user rating vectors in order to increase the overlap between user profiles. These augmented vectors are produced by content-analysis agents, the so-called “filterbots”. As a result, users whose rating profiles agree with those of the filterbots may receive better recommendations.
Another approach consists in a more diffuse, deeper integration of the methods. For instance, Balabanovic and Shoham (1997) and Pazzani (1999) suggest a technique of “collaboration via content”. This technique maintains content-based profiles for each user and applies the CF method to this data, rather than to the user-item ratings, to identify the similarity between users. The rating predictions, however, are made by means of the CF aggregation rule applied to the ratings of the users who were identified as similar. Soboroff and Nicholas (1999) propose a method that uses latent semantic indexing to reduce the dimensionality of CB user profiles that are initially represented by term vectors. Then, the collaborative technique is applied to the “reduced” user vectors. Koren's approach (Koren 2008) enriches the MF model with information about item neighborhoods and factorizes the user-item rating matrix based on this extended model.
All hybrid approaches discussed in the literature manage to show better prediction accuracy, better performance, or both, compared to the individual techniques. The approach that won the One Million Dollar Netflix Prize “blends” the results of more than 100 different recommendation algorithms (Bell, Koren, and Volinsky 2007b, 2008). The contribution, i.e. the weights, of the individual algorithms to the final rating is determined by means of linear regression, where the vector of holdout ratings serves as the dependent variable and the vectors of ratings predicted for the same holdout set by the different methods serve as the independent variables.
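This blending step can be sketched on synthetic data as follows; the three "algorithms" below are stand-ins with different noise levels, and the least-squares fit plays the role of the linear regression on the holdout set described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Holdout ratings and the predictions of three hypothetical algorithms.
y = rng.uniform(1, 5, 200)                    # true holdout ratings
preds = np.column_stack([
    y + rng.normal(0, 0.5, 200),              # algorithm 1: fairly accurate
    y + rng.normal(0, 1.0, 200),              # algorithm 2: noisier
    rng.uniform(1, 5, 200),                   # algorithm 3: uninformative
])

# Blend weights via linear regression (least squares) on the holdout set.
weights, *_ = np.linalg.lstsq(preds, y, rcond=None)
blended = preds @ weights
print(np.round(weights, 2))  # the most accurate algorithm gets the largest weight
```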
2.3.4.2 Explanations in Hybrid Approaches
The ability of hybrid approaches to provide explanations of recommendations depends on the specific approaches a hybrid consists of and varies with how tightly the individual approaches are interlocked with each other. Recall the discussion of Section 2.1.2, where different explanation styles and their correspondence to different recommendation approaches were presented: User-based approaches are capable of the nearest neighbor explanation style, item-based approaches allow for the influence style, and CB approaches can produce explanations in the keyword style.
Consequently, hybrids are able to utilize the explanation styles that are available to the recommendation methods they employ. However, the properties of the corresponding explanation styles, e.g. transparency or effectiveness, only remain valid if the final recommendation is produced solely by one of the constituent methods, i.e. when the predictions of the hybridized methods are not combined and the rating of the best performing method is used.35
Nevertheless, in cases when the recommendation is produced as a mixed result of multiple methods, an explanation of why a particular item was recommended can still be generated. One possible way to do this is to adopt the explanation that would apply if the individual method had recommended this item. For instance, if a hybrid combined user-based CF with a CB technique, the explanation could be formed as a mix of the nearest neighbor style (“… because other users also liked”) and the keyword style (“… because it contains features X, Y, Z”).
However, this method is not applicable when the hybridized recommendation techniques are integrated more tightly with each other, so that the explanations of the individual methods are not accessible, e.g. as in the above-described case of “collaboration via content”. A possible method of generating an explanation in this scenario is to post-process the recommendation results with a CB technique. A concrete implementation of such an approach is described in Symeonidis, Nanopoulos, and Manolopoulos (2008, 2009), the only works that apply the keyword explanation style in the domain of movie recommendations. In
35 The consequences of the opposite case will be discussed below.
these papers, the recommendations are produced by means of item-based CF applied to previously formed biclusters of users and items. The explanations, however, are generated in a content-based manner. To do this, the authors examine the correlations between the item feature profiles and the user ratings and so identify the features that are associated with the movies the user liked most. If the recommended item contains such features, they are highlighted in the explanation of the recommendation.
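A sketch of this post-processing idea follows; the features, item profiles, and ratings are invented for illustration and do not reproduce the cited implementation:

```python
import numpy as np

features = ["action", "comedy", "drama", "A-list star"]

# Binary feature profiles of the movies the user has rated, and the ratings.
item_features = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
user_ratings = np.array([5, 4, 2, 1, 5], dtype=float)

def corr(x, y):
    """Pearson correlation between a feature column and the ratings."""
    x, y = x - x.mean(), y - y.mean()
    d = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / d) if d else 0.0

# Correlate each feature with the user's ratings to find "liked" features.
scores = [corr(item_features[:, j], user_ratings) for j in range(len(features))]

# Highlight liked features that the recommended item actually contains.
recommended = np.array([1, 0, 0, 1], dtype=float)
explanation = [f for f, s, present in zip(features, scores, recommended)
               if present and s > 0]
print(explanation)
```

Features that correlate positively with the user's past ratings and are present in the recommended item form the keyword-style explanation; negatively correlated features could analogously supply the "cons" side.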
This post-processing method allows the generation of keyword style explanations for virtually all recommendation approaches. Moreover, within such an approach, it is also possible to address negative cues, which makes the “pros-and-cons” explanation style accessible to all recommendation approaches as well, albeit these issues were not addressed by previous research. Although the pros-and-cons style has been shown to be the most effective among the explanation styles considered in previous research (see Section 2.1.2), its post-processing version has one serious drawback: it does not reflect the way the recommendation is actually produced. Hence, the explanations fail to achieve the goal of transparency of the recommendation system, which potentially endangers the users' acceptance of and trust in the RS as a whole as well as their loyalty to the system (see Section 2.1.2). The same also holds for the keyword style, the second best explanation style.
Even more important is that in this case the recommendation process is generally not aligned with the user's preferences.36 Thus, the advantage of the pros-and-cons and the keyword explanation styles in increasing the user's choice effectiveness cannot unfold thoroughly:
Though the explanations might efficiently highlight the reasons why the user may like the recommended item, they are not able to explain why the system considers this item the best for the user, since the recommendation procedure cannot access the user's attribute preferences. As shown previously, a deviation from the user's preference function potentially decreases the user's choice effectiveness, satisfaction with, and loyalty to the RS (Aksoy et al. 2006; see also Section 2.1.3).
36 Recall our discussion in Section 2.1.3, where we deduced that a good, i.e. effective and actionable, explanation should be informative and understandable to the user. On the other hand, such an explanation requires that the underlying recommendation process be aware of and operate in terms of the item attributes that are descriptive of the user's preferences, i.e. of item characteristics that are relevant for the user. In the best case, the explanation additionally utilizes the strengths of negative cues.
2.4 Summary
In this chapter, we provided an overview of the theoretical work related to the objectives of the current thesis and its underlying proposals, which aim at developing a recommendation method that is capable of providing both accurately predicted recommendations and actionable explanations of the reasoning behind them.
In the first section of this chapter, we addressed the question of why explanations of recommendations should be provided, and we deduced a concept of how it should be done. In particular, we concluded that in order to be effective and actionable, the explanations should be aligned with the user preferences. This also increases the user's acceptance of, trust in, and loyalty to the recommender system as a whole. Based on these considerations, we substantiated a new explanation style, the “pros-and-cons” style, which actionably supports the user in choosing a movie and increases choice effectiveness. We have also shown that the generation of such explanations requires the recommendation algorithm to be capable of reflecting the user's attribute preferences and of incorporating them directly into the process of recommendation generation.
Following this idea, in the second section we introduced the concept of multiattribute utility (MAU) and the weighted additive composition rule (WADD), which serve, respectively, as the basis for the operationalization of the user's attribute preferences and as the basis for the derivation of recommendations from these attribute-related preferences. To be able to apply these concepts to the case of motion picture recommendations, we then elaborated on the question of which movie characteristics, i.e. attributes, possess relevance for the formation of consumer preferences for movies. These characteristics are summarized in Table 2.3 and are also presented in detail in Appendix B.
In the third section, we provided an overview of the key recommendation approaches, i.e. collaborative filtering, content-based filtering, and hybrid methods. We have also given detailed descriptions of the recommendation algorithms that are representative of the corresponding approaches. This knowledge allows us to comprehend the principles and details of recommendation generation and the merits and limitations of the different approaches; it also
allows us to understand the problems we may potentially face, and thus should consider and account for, while developing our proposed method.
At this point, we have accumulated all the knowledge indispensable for the development of our proposals. Hence, we proceed to the next chapter, which describes the concepts of the method to achieve our goals.
Chapter 3
3 Conceptual Framework of a Hybrid Recommender System
that allows for Effective Explanations of Recommendations
This chapter presents the actual proposals of the current thesis, i.e. a recommendation method that is capable of providing both accurately predicted recommendations and actionable, effective explanations of the reasoning behind them. As elaborated in the previous chapter, this method integrates the user's attribute preferences directly into the process of recommendation generation and thus aligns the recommendation process with the user preferences.
The chapter is divided into three sections: The first section elaborates on the modeling issues. That is, the model of the user preferences is gradually derived, and the aspects that the model incorporates and accounts for are discussed. The second section concerns the questions of parameter estimation for the derived model. In essence, it presents the core of our proposal: an algorithm that is capable of estimating the users' attribute part-worths on the basis of very scarce data sets, i.e. data sets where the number of parameters to estimate is much greater than the number of data points, which makes an algebraic solution to the estimation problem impossible. The third section motivates the hybridization of our algorithm and discusses the hybridization methodology.
3.1 Modeling User Preferences
3.1.1 Motivation of the Approach
As stated earlier, a recommender algorithm that aims to help users make better choices and to increase their choice efficiency by providing actionable explanations should reflect the user's way of thinking (see Section 2.1.3). This can be done either by conforming the algorithm's model to the user's decision strategy or by accurately estimating the user's attribute preference weights.
As shown by Aksoy et al. (2006), the relationship between both aspects is not additive, so that it suffices for a recommender algorithm to maintain one of the two: either the similarity of the recommendation process to the user's decision strategy or the similarity of the estimated attribute preference weights to the user's actual ones. In our approach, we choose to follow the second path, since it is more generalizable and allows us to handle all users in the same way, namely by applying the additive decision rule to the estimated attribute part-worths. The other alternative, i.e. deriving the users' decision strategies, faces the serious disadvantage that consumers do not have a stable decision function and are likely to rely on simplifying heuristics in a number of situations (e.g. under time pressure). Such a spontaneous strategy change would, from the viewpoint of an RS, seriously impede the recommendation task, since it would challenge the system to adapt to every little change in the user's behavior, which is also hard to track automatically. Moreover, the derivation of decision strategies typically requires knowledge of the attribute part-worths, which would thus complicate the recommendation process while making it more prone to errors. Instead, we suggest relying on the most efficient decision rule, WADD, while concentrating on the accurate estimation of the attribute preference weights.37 This also conforms to our aim of providing the users with an efficient decision aid, rather than obtaining an in-depth understanding of individuals.
On the other hand, the provision of actionable explanations requires them to be under-
standable to the user, i.e. made in terms that are meaningful to the user and relevant for the his
37 Compare Sections 2.1.3 and 2.2.1 for a detailed discussion of the provided arguments.
or her preference formation. As shown in Section 2.2.2, movie attributes build a suitable basis that fulfills these requirements: they are both understandable to the users and relevant for the formation of user preferences. The latter, again, brings the attribute preferences into the foreground and confirms our choice to concentrate our proposals on the reliable estimation of the attribute preference weights, i.e. part-worths.
Consequently, we develop our model of user preferences in terms of the user's attribute part-worths. To this end, we utilize the concept of multi-attribute utility, which connects the consumer's, i.e. user's, preference to the utility that an alternative, i.e. a movie, possesses for the user; a concept which states that this utility can be decomposed into its attribute-related components, i.e. part-worths (see Section 2.2.1). The following subsections present the development of the model in more detail. Each subsection builds upon the preceding one and introduces additional components to the model, thus refining it.
3.1.2 Basic Model of User Preferences
The datasets that recommender systems (RS) operate with typically represent a set of ratings the users of the system have assigned to the items contained in the system's catalog (see Section 2.3). In the context of movie recommendations, the ratings describe the user's enjoyment of a movie, i.e. the degree to which a user has liked a particular film. The higher the rating, the more the user liked the movie. Hence, the ratings can be thought to express the users' preferences for movies or, in other words, the usefulness of movies for the users in terms of liking. In that, we can see a direct analogy to the concept of utility: Indeed, a higher rating corresponds to a higher utility; two movies rated equally are equally 'useful' for the user. Hence, we can argue that ratings are proxy measures for the utility of movies for users and for the preference of the latter for the former.
The preference for a movie can be decomposed into (partial) preferences for its attributes, i.e. the user's attitudes towards the movie's characteristics (see Section 2.2.2). Thus, a rating can be expressed as a sum of the part-worths of the movie's components, or more formally:
$r_{ui} = \sum_{j \in A} p_{uj} x_{ij}$    (3.1)

where $r_{ui}$ is the rating of user $u$ for movie $i$, $p_{uj}$ denotes the preference of the user for the $j$-th attribute of the movie, i.e. the $j$-th part-worth, and $x_{ij}$ denotes a binary variable with 1 indicating the presence of the attribute, e.g. of an actor, in the movie and 0 otherwise. $A$ defines the indexes of the set of attributes which are used to describe movies in the system's dataset. Rewritten in vector form, (3.1) yields

$r_{ui} = \mathbf{x}_i^{T} \mathbf{p}_u$    (3.2)

with $\mathbf{x}_i^{T}$ denoting the transposed binary vector of the movie's characteristics and $\mathbf{p}_u$ being the vector of the user's part-worths of the corresponding attributes.[38]
This first, yet simple, model assumes that the movie ratings are known from the user's past rating records and that the movie characteristics are available, e.g., from the Internet Movie Database (IMDb). The vector $\mathbf{p}_u$ of the user's preferences is to be estimated. Once estimated, the part-worths can be used both for predicting the user's ratings for new and yet unseen movies and for providing the explanations of recommendations.
Note that the model implies that the elements of the part-worth vector are real numbers and thus allows them to take positive as well as negative values. These properties entail the ability to rank-order the attribute part-worths according to their contribution to the final rating. This, on the one hand, allows the provision of explanations in the pros-and-cons style (see Section 2.1.3). On the other hand, it allows highlighting the most important aspects in the explanations that influenced the recommendation in a positive as well as in a negative way, and thus additionally increases the effectiveness of explanations.
[38] Here and in the following we use a bold font face to denote vectors and a regular font face to denote scalars.
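To make the basic model concrete, the prediction in (3.1)/(3.2) can be sketched in a few lines of code. This is purely an illustrative sketch: the attribute names and numeric part-worths below are invented, not taken from the thesis data, and Python is used only as notation.

```python
import numpy as np

# Hypothetical attribute catalog; in the thesis, 318 movie attributes are used.
attributes = ["comedy", "action", "actor_a", "director_b"]

# Part-worth vector p_u of one user: positive values raise the predicted
# rating, negative values lower it (invented numbers).
p_u = np.array([1.5, -0.5, 2.0, 0.0])

# Binary profile x_i of one movie: a comedy starring actor_a.
x_i = np.array([1, 0, 1, 0])

def predict_rating(x_i, p_u):
    """Rating prediction after (3.1)/(3.2): sum of part-worths of the
    attributes present in the movie, i.e. a dot product."""
    return float(x_i @ p_u)

print(predict_rating(x_i, p_u))  # 1.5 + 2.0 = 3.5
```

Because the part-worths are signed real numbers, sorting `p_u * x_i` immediately yields the pros-and-cons ordering used for explanations.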
3.1.3 Accounting for Static Effects beyond the User-Item Interaction
Our model suggested in expressions (3.1) and (3.2), although simple, still requires some complements to improve its efficiency.
The first shortcoming of this model concerns the centering of the part-worths. That is, the part-worth values in the basic model are centered on zero. Although this is advantageous for distinguishing between 'good' and 'bad' attribute preferences, i.e. positive and negative part-worths, the model does not suit the scale of most recommender systems well. Most RS employ rating scales that begin at 1 point or star at the bottom level. In order to produce positive rating values in these systems, the model requires each movie to possess at least one attribute that breeds a positive part-worth high enough to compensate for all the negative ones. Moreover, in order to score above '0', this requirement has to be fulfilled for all users, which seems rather unrealistic. A common way to compensate for this shortcoming is to integrate a constant term into the model, which is often referred to as the 'baseline'. By these means the model parameters are shifted by the value of the constant, which effects the centering of the part-worths on the baseline. A suitable choice for the baseline is the mean value of all movie ratings contained in the system's dataset. The advantage of this choice is that it represents the first moment of the rating distribution in the sample, i.e. in the dataset. Given a high number of ratings, which is often the case in RS, and following the law of large numbers, the sample mean converges to the expected value of the rating of a movie. Accordingly, the model updates to

$r_{ui} = \mu + \mathbf{x}_i^{T} \mathbf{p}_u$    (3.3)

with $\mu$ denoting the mean value of the movie ratings, i.e. the expected rating value for every movie, given no additional information about the movie and the user. If a user does not have any preference for the movie's attributes, i.e. when the user's part-worths for all movie attributes are zero, the rating value $\mu$ is the most probable to occur. The positive and negative attribute part-worths increase and decrease the user's rating value respectively. However, in this context, the meaning of the part-worths is slightly different than in the previous model formulation: In model (3.3) the user's attribute part-worths indicate the amount of the user's
preference change with respect to an average movie, i.e. how much the user's evaluation of a movie becomes better or worse than that of an average movie due to its containing a specific attribute.
Further, expressions (3.1) and (3.2) model the rating solely as an interaction of item attributes and the user's attribute part-worths. However, there are some effects that are independent of this interaction but rather associated with either users or items (Koren 2009). The recommender literature frequently notes that different users may use the rating scale differently: some users tend to systematically give higher ratings than others (e.g. Sarwar et al. 2001; Adomavicius and Tuzhilin 2005; Jannach et al. 2011; see also Sections 2.3.1.1 and 2.3.1.2). This causes the mean rating of individual users to deviate from the overall mean, something we refer to as the user bias. An item bias may result, for example, from the 'appeal to popularity' of mainstream movies, which causes the mean rating of such movies to be higher by trend (Austin 1989; Koren 2009), whereas less popular movies are likely to exhibit lower average ratings. Users may also differ in their reaction to average movie ratings and to movies' popularity: While some users simply adapt to mainstream judgments, others react overly positively, and a third group reacts skeptically, i.e. rates movies against the trend. Although these reactions involve both a user and a movie, it can be argued that they are directed at the movie as a whole, rather than at its specific characteristics. In other words, these reactions happen on a more general level that does not concern the attribute-level interactions, i.e. the changes of the user's movie preferences that are conditioned on the presence of a certain movie characteristic in the movie's profile. Incorporating these effects into the model leads to

$r_{ui} = \mu + b_u + c_u b_i + \mathbf{x}_i^{T} \mathbf{p}_u$    (3.4)

where $b_u$ and $b_i$ denote the user bias and the item bias respectively and are defined as the deviations of a user's and a movie's mean rating value from the overall mean. The user's reactions to the movie bias are captured by the scale factor $c_u$.
3.1.4 Accounting for Time
The model described by (3.4) separates user-item interactions from the effects caused by factors that are not related to the users' preference formation but rather influence the magnitude of the rating through the inherent nature of users and movies. This allows an estimation of the user's attribute preferences, i.e. part-worths, which are actually involved in the emergence of the user's preference for a particular movie. However, this model is static. That is, it does not account for temporal changes of user preferences and rating behavior, or for changes of movie popularity. Since RS, on the one hand, rely on historical data and, on the other hand, depend on the amount of data (see Section 2.3.3), the model requires accounting for time in order not to be prone to the "stability vs. plasticity" problem (see Section 2.3.3.5).
Indeed, time affects all components of the model in one way or another. Movies can become classics over time, e.g., "Casablanca", or fall into oblivion, like "Night of the Creeps". Users may change their rating behavior or adopt new views on genres, actors, directors, etc. Hence, it is crucial to account for time-changing factors (Koren 2009).
Time-changing effects are usually modeled by splitting them into three parts: The first one is a constant term, which represents the effect's baseline. It can be interpreted as the amount of the modeled measure that it exposes at the 'starting' point of time, i.e. at $t = 0$. The second part captures the long-term trend and concerns the component of the temporal changes that develops linearly with the course of time. In other words, it represents a 'drift' of the measure's baseline that happens at a constant rate over time. The third part of the temporal effect captures short-term fluctuations, i.e. deviations from the drifted baseline at a particular point in time. These deviations may happen irregularly or have a periodic basis. For example, Christmas movies become more popular at Christmas time, i.e. periodically; whereas the popularity of an actor increases when a new movie starring the actor is released or when his or her name is mentioned in a considerable number of press reports, which in general has no periodic basis and happens irregularly. Figure 3.1 illustrates the three parts of time-changing effects.
Figure 3.1: Decomposition of a time-changing measure into three components: baseline, long-term trend, and short-term fluctuations
Accordingly, the term $b_u$ in equation (3.4) needs to be replaced with the expression $b_u + \beta_u t + b_{ut}$. Here, $\beta_u$ is the slope of the user's rating trend, $b_{ut}$ is the deviation of the user's mean rating at a point in time $t$, and $b_u$ is redefined as the static part of the user's rating. Analogously, the movie bias and the user reaction factor are to be replaced with $b_i + \beta_i t + b_{it}$ and $c_u + \gamma_u t + c_{ut}$ correspondingly. After the described modifications to (3.4), the model extends to

$r_{ui}(t) = \mu + b_u + \beta_u t + b_{ut} + (c_u + \gamma_u t + c_{ut})(b_i + \beta_i t + b_{it}) + \mathbf{x}_i^{T} \mathbf{p}_u(t)$    (3.5)

As noted earlier, user preferences can also be subject to temporal changes; thus each element of the user's part-worth vector $\mathbf{p}_u(t)$ is to be constructed as $p_{uj}(t) = p_{uj} + \beta_{uj} t + p_{ujt}$, with the index $j$ denoting the associated attribute of the corresponding part-worth values.
With expression (3.5), we have derived a model that incorporates temporal effects and captures user preferences at the finest level of resolution. Provided that all parameters of the model are known, we could accurately estimate the user's ratings of movies. However, aside from the question of a sufficient amount of data, the estimation of this model is challenged
exactly by this finest resolution. That is, the estimation of the parameters which capture short-term fluctuations of the rating is not sensible.
To understand the rationale behind this assertion, let us consider the parameters of model (3.5) individually, while turning our attention to the question of which part of the variance in the ratings the parameters are associated with. Recall that the basis for the parameter estimation is a matrix of past user ratings that incorporates two dimensions: users and items. Hence, the effects integrated into the model can be attributed to the specifics of the rating distribution along the user dimension, the item dimension, or both.
The parameter $\mu$, defined as the overall mean, is thus associated with both the user and the item dimension of the rating distribution and captures the 'roughest' part of the variance in the ratings. The residual variance is to be explained by the remaining parameters. Being a constant term, however, $\mu$ introduces solely a positive affine transformation into the model and only causes the centering of the remaining effects on its value. That is, although $\mu$ affects the values of the parameters, the magnitudes of the actual effects are not affected thereby.
The parameter $b_u$ represents the difference between the overall mean and the mean rating of a user. It captures the static part of the remaining variance that is attributed to a specific user. Analogously to the overall mean, $b_u$ effects a positive affine shift of the centering point for the effects caused by the user-item interaction (model term $\mathbf{x}_i^{T} \mathbf{p}_u(t)$) and adjusts the values of the estimated parameters, leaving the magnitudes of the actual effects unaffected. In doing so, it separates the static part of the effects that are caused solely by the user's specifics from the effects that are due to the user-item interaction.
The model term $\beta_u t$ describes the long-term temporal changes in the rating behavior of a user. In other words, it accounts for the development of $b_u$ over time. That is, at each given point in time, $\beta_u t$ is solely a constant that adjusts the value of $b_u$. Hence, this term is also attributed to the part of the variance which is caused by the user's properties that are beyond the user's preferences.
In the same way as $b_u$ does for users, $b_i$ captures the static part of the variance in the ratings that occurs due to the specifics of an item. Similarly, $\beta_i t$ adjusts the value of the item's baseline over time. Both parameters $b_i$ and $\beta_i$ are thus attributed to the items.
Similar logic applies to the construction of the user's reaction scaling factor and the elements of the user's part-worth vector, with the exception that both parameter groups are involved in a multiplicative relation with the model terms that are associated with items. The model terms $c_u$ and $\gamma_u t$ capture the part of the variance that is due to the user's reaction to a movie's average rating. Notice, however, that $c_u$ and $\gamma_u t$ only address the user dimension of the rating matrix, i.e. they represent another adjustment to the user bias.
The residual variance in the ratings, i.e. the variance that remains after accounting for the previously discussed effects, is then 'decomposed' into parts that are associated with the movie's attributes and caused by the user's evaluation of these attributes. That is, each element of the vector $\mathbf{p}_u(t)$ captures the variance that is caused by the user's preference for the corresponding attribute.
The short-term effects are thus thought to 'compensate' the difference between the actual rating $r_{ui}(t)$ and the rating $\tilde{r}_{ui}(t)$ that would be predicted after accounting for the effects described above. However, on the one hand, this difference is produced by the cumulative effect of all short-term parameters, as defined in (3.6):

$r_{ui}(t) - \tilde{r}_{ui}(t) = b_{ut} + c_{ut}(b_i + \beta_i t) + (c_u + \gamma_u t) b_{it} + c_{ut} b_{it} + \sum_{j \in A} p_{ujt} x_{ij}$    (3.6)

On the other hand, problem (3.6) can only be addressed after all other parameters of the model are estimated. Thus, the model parameters that describe the short-term effects would not help to clear out the associated variance from the initial model but rather only help to explain its error for past cases. This makes them useless for the purpose of rating prediction, which contradicts the aims of providing recommendations. Therefore, we decide to omit the short-term effects from the model, which leads to our final model:

$r_{ui}(t) = \mu + b_u + \beta_u t + (c_u + \gamma_u t)(b_i + \beta_i t) + \mathbf{x}_i^{T} \mathbf{p}_u(t)$    (3.7)

with the elements of the vector $\mathbf{p}_u(t)$ constructed as $p_{uj}(t) = p_{uj} + \beta_{uj} t$.
In the next section, we describe our method to estimate the model parameters.
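The structure of the final model (3.7) can be sketched in code as follows. The symbol names mirror the text, but every numeric value below is invented for illustration; this is a sketch of the functional form only, not of the thesis' fitted model.

```python
import numpy as np

# Static parts and trend slopes of the user/item terms (invented numbers).
mu = 3.0                   # overall mean rating
b_u, beta_u = 0.4, 0.01    # static user bias and its trend slope
b_i, beta_i = 0.6, -0.02   # static item bias and its trend slope
c_u, gamma_u = 1.0, 0.0    # reaction factor and its trend slope

p_u = np.array([1.2, -0.3])      # static part-worths p_uj
beta_uj = np.array([0.0, 0.05])  # their trend slopes
x_i = np.array([1, 1])           # binary attribute profile of the movie

def predict(t):
    """Rating prediction after model (3.7) at time t (short-term terms omitted)."""
    user_term = b_u + beta_u * t
    item_term = (c_u + gamma_u * t) * (b_i + beta_i * t)
    part_worths_t = p_u + beta_uj * t   # p_uj(t) = p_uj + beta_uj * t
    return mu + user_term + item_term + float(x_i @ part_worths_t)

print(round(predict(0.0), 2))  # 3.0 + 0.4 + 0.6 + (1.2 - 0.3) = 4.9
```

At `t = 0` the prediction reduces to the static model (3.4); as `t` grows, the user's bias, the item's bias, and the part-worths drift linearly along their trend slopes.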
3.2 Estimating Model Parameters
Up to this point, we have obtained a model of user preferences, formulated in equation (3.7). This model allows the prediction of user ratings for yet unseen movies, based on the knowledge about the characteristics of a movie and the user's properties. Whereas the knowledge about the movie characteristics can be obtained, e.g., from the Internet Movie Database,[39] the other model parameters have to be learned from the past user ratings available to a recommender system. For the description of our approach to the estimation of the model parameters, we assume that both datasets are available and that the dataset of past user ratings maintains the associations between the ratings, the users, and the movies to which the ratings were given.
As deduced in the previous sections, our model encompasses 643 parameters: 636 of them describe the user's preferences for 318 movie attributes (see Section 2.2.2 for the derivation and Appendix B for the list of the attributes), i.e. 318 pairs of $p_{uj}$ and $\beta_{uj}$ that build the elements of the vectors $\mathbf{p}_u(t)$. One parameter represents the overall mean rating $\mu$. The remaining six parameters describe the effects that are associated with either a user or a movie. Whereas $\mu$ can easily be calculated and thus can be thought of as given by the dataset of user ratings, a set of 642 parameters is to be estimated for each user based on the user's past ratings.
A direct solution to this problem, however, can only be obtained when the available data is sufficient, i.e. when the number of ratings per user contained in the dataset equals the number of parameters to be estimated. Moreover, the data points are required to be linearly independent, i.e. no movie vectors consisting of exactly the same attributes are allowed. In the case of movie recommenders, though, both requirements are not likely to be fulfilled: Linear dependence between the movie vectors may arise, for example, when sequels or series, e.g., the "Matrix" trilogy or "Friends", are included in the database. A more serious problem, however, is that the users of movie recommenders are not likely to rate a sufficient number of items. For instance, the median number of ratings in the MoviePilot and Netflix
[39] The data is available for download at http://www.imdb.com/interfaces. Licensing information is provided at http://www.imdb.com/licensing (for commercial use) and http://www.imdb.com/licensing/noncommercial (for non-commercial use).
databases employed in our study amounts to 25 and 96 ratings per user respectively (see Table 4.1), which is correspondingly more than 25 and more than 6 times less than the number of parameters to be estimated per user. Hence, problem (3.7) can be addressed neither by solving it directly nor by means of statistical techniques such as regression analysis.
In this case, optimization techniques, such as gradient descent, can be applied to learn the model parameters through minimizing the dedicated error function

$E = \sum_{u} \sum_{i \in I_u} (r_{ui} - \hat{r}_{ui})^2$    (3.8)

where $\hat{r}_{ui}$ denotes the predicted rating and $I_u$ designates the set of movies rated by the user. However, optimization methods are strongly dependent on the initial point of the optimization and will be more likely to find local minima rather than achieving a global solution to the problem if the starting point for the optimization is chosen improperly[40] (Press et al. 2007; Paterek 2007; Koren, Bell, and Volinsky 2009). The latter leads to unreliable estimates of the model parameters and consequently to higher errors in the predictions produced by the model. Conversely, if the initial guess, i.e. a suboptimal yet good solution to (3.7), lies near the global optimum of (3.8), optimization techniques are able to determine that optimum and to refine the 'initial' model parameters, so that the predictions made by the model in fact exhibit the lowest possible errors in terms of (3.8).
Accordingly, the task of estimating the parameters of our model of user preferences can be divided into two steps: (i) provision of an accurate guess for an initial solution to (3.7) and (ii) optimization of the model parameters by means of minimizing the dedicated error function (3.8). In the following subsections, we provide a description of our two-step method.
[40] The optimization method and its tendency to find local minima will be described in more detail in Section 3.2.2. For now, let us assume that the estimation by means of optimization is possible.
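A gradient-descent refinement of the kind used in step (ii) can be sketched as follows. This is a deliberately minimal, hypothetical example: only a single user's part-worth vector is optimized, the data are synthetic and noise-free, and the learning rate and step count are arbitrary; the starting point `p0` plays the role of the initial guess from step (i).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 5)).astype(float)  # binary movie profiles x_i
p_true = np.array([1.0, -0.5, 0.3, 0.0, 0.8])       # 'true' part-worths
r = X @ p_true                                      # synthetic ratings r_ui

def refine(p0, lr=0.2, steps=2000):
    """Gradient descent on the (mean) squared prediction error, cf. (3.8)."""
    p = p0.copy()
    for _ in range(steps):
        err = X @ p - r                  # prediction errors
        p -= lr * (X.T @ err) / len(r)   # gradient step
    return p

p_hat = refine(np.zeros(5))
print(np.round(p_hat, 3))
```

With a zero starting vector and clean data the descent recovers the true part-worths; with scarce, noisy data and 642 parameters, the quality of the starting point decides whether the routine lands in the global optimum or in a local minimum, which is exactly why step (i) matters.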
3.2.1 Step 1: Estimation of Initial Parameter Values
As noted above, under the given circumstances, i.e. an insufficient number of ratings per user, neither a precise solution for the values of the parameters in model (3.7) is available, nor is a simultaneous estimation of all parameters possible. Nevertheless, in order to be able to find an efficient approximation of the solution by means of an optimization method, we need an initial guess for the parameter values that defines a point in the parameter space that lies as close to the actual solution as possible.
We propose to employ the OLS regression method to obtain initial estimates for each parameter separately. That is, instead of estimating all model parameters jointly (which is impossible due to data availability restrictions), we suggest running a set of regressions that estimate the individual parameters independently of each other. Although in this case the obtained estimates are likely to be biased, OLS regression provides us with a set of advantages: Alongside the estimates, it provides (i) inference about parameter significance and (ii) access to the confidence limits of the parameters' values, i.e. the interval that most probably includes the true value of a parameter. The latter allows us to interpret the OLS results as interval estimates and to additionally constrain the optimization routine. The search for the parameter values is then performed within the scope of possible solutions that most probably contains the true one, and the search procedure does not leave this scope to find a local minimum that satisfies the restrictions of the error function (3.8) but provides unreliable estimates of the user's preferences in terms of (3.7). The information about the significance of a parameter can be used for dropping parameters that are statistically meaningless for describing the user's movie preferences and for generating and explaining rating predictions, which simplifies the search procedure and reduces the probability of finding local minima.
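A one-parameter-at-a-time regression of this kind can be sketched as a simple OLS of the ratings on a single attribute indicator, returning the slope, its t-value, and approximate confidence limits. Everything below is an illustrative simplification: the data are synthetic, and a fixed critical value `t_crit = 2.0` stands in for the exact Student quantile.

```python
import numpy as np

def simple_ols(y, x, t_crit=2.0):
    """Simple OLS of y on one regressor x: slope, t-value, ~95% conf. limits."""
    n = len(y)
    xc, yc = x - x.mean(), y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    var_slope = (resid @ resid) / (n - 2) / (xc @ xc)  # slope variance
    se = np.sqrt(var_slope)
    return slope, slope / se, (slope - t_crit * se, slope + t_crit * se)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, 50).astype(float)    # presence of one attribute
y = 3.0 + 1.5 * x + rng.normal(0, 0.3, 50)  # synthetic ratings

slope, t_val, (lo, hi) = simple_ols(y, x)
print(f"slope={slope:.2f}, t={t_val:.1f}, CI=({lo:.2f}, {hi:.2f})")
```

The t-value supports dropping statistically meaningless attributes, and the interval `(lo, hi)` is what constrains the later optimization step.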
However, compared to a simultaneous estimation of all parameters, estimating the parameters one by one introduces a model specification error to OLS. That is, the regression model becomes underspecified, which may negatively influence the 'quality' of the estimates, particularly when the omitted regressors correlate with the independent variable in the underspecified regression model. In this case, estimating the parameter values and confidence limits, as well as drawing conclusions about their significance, is not straightforward. Therefore, before we present the details of how the individual parameters can be estimated, we make a note in the next subsection on the consequences of the underspecification of OLS and present our method to counteract them in order to achieve more reliable initial parameter estimates.
3.2.1.1 Omitted Variable Bias in OLS Models and a Method to Counteract It
Omitting a relevant variable from the regression model entails that, in the majority of cases, the estimates of the parameters and their corresponding variances are biased (Gujarati 2004). Consequently, since the variance of a parameter in regression analysis serves as the basis for the inference about the parameter's significance, the statements about the latter may become misleading. To further understand the rationale behind these assertions, let us consider the following example:[41]
In order to maintain consistency with the notation commonly used within regression analysis, let us redefine, for the length of this section, the symbols $\alpha$ and $\beta$ as the coefficients of regression equations, $r_{jk}$ as the correlation coefficient between the $j$-th and $k$-th independent variables of a regression model, and $t$ as the $t$-value of the Student's t-test.
Suppose now that the true regression model to estimate is

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$    (3.9)

but instead, we omit the relevant variable $X_3$ and fit the model

$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$    (3.10)

The consequences of omitting $X_3$ are as follows:[42]
[41] The example and its explanations are based on Gujarati (2004), Chapter 13, esp. pp. 510-513 and 556-557.
[42] The proof of the individual statements lies outside the scope of this thesis and can be found, e.g., in Kmenta (1971) or Johnston and DiNardo (1997).
1. If the omitted variable $X_3$ is correlated with the included variable $X_2$, i.e. the correlation coefficient $r_{23}$ between $X_2$ and $X_3$ is nonzero, the estimates $\hat{\alpha}_1$ and $\hat{\alpha}_2$ will be biased and inconsistent. More formally, this means that $E(\hat{\alpha}_2) \neq \beta_2$ and that $E(\hat{\alpha}_1) \neq \beta_1$. Moreover, the bias does not disappear as the sample size gets larger.
2. If $X_2$ and $X_3$ are not correlated, the constant term $\hat{\alpha}_1$ will be biased, although $\hat{\alpha}_2$ is unbiased in this case.
3. The disturbance variance $\hat{\sigma}^2 = \sum \hat{v}_i^2 / \mathrm{df}$, where $\mathrm{df}$ denotes the degrees of freedom, is incorrectly estimated.
4. The variance of $\hat{\alpha}_2$, $\operatorname{var}(\hat{\alpha}_2) = \hat{\sigma}^2 / \sum x_{2i}^2$, is a biased estimator of the variance of the true estimator $\hat{\beta}_2$.
5. Consequently, the hypothesis-testing procedure, i.e. the t-test, is likely to provide misleading conclusions about the statistical significance of $\beta_2$ and its confidence limits.
For our proposed method this means that:
(i) We may erroneously drop a parameter from our model based on an invalid conclusion about its insignificance.
(ii) The parameter values may be underestimated or overestimated, so that our solution for the starting point of the optimization would be further offset from the global optimum of (3.8), which, in turn, increases the risk of finding a local minimum during the optimization.
(iii) The confidence intervals might not include the true value of the parameter, which, again, would drive our optimized solution away from the optimum.
However, we can counteract the consequences of the OLS model misspecification and thus reduce the risks described above. That is, we can, to some extent, correct the biased parameter values and the biases in the corresponding variances and therefore obtain more efficient initial estimates as well as more reliable confidence limits.
First of all, notice that problems 1-5 as well as their consequences (i)-(iii) only apply to the estimation of the part-worth vectors and do not apply to the user and item biases or the user's scale reaction factor, since the latter are free from correlations with other model variables by definition. That is, they capture effects that are associated with either a user or an item only, which by their nature are not supposed to have sources of influence other than the user's or the item's inherent ones. Hence, the consequences of the OLS model misspecification
are of no concern for estimating these parameters. For these reasons, the discussion below is only relevant for the estimation of the part-worth parameters.
Notice also that we are not interested in the estimates of the regressions' constant terms $\alpha_1$ for the part-worth parameters, since the baseline for the part-worths is provided by the user and item biases. That is, in our underspecified auxiliary regressions, we aim to obtain only the values of the effects of the variables of model (3.7), i.e. the slope coefficients $\beta_2$ and $\beta_3$. Hence, we only need to correct the bias in these parameters and their corresponding variances. The bias of $\hat{\alpha}_1$ affects neither our initial solution nor the subsequent optimization.
To begin, let us consider how the estimate biases can be ruled out: It can be shown that

$E(\hat{\alpha}_2) = \beta_2 + \beta_3 b_{32}$    (3.11)

where $b_{32} = \sum x_{2i} x_{3i} / \sum x_{2i}^2$ is the slope in the regression of the excluded variable $X_3$ on the included variable $X_2$ (Gujarati 2004, Chapter 13). As can be seen from (3.11), $\hat{\alpha}_2$ is biased unless $\beta_3$ or $b_{32}$ or both are zero, i.e. when $X_3$ has no effect on $Y$ or when $X_2$ and $X_3$ are uncorrelated. So the first step in assessing the bias of an estimate is to examine the correlations between the variables. If no correlations can be determined, the estimate of the corresponding parameter and its variance are unbiased.
Otherwise, $\hat{\alpha}_2$ has to be corrected. In our example with two variables this can be done by means of two additional auxiliary regressions: (i) of $X_3$ on $X_2$ and (ii) vice versa:

$X_{3i} = b_{30} + b_{32} X_{2i} + e_{3i}, \quad X_{2i} = b_{20} + b_{23} X_{3i} + e_{2i}$    (3.12)

Using the regression coefficients from (3.12) in expression (3.11) and in its analogue for the regression of $Y$ on $X_3$, we obtain a system of equations:

$\hat{\alpha}_2 = \beta_2 + \beta_3 b_{32}$
$\hat{\alpha}_3 = \beta_3 + \beta_2 b_{23}$    (3.13)

where $\hat{\alpha}_2$ and $\hat{\alpha}_3$ as well as $b_{32}$ and $b_{23}$ are known and $\beta_2$ and $\beta_3$ are the unknowns. Solving the system (3.13) for $\beta_2$ and $\beta_3$ yields their corresponding values:

$\beta_2 = \frac{\hat{\alpha}_2 - \hat{\alpha}_3 b_{32}}{1 - b_{23} b_{32}}, \quad \beta_3 = \frac{\hat{\alpha}_3 - \hat{\alpha}_2 b_{23}}{1 - b_{23} b_{32}}$    (3.14)
These are the unbiased values of the effects of our interest.
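The correction in (3.11)-(3.14) can be checked numerically. In this sketch (synthetic data, invented coefficients), two correlated regressors are each fitted alone, and the biased one-at-a-time slopes are then corrected by solving system (3.13) as in (3.14).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)               # X3 correlated with X2
y = 2.0 * x2 + 1.0 * x3 + rng.normal(0, 0.1, n)  # true beta2 = 2.0, beta3 = 1.0

def slope(a, b):
    """OLS slope of a on the single regressor b."""
    bc = b - b.mean()
    return (bc @ (a - a.mean())) / (bc @ bc)

a2, a3 = slope(y, x2), slope(y, x3)       # biased one-at-a-time estimates
b32, b23 = slope(x3, x2), slope(x2, x3)   # auxiliary cross-regressions, cf. (3.12)

# Solving system (3.13) for the unbiased effects, as in (3.14):
beta2 = (a2 - a3 * b32) / (1 - b23 * b32)
beta3 = (a3 - a2 * b23) / (1 - b23 * b32)

print(round(a2, 2), round(beta2, 2))  # a2 is visibly biased upwards, beta2 is not
```

Here the single-regressor slope `a2` absorbs part of the omitted variable's effect, while the corrected `beta2` recovers the true effect up to sampling noise.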
The next step is the correction of the variance estimates of $\beta_2$ and $\beta_3$. This correction is needed because the variance of the estimates is involved in the calculation of both the significance inference statistic, i.e. the $t$-value of the Student's t-test, and the confidence limits (Gujarati 2004, Chapter 8):

$t = \frac{\hat{\beta}_k}{\sqrt{\operatorname{var}(\hat{\beta}_k)}}$   (3.15)

$\hat{\beta}_k - t_{\alpha/2}\sqrt{\operatorname{var}(\hat{\beta}_k)} \le \beta_k \le \hat{\beta}_k + t_{\alpha/2}\sqrt{\operatorname{var}(\hat{\beta}_k)}$   (3.16)
As stated above in consequences 4 and 5, the variance of the regression parameters in OLS with an omitted variable is biased. Consequently, as can be seen from expressions (3.15) and (3.16), the $t$-value and the confidence limits are biased as well. This implies that the Student's t-test, which is used within regression analysis to test a parameter's significance, is likely to provide misleading conclusions. Moreover, both the biased variance and a biased $t$-value entail an error in the calculation of the confidence limits. The latter may cause a shift of the confidence interval, such that the true value of the parameter lies outside of the predicted confidence limits.
One way to counteract this issue is to simply recalculate the variance according to its definition (Gujarati 2004):

$\operatorname{var}(\hat{\beta}_2) = \frac{\hat{\sigma}^2}{\sum x_{2i}^2} \cdot \frac{1}{1 - r_{23}^2}$   (3.17)

where $1/(1 - r_{23}^2)$ is the variance inflation factor (VIF), which quantifies the extent of multicollinearity in OLS. Prior to this recalculation, it is, however, necessary to obtain the value of the residual sum of squares $\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2$ of the "true" OLS model (3.9). Given the unbiased values of $\beta_2$ and $\beta_3$ obtained after (3.14) and the definition of the constant term of the regression as

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$   (3.18)

(Gujarati 2004), we are able to calculate the fitted values $\hat{Y}_i$ and thus the value of $\sum \hat{u}_i^2$. Taking into account the number of degrees of freedom, which equals the number of data points minus the number of regressors minus one, i.e. $n - 3$ in the two-variable case, we can now calculate $\operatorname{var}(\hat{\beta}_2)$ as defined in
(3.17). After this procedure, the bias-corrected $t$-value and the confidence limits are obtained using expressions (3.15) and (3.16). Accordingly, the test of significance can now be performed using the corrected $t$-values.
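The recalculation (3.17) and the subsequent test can be sketched numerically as follows. This is our own illustration with synthetic data: the bias-corrected slopes are assumed to be given (as if obtained via (3.14)), and all names are illustrative:

```python
import numpy as np

# Sketch: VIF-corrected variance (3.17), t-test (3.15), confidence limits (3.16).
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(scale=0.5, size=n)   # deliberately correlated
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(scale=1.0, size=n)

beta2, beta3 = 2.0, 1.5                      # assume slopes corrected via (3.14)
beta1 = y.mean() - beta2 * x2.mean() - beta3 * x3.mean()   # constant, cf. (3.18)

resid = y - (beta1 + beta2 * x2 + beta3 * x3)
df = n - 2 - 1                               # data points - regressors - 1
sigma2 = (resid ** 2).sum() / df             # residual variance of the 'true' model

r23 = np.corrcoef(x2, x3)[0, 1]
vif = 1.0 / (1.0 - r23 ** 2)                 # variance inflation factor
var_beta2 = sigma2 / ((x2 - x2.mean()) ** 2).sum() * vif   # cf. (3.17)

se = np.sqrt(var_beta2)
t_val = beta2 / se                           # cf. (3.15)
t_crit = 1.96                                # ~5% two-sided, large-sample approx.
ci = (beta2 - t_crit * se, beta2 + t_crit * se)            # cf. (3.16)
significant = abs(t_val) > t_crit
```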
With the discussion above, we presented our method to counteract the problem of underspecified OLS models, which arises from our proposal to estimate the parameters of model (3.7) by means of auxiliary regressions considering only one parameter at a time. The following sections present the details of the estimation of the corresponding parameters.
3.2.1.2 Estimating User and Item Related Effects
The user and item biases as well as the user's popularity reaction scale factor are assumed to be conceptually independent of each other and of the user-item interactions (see Section 3.1.3). Thus, they are unaffected by the omitted variable problem described in the previous section. Although there might be some "technical" correlations with other model variables, these correlations and the corresponding variables are of no concern for the actual effects of interest because, again, they are conceptually unrelated. Hence, we can simply run bivariate auxiliary regressions to determine the corresponding initial parameter values, their significance, and their confidence intervals.
We begin by estimating the user bias parameters. For each user, we run an OLS regression of the user's ratings on time. Whereas the user's rating trend parameter is derived directly from this regression, the baseline is recovered from the regression's constant term by subtracting the overall rating mean. We choose a fixed significance threshold as the cut-off criterion for concluding the significance of regression parameters. For the time resolution, we choose the time variable to denote the number of days passed since the user's first rating, meaning that we do not assume users to change their rating behavior within one day, while allowing it to vary on a daily basis. Further, because new users need some time to become accustomed to the system, we assume their rating behavior to change more rapidly than that of experienced users. Hence, to prevent overfitting to unstable fluctuations of the average user's rating and to increase the reliability of our initial estimates, we require the standard deviation
of the user's rating times to be at least 60. In other words, we require the user to have rated movies over a period of at least 120 days in order to be able to capture his or her drifting rating behavior. For users who do not meet this condition, as well as for users whose trend parameter was found to be insignificant in the auxiliary regressions, the trend parameter is discarded from the model and the baseline is calculated as the mean of the corresponding user's ratings. In such cases, the confidence limits for the baseline are set in accordance with (3.16), with the critical value drawn from the Student's t-distribution for the chosen significance level and degrees of freedom equal to the number of the user's ratings minus one, and with $\sqrt{\sum_i (r_{ui} - \mu)^2 / (n_u - 1)}$ being the standard deviation of the differences between the user's ratings and the overall mean, where $\mu$ denotes the overall rating mean and $n_u$ the number of the user's ratings.
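The per-user estimation just described can be sketched as follows. This is an illustrative simplification: the function names, the 1.96 large-sample cut-off, and the fallback logic are ours, not the thesis's notation:

```python
import numpy as np

# Sketch: per-user baseline and rating trend via a bivariate OLS regression of
# ratings on days since the user's first rating, with a fallback to the user's
# mean rating when the time span or significance requirement is not met.
def user_bias(ratings, days, overall_mean, min_time_sd=60.0):
    """Return (baseline, trend); trend is None if discarded."""
    t = np.asarray(days, dtype=float) - min(days)
    r = np.asarray(ratings, dtype=float)
    if t.std() >= min_time_sd and len(r) > 2:
        # OLS slope and intercept: r = a + b * t
        b = np.cov(t, r, bias=True)[0, 1] / t.var()
        a = r.mean() - b * t.mean()
        # crude significance check via the slope's t-statistic
        resid = r - (a + b * t)
        se_b = np.sqrt((resid ** 2).sum() / (len(r) - 2)
                       / ((t - t.mean()) ** 2).sum())
        if se_b > 0 and abs(b / se_b) > 1.96:   # ~5% level, large-sample approx.
            return a - overall_mean, b
    # fallback: discard the trend, use the user's mean rating
    return r.mean() - overall_mean, None

# toy usage: a long-active user with a clear upward trend vs. a new user
rng = np.random.default_rng(2)
days = np.arange(0, 241, 6.0)
ratings = 3.0 + 0.004 * days + rng.normal(0.0, 0.05, days.size)
base, trend = user_bias(ratings, days, overall_mean=3.5)
short_base, short_trend = user_bias([4.0, 3.0, 5.0, 4.0], [0, 1, 2, 3], 3.5)
```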
The item biases are estimated in the same way, using auxiliary regressions of the item's ratings on time. Again, the time resolution here is set to one day. In contrast to the user bias, we expect movie popularity to change more slowly and thus require the time frame between a movie's first and last ratings to be at least 240 days.
The estimates for the parameters that capture the user's reaction to the movie bias can now be obtained in two steps: First, we fix the user and item parameters in equation (3.7) at their estimated values and ignore the model's part that concerns the user-item interactions, i.e. we set the interaction term to zero. Again, we are allowed to ignore the user-item effects because they are conceptually unrelated to the user and item inherent specifics. Given the fixed parameter values, for each of the user's ratings we calculate the difference between the actual rating and the user's bias, as well as the value of the movie bias. In the second step, we solve the following regression problem of this difference on the movie bias:

(3.19)

The rationale behind this regression is that the difference is intended to capture the part of the rating that is not due to the user bias but varies with the item's bias and with time. Since the movie bias accounts for both of these factors and the difference is cleared of the user's bias, the slope estimate from (3.19) provides precisely this knowledge. The regression's constant term hence captures the stable part of the effect.
Analogously to the user and item bias cases, we discard parameters that do not reach the chosen significance level. Here, we also require the user to have rated for at least 120 days. For users who do not fulfill this requirement, and for those whose regression parameters both turned out to be insignificant, we discard the reaction effect from the model and set the value
of the corresponding parameter to a fixed constant. In this case, both the upper and the lower confidence limits are also set equal to this value, which allows for no variation of the parameter within the optimization process. In other words, if the user does not exhibit any statistically significant reaction to movie average ratings, or cannot be assumed to have one, s/he is also not expected to have it. This is equivalent to dropping the corresponding terms from our model.
At this point, we have obtained the initial estimates for the effects that are not involved in the user-item interaction and have cleared our model of parameters that seem irrelevant for describing the preferences of a particular user. In the next step, the residual variance that is associated with the actual user-item interaction is to be explained by the movie attributes, and the initial values of the corresponding part-worths are to be estimated. The next section is dedicated to these questions.
3.2.1.3 Estimating Attribute Part-worths
Contrary to the user and item biases, user attitudes toward movie characteristics are not necessarily mutually independent. In fact, they may even be thoroughly related to each other. For instance, a moviegoer may perceive different movie attributes as a signal of the same expected "quality" of a movie: Clint Eastwood may be strongly associated with protracted westerns containing little dialog but much disquieting music; Pixar with entertaining, high-quality computer animation; Andrey Tarkovsky with contemplative, surrealistically framed Soviet classics; France with Alain Delon, Gérard Depardieu, and arty plots; and so on.
On the other hand, correlations between movie attributes are inherent to the movie attribute data themselves. Some actors exhibit a tendency to appear in movies of a specific genre, e.g., Bruce Willis is known to act mainly in action movies; directors tend to engage the same stars in their films, e.g., Quentin Tarantino is known to have a stable "team" of actors. Strong correlations may also occur between directors, genres, and producers; between producers and writers; between studios and directors; etc.

Consequently, we inevitably come across the problem of OLS model underspecification and have to account for the bias of the parameter estimates and their variances in our auxiliary regressions (see Section 3.2.1.1). However, although we proposed a method to correct for
the omitted variable bias, there is another problem we may confront during parameter initialization: multicollinearity.

Multicollinearity is associated with the risk of poorly estimated coefficients in the auxiliary regressions (3.12) and, in the extreme case, may preclude their estimation entirely. This is particularly the case when both variables are (nearly) perfectly correlated, i.e. their correlation coefficient equals or is close to ±1. In such cases the solution to (3.13) is either highly biased or indeterminate, so that the effects of two highly correlated variables cannot be reliably separated from each other (for proof see Gujarati 2004, pp. 345-346). Thus, the bias of the parameter estimates and their variances cannot be ruled out, which, again, might lead to wrong conclusions about parameter significance and to an erroneous estimation of the confidence limits.
Nevertheless, the joint effect of two highly correlated variables can still be estimated and is given by expression (3.11) (Gujarati 2004, pp. 347, 511). We utilize this property to mitigate the problem of multicollinearity in our setting. We argue that knowledge about the joint effect of two or more highly correlated variables is enough to describe user preferences in terms of our model (3.7): if some attributes (nearly) always occur in movies jointly, their relative contributions to the user's preference become irrelevant because they always affect the preference jointly. Hence, we examine the database of movie attributes for pairwise correlations. From each pair of highly correlated attributes, we eliminate the one that is less helpful for discriminating between the movies, i.e. the one that exhibits the lower variance in the dataset. Notice that this elimination happens in the global scope of the data and is not done for each user separately.
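The elimination step can be sketched as follows. The 0.9 correlation threshold and all names are illustrative assumptions, not the thesis's actual values:

```python
import numpy as np

# Sketch: from each pair of highly correlated attribute columns, drop the one
# with the lower variance (i.e. the less discriminative one).
def prune_correlated_attributes(X, threshold=0.9):
    """X: (n_movies, n_attributes) binary attribute matrix.
    Returns the sorted indices of the columns to keep."""
    corr = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    var = X.var(axis=0)
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                # eliminate the lower-variance attribute of the pair
                keep.discard(i if var[i] < var[j] else j)
    return sorted(keep)

# toy usage: the first two attributes are perfectly correlated duplicates
a = np.array([1, 0, 1, 1, 0, 0])
b = np.array([0, 1, 1, 0, 1, 0])
X = np.column_stack([a, a, b])
kept = prune_correlated_attributes(X)
```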
In the next step, we estimate the regression coefficients of the pairwise auxiliary regressions (3.12). That is, each of the attribute-describing variables is regressed on each of the remaining variables that constitute the movie attribute vector. In this procedure, we zero out the values of insignificant regression coefficients, which means that one variable does not "influence" the other in terms of (3.12) and thus must not be accounted for in the process of bias correction. In doing so, we obtain a set of auxiliary parameters that will later be used for correcting the bias of the estimates and their variances as described in Section 3.2.1.1. This operation is also done in the global scope of the data, not on the individual user level. The rationale behind this is as follows:
On the one hand, equations (3.12) and (3.13) aim to separate the effect on the user's rating of the variable included in (3.10) from the effect of the omitted variable on the included one. Notice, however, that the effect of a movie attribute on the user's rating takes place in the scope of an individual user, i.e. it is relevant only for a specific user, whereas the effect of one attribute on another applies to movie attributes in general. The auxiliary slope can thus be thought of as an adjustment of the "explanatory power" of the included attribute by the part of it that is explained by the omitted one, which, in turn, happens in the global scope.

On the other hand, performing these auxiliary regressions in the global scope allows us to account for the OLS model underspecification and to reduce the issue of multicollinearity while estimating the part-worths of an individual user. Consider an example where a user has only rated the "Lord of the Rings" trilogy. Since all episodes of the trilogy were directed by Peter Jackson and engage a constant set of stars, all attributes describing the episodes are perfectly correlated. This would result in equal estimates for all of the attributes' parameters in our separate auxiliary regressions, which, in turn, would cause (3.13) to have an indeterminate solution. However, because the auxiliary slopes were estimated in the global scope, their values are unlikely to be the same, since Peter Jackson has also directed other films and the stars have acted in other movies as well. Hence, although the biased estimates remain equal, the unequal auxiliary slopes clarify the effects of the omitted variables to different degrees and thus lead to a determinate solution of (3.13), as shown in (3.14).
After the preparations described above, we are now ready to estimate the attribute part-worth parameters. For each user, we run a set of regressions of the form

(3.20)

where the two parameters of interest designate, respectively, the static and the time-dependent components of the part-worth of the $j$th movie attribute for the user; the $j$th component of the movie's characteristics vector is a binary dummy variable with the value of 1 if the corresponding attribute is present in the movie's characteristics and 0 otherwise; the regression additionally contains a constant term; and time is counted in days from the first rating in the dataset. Analogously to the user bias estimation, we require the user to have rated movies for at least 120 days in order to be able to capture the time-changing component of the part-worths. For users who do not fulfill this requirement, we discard the time-dependent component and estimate a simplified regression
(3.21)

This simplified OLS model is also used in cases where the "complete" model (3.20) cannot be estimated due to insufficient data. In such cases, the corresponding time-dependent parameters are also discarded.
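The simplified per-user regression can be sketched as follows. This is our own illustration: the residuals, the attribute names, and the least-squares call stand in for the thesis's notation:

```python
import numpy as np

# Sketch: static part-worths for one user, regressing the user's rating
# residuals (after removing biases) on binary movie attribute dummies,
# i.e. a simplified model without the time-dependent component, cf. (3.21).
def part_worths(residuals, attributes):
    """residuals: (n_ratings,) rating residuals after removing biases.
    attributes: (n_ratings, n_attrs) 0/1 dummy matrix of movie attributes.
    Returns (constant term, part-worth vector) via least squares."""
    X = np.column_stack([np.ones(len(residuals)), attributes])
    coef, *_ = np.linalg.lstsq(X, residuals, rcond=None)
    return coef[0], coef[1:]

# toy example: 'action' adds +1.0, 'comedy' adds -0.5 to the residual
attrs = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
resid = 0.2 + attrs @ np.array([1.0, -0.5])
c, w = part_worths(resid, attrs)
```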
Then, to correct for the omitted variable bias, the estimated parameters are pooled together with the previously derived auxiliary parameters to form, analogously to (3.13), a system of equations of the form

$\hat{\alpha}_j = \beta_j + \sum_{k \neq j} b_{kj}\,\beta_k$   (3.22)

where $\hat{\alpha}_j$ denotes the estimated value of the $j$th parameter (i.e. a static or a time-dependent part-worth component), $\beta_j$ denotes its unbiased value (see Section 3.2.1.1 for details), and $k$ runs over the indices of the remaining parameters.
This equation system is solved by means of the SVD technique as described in Press et al. (2007, Chapter 2.6).43 We choose to employ SVD because it is capable of handling ill-conditioned44 equation systems and in such cases provides an optimal solution in terms of least squares (Press et al. 2007). Generally, since we have taken precautions against the risk of multicollinearity as described above, we do not expect the system (3.22) to be ill-conditioned. Nevertheless, we cannot ensure this for the vast variety of cases that may occur during the estimation process. Thus, by utilizing SVD, we guarantee that our algorithm obtains a solution for (3.22) in any case.
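The SVD-based solution of a possibly singular system can be sketched as follows; the truncation of near-zero singular values follows the standard approach described in Press et al. (2007, Chapter 2.6), while the matrix and names are illustrative:

```python
import numpy as np

# Sketch: solve A beta = alpha in the least-squares sense via SVD, zeroing
# the reciprocals of negligible singular values to handle ill-conditioning.
def svd_solve(A, alpha, rcond=1e-10):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ alpha))

# Example: a singular 3x3 system (third row duplicates the first) still
# yields the minimum-norm least-squares solution.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [1.0, 0.5, 0.0]])
alpha = np.array([1.0, 0.8, 1.0])
beta = svd_solve(A, alpha)
```

Because the truncated-SVD solution coincides with the Moore-Penrose pseudoinverse solution, it degrades gracefully from the exact solution (full-rank case) to the minimum-norm least-squares solution (rank-deficient case).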
Using the solution to (3.22), we recalculate the variances of the estimated parameters as described in Section 3.2.1.1 after expression (3.17). Now we are able to perform the test of the parameters' significance as provided by (3.15). The parameters that do not reach the chosen significance level are discarded, and the confidence limits for the remaining parameters are estimated according to (3.16).
43 Since the SVD method is one of the standard methods for solving linear equations, its description lies outside the scope of the current thesis. For a detailed and comprehensive introduction to SVD, we refer to Press et al. (2007).
44 A system of linear equations is ill-conditioned when its underlying matrix is not of full rank, i.e. when linear dependencies are present between the rows or the columns of the equation matrix, that is, between the variables or between the equations or both (e.g. Press et al. 2007).
With the procedure presented above, we finalize the estimation of the initial values for the parameters of our model of user preferences. As described in the introductory part of this chapter, the initial parameter estimates are then passed to an optimization method as the coordinates of the starting point for optimization in multiple dimensions, with the aim of obtaining a parameter solution that is closer to the optimum. The method and the process of optimization are described in the next section.
3.2.2 Step 2: Optimization of the Parameters
In the field of numerical methods, the term "optimization" refers to (usually iterative) mathematical procedures which strive to find the best available values of the parameters of some objective function (e.g. Press et al. 2007; Lange 2010). In our case, we strive to find those parameter values of model (3.7) which yield the minimum possible error for the model's predictions. This aim corresponds to finding the minimum of the quadratic loss function (3.8). We choose the quadratic form in (3.8) because its U-shape ensures that the loss function has a single extremum, i.e. a definite global minimum, and because it penalizes larger errors by magnitude, thus potentially reducing the error in the final solution.
Typical methods to solve such an optimization problem are the method of steepest gradient descent and the conjugate gradient method. Both methods are based on the same idea of iteratively approaching the minimum of the optimized function through stepwise updates of the solution in the direction opposite to the function's gradient, i.e. the direction of the function's fastest descent.45 The difference between the two methods is that steepest gradient descent optimizes only one dimension in each iteration and "steps" in the direction of the dimension that exhibits the highest value of the function's first partial derivative at a given point, whereas the conjugate gradient method considers all dimensions of the function's space when choosing the direction to move in, so that the minimization along one direction is not "spoiled" by subsequent minimization along another. This avoids cycling through a set of directions and hence reduces the number of iterations needed to reach the optimum (Press et al. 2007, Chapter 10.7). Figure 3.2 visualizes the difference between the two methods.

45 Since both methods are well-known standard methods for optimization, we do not discuss them in detail and refer the reader to Press et al. (2007) for an in-depth description.
Figure 3.2: Successive minimization with gradient methods. (a) Steepest gradient descent is less efficient than (b) the conjugate gradient method: it takes more steps to reach the minimum, crossing and re-crossing the principal axis. Graphics adapted from Komarek (2004), p. 11.
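The contrast between the two methods can be reproduced on a small ill-conditioned quadratic. The following sketch is our own illustration (not part of the thesis's algorithm), minimizing f(x) = ½ xᵀAx − bᵀx with both methods:

```python
import numpy as np

# Compare steepest descent (with exact line search) against the linear
# conjugate gradient method on an elongated quadratic 'valley'.
A = np.array([[10.0, 0.0], [0.0, 1.0]])   # ill-conditioned Hessian
b = np.array([1.0, 1.0])

def steepest_descent(x, iters):
    for _ in range(iters):
        g = A @ x - b                     # gradient of f
        step = (g @ g) / (g @ A @ g)      # exact line search along -g
        x = x - step * g
    return x

def conjugate_gradient(x):
    r = b - A @ x
    d = r.copy()
    for _ in range(len(b)):               # exact in at most n steps
        alpha = (r @ r) / (d @ A @ d)
        x = x + alpha * d
        r_new = r - alpha * (A @ d)
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return x

x_star = np.linalg.solve(A, b)            # true minimum
x_cg = conjugate_gradient(np.zeros(2))    # exact after n = 2 steps
x_sd = steepest_descent(np.zeros(2), 2)   # still far off after 2 steps
```

After the same number of steps, conjugate gradients has reached the minimum exactly, while steepest descent is still zig-zagging across the valley, mirroring the behavior shown in Figure 3.2.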
Although both methods are suitable for the minimization of our loss function and converge to the same solution (Press et al. 2007), we choose to employ the conjugate gradient method because of its higher efficiency. However, due to the specifics of our task, some adjustments have to be made to the method. These adjustments comprise (i) the initialization of the starting point of the optimization, (ii) the restriction of the optimization procedure to the confidence limits of the parameters, and (iii) measures for preventing overfitting.
We initialize the optimization process with the parameter values obtained by means of the auxiliary regressions described in Section 3.2.1. This initialization plays a crucial role for the convergence of the optimization method. Not only does it reduce the number of iterations needed to achieve the optimal solution, but, together with the restriction of the optimization to the parameters' confidence intervals, it also helps to ensure that the solution we achieve is the true one: Recall that for most users our model (3.7) is underdetermined, i.e. the number of parameters to estimate is greater than the number of data points. This fact "relaxes" the optimization procedure by making it possible to have more than one solution. Note that these additional solutions, i.e. "local" optima, are not caused by the form of the loss function, but rather represent a set of possible spatial dispositions of the n-dimensional U-shape that satisfy (3.8). By initializing the optimization with the values obtained through statistical techniques (see Section 3.2.1), we ensure that the starting point of the optimization already lies near the "true" minimum of the loss function (3.8).
By restricting the "area" of optimization to the confidence limits of the parameters, we additionally ensure that the true solution is sought in the region of the parameter space where it is most likely to occur by virtue of our auxiliary regressions. In other words, by not allowing the optimization procedure to leave the area constrained by the confidence limits, we remove the risk of "slipping" into the area of a local minimum.
Another issue caused by the underdetermination of the model is the tendency to overfit (e.g., Koren 2009), i.e. to find parameter values that fit the available data well but exhibit large errors when making predictions for data not included in the optimization process. In order to counteract overfitting, and thus to keep the model generalizable and suitable for predicting future ratings, we utilize a holdout set of six randomly drawn ratings for each user. The ratings contained in the holdout set are completely excluded from the whole procedure of learning the parameter values, i.e. these ratings are used neither in the auxiliary regressions nor for parameter optimization. Instead, they are used in the gradient method for determining the stopping point of the optimization in a way that prevents overfitting. In particular, in each iteration the value of the loss function (3.8) is independently calculated using the holdout data set. The optimization is stopped when neither the error value on the holdout data nor the "original" error value of the method decreases with respect to the corresponding value from the previous iteration. Figure 3.3 shows the flowchart of the optimization step of our algorithm. Our adjustments to the original method are marked in bold.
Figure 3.3: Flowchart of the optimization step (bold font face in the original figure indicates our modifications of the original method):
1. Start with the initial parameter values; set the value of the error function for the training set to e_i = ∞ and for the holdout set to e_h,i = ∞.
2. Calculate the gradient and determine the conjugate direction for optimization as well as the step sizes in each direction.
3. Calculate the value of the loss function for the training set (e_i) and for the holdout set (e_h,i).
4. If neither e_i < e_{i-1} nor e_h,i < e_h,i-1, save the parameter values and stop.
5. Otherwise, for each parameter: adjust the parameter's value according to the method; if the new value lies outside the parameter's confidence limits, set it equal to the boundary value of its confidence interval. Return to step 2.
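The loop of Figure 3.3 can be sketched as follows. This is an illustrative simplification: the model is abstracted into loss and gradient callables, a plain gradient step stands in for the conjugate direction update, and all names are ours:

```python
import numpy as np

# Sketch: gradient optimization with confidence-limit clipping and
# holdout-based early stopping, following the flowchart above.
def optimize(theta, lo, hi, loss_train, loss_holdout, grad, step=0.05,
             max_iter=1000):
    e_prev, eh_prev = np.inf, np.inf
    while max_iter > 0:
        max_iter -= 1
        # adjust parameters, then clip to the confidence-limit box [lo, hi]
        theta_new = np.clip(theta - step * grad(theta), lo, hi)
        e, eh = loss_train(theta_new), loss_holdout(theta_new)
        if not (e < e_prev or eh < eh_prev):   # no improvement anywhere: stop
            break
        theta, e_prev, eh_prev = theta_new, e, eh
    return theta

# toy quadratic loss with minimum at (1, 3); confidence box [0, 2] x [0, 2]
target = np.array([1.0, 3.0])
f = lambda th: float(((th - target) ** 2).sum())
g = lambda th: 2.0 * (th - target)
theta = optimize(np.zeros(2), np.zeros(2), np.full(2, 2.0), f, f, g)
```

In this toy run, the first coordinate converges to its unconstrained optimum (1.0), while the second is held at the boundary of its confidence interval (2.0), exactly the behavior the confidence-limit restriction is meant to enforce.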
The gradient of the loss function is calculated, according to its definition, as the vector of partial derivatives with respect to the corresponding parameters. In each iteration, every parameter $\theta$ of model (3.7) is adjusted by a magnitude proportional to the step size in the direction opposite to the gradient:

$\theta \leftarrow \theta + \gamma_\theta \, e_{ui} \, \frac{\partial \hat{r}_{ui}}{\partial \theta}$   (3.23)

where $\gamma_\theta$ denotes the step size in the direction of the corresponding parameter and $e_{ui}$ designates the prediction error of the user's rating calculated with the parameter values of the current iteration.
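For a single rating, the generic update rule applied to a bias-only stand-in for the model (user bias and item bias only; all names are illustrative assumptions) looks like:

```python
# Sketch: one stochastic update of user and item bias for a single observed
# rating, following theta <- theta + gamma * e * d(r_hat)/d(theta).
# The bias-only model r_hat = mu + b_u + b_i is a simplified stand-in for (3.7).
def sgd_update(r, mu, b_u, b_i, gamma=0.01):
    r_hat = mu + b_u + b_i
    e = r - r_hat                  # prediction error e_ui
    # d(r_hat)/d(b_u) = d(r_hat)/d(b_i) = 1 for this toy model
    return b_u + gamma * e, b_i + gamma * e

mu, b_u, b_i = 3.5, 0.0, 0.0
for _ in range(500):               # repeatedly fit a single rating r = 4.5
    b_u, b_i = sgd_update(4.5, mu, b_u, b_i)
```

With each pass the prediction error shrinks geometrically, so the two biases split the residual 4.5 − 3.5 = 1.0 evenly between them.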
In the procedure described above, we obtain the final estimates for the parameters of the model of user preferences (3.7) and are now able to predict the users' future ratings and thus to provide recommendations to the users. The knowledge of the parameter significance as well as of the parameter values allows the generation of explanations of the recommendations in the pros-and-cons style, as established in Section 2.1.3.

Our task could be seen as complete herewith. However, there is still potential to increase the quality of recommendations for the overall set of the RS's users. Thus, before we proceed to the empirical test of our proposed method, we will discuss this potential and motivate the hybridization of our recommendation algorithm. The next section is dedicated to this topic.
3.3 Hybridization with Collaborative Approaches
3.3.1 Motivation for Hybridization
Recall our discussion in Section 2.2.2, which we initiated with the assertion that movies are experience goods and in which we highlighted the hedonic nature of movie consumption in contradistinction to the consumption of utilitarian goods. Precisely these aspects, together with the problem of automatically extracting meaningful and preference-relevant attributes from multimedia content, complicate the derivation of movie attributes that are descriptive of the preferences of movie consumers. Consequently, the preference-relevant movie attributes that we derived for the operationalization of consumer preferences, although chosen carefully to capture the major part of the latter, might in some cases not fully cover all aspects underlying the emergence of consumer preferences.
For example, such a characteristic of a movie's goodness as the depth and dynamics of character development may be described well through the attributes "actors", "writers" and "directors", since these undoubtedly contribute to character development and tend to exhibit, throughout their work, general tendencies or affinities that correlate with the mentioned characteristic. However, in particular movies these tendencies may not necessarily surface. On the other hand, a consumer for whose preferences character development plays an essential role may not always consider this characteristic good and may, for instance, disfavor protracted stories. What is more, a consumer may have conflicting tastes that depend, for example, on the context in which s/he watches a movie: in some situations the consumer may prefer thoughtful motion pictures with intricate storylines, whereas at other times s/he may be more interested in light-hearted entertainment. Furthermore, a consumer may pay more attention to other aspects of movies that do not correlate with our list of attributes, e.g., an overall "message" that leaves its mark on the soul. Finally, the data on which we base the estimation of the user's attribute preferences may simply be insufficient for our suggested procedure to uncover the user's preference structure. Although we proposed a method that is capable of estimating part-worths under underdetermined conditions, it still cannot extract the preferences from
data that do not contain them. For instance, if a user has a strong attitude toward a particular star but has not rated any movie starring this actor, our algorithm has no basis to deduce the user's preference score for the corresponding attribute. The latter represents the problem of overspecialization that is inherent to content-based techniques (see Section 2.3.3.3).
Some of the aspects mentioned above could be accounted for by introducing interaction effects into our model of user preferences. This, however, would increase the complexity of our already complicated model by an order of magnitude, which may make it practically impossible to estimate the model's parameters reliably. Other aspects, such as the consumption context, cannot be addressed in our approach without the use of additional information, the collection of which entails additional interactions with the user. Not to mention that such interactions potentially decrease the recommendation efficiency radically and thus may cancel out the benefits of a RS to a movie consumer. Additional information, i.e. additional ratings, is also needed for deducing the part-worths of attributes that do not apply to the movies rated by the user, i.e. for counteracting the overspecialization problem.
Recommendation approaches that do not utilize item attributes in the recommendation process can help to counteract the potential problems mentioned above. Because these approaches do not rely on item attributes, they are more likely to capture relationships between ratings and items that go beyond the attribute preferences. Thus, in those cases in which the concepts underlying such relationships are more valuable to a user and cannot be captured by our proposed content-based method to a satisfactory degree, these approaches may produce more reliable predictions of the user's preferences. Furthermore, because these approaches are not subject to the overspecialization problem, they are able to predict ratings for movies with attributes whose part-worths could not be addressed by our approach. This potentially allows us to enrich the set of movies that come into question for recommendation with movies that possess higher predicted ratings than the ones selected by our method, and thus to potentially increase the effectiveness of recommendation.

Hence, it seems sensible to extend our approach with the predictions provided by other recommendation techniques. The two questions that we need to answer in this context are (i) which method(s) we should combine with our algorithm and (ii) how the combination should be accomplished.
3.3.2 Methods to Hybridize and the Method of Hybridization
As discussed in Section 2.3.4, several strategies can be followed for constructing a hybrid recommender. Notice, however, that the majority of hybridization strategies aim to increase solely the accuracy of recommendations, whereas accuracy is only one of the concurrent objectives that the current thesis pursues: In Section 2.1.1 we showed that explanations of recommendations play an important role for the users' perception of the RS's transparency as well as for the users' acceptance of and trust in the RS. Moreover, explanations increase the effectiveness of the users' choices. Hence, we search for a hybridization solution that counteracts the problems described in the previous section while maintaining the advantages of explanations.
Now, recall the discussion of different explanation styles provided in Section 2.1.2. Each recommendation method is associated with a particular explanation style, which is caused by the specifics of the method's process of recommendation generation. Each explanation style, in turn, exhibits a different potential to increase the users' satisfaction with a RS and the users' ability to accurately assess the true quality of recommended items. It was shown that the nearest-neighbor explanation style inherent to user-based CF performs the worst of all the explanation styles discussed; even more, this style may lead to the users' mistrust of the RS. On the contrary, the keyword and influence explanation styles (available to CB methods and to item-based CF, respectively) were found to be effective at enabling accurate assessments. Although there is no overall agreement on whether the keyword style dominates the influence style or vice versa, the combination of both was found to lead to the best results in terms of overall satisfaction and quality assessment. To complete our discussion of the different explanation styles and the corresponding recommendation methods, notice that MF methods allow for no meaningful explanations because they base their recommendations on an uninterpretable factor solution (see Section 2.3.1.3).
Taking into account that the keyword explanation style is a shorter version of the pros-
and-cons style (see Section 2.1.3) that our proposed method implements, and accounting for
our objective to provide effective explanations along with accurate recommendations, the best
performing combination of explanation styles suggests the answer to the question of which
methods to combine within our hybrid recommender. That is, an item-based CF method
should extend our proposed content-based approach.
The answer to the question of how the predictions of both methods should be combined
is easier. Because we want to maintain the explicability of our recommendations, we may not
combine the prediction results of the different recommendation methods through mathemati-
cal operations, e.g., averaging or weighting. Otherwise, we would lose the association be-
tween the recommendation result and the underlying explanation of its 'original' method and
so would be unable to explain the reasoning behind the recommendation. Hence, we have to
use the 'pure' rating predictions of the method that performs best for a given user in terms of
accuracy.
In order to compare the accuracy of the two methods constituting our hybrid, we sug-
gest utilizing the holdout set of six randomly drawn ratings per user that we used for deter-
mining the stop point of the optimization (see Section 3.2.2). We chose to utilize the same
holdout set despite the criticism that may arise that our model of user preference estimation
is already trained to this data. We argue that this is of minor importance for comparing the
accuracy of the two hybridized methods. Firstly, the holdout set was employed to increase
the generalizability of the estimated model parameters and to prevent overfitting to the train-
ing data. Because the ratings of the holdout set were excluded from the actual optimization
procedure and used to calculate the value of the loss function 'externally' with respect to the
training data, the optimization is accomplished such that the model's prediction accuracy for
unseen data should already be very similar to its accuracy on the holdout set. Secondly, even
if our model of user preferences overfits the holdout data, so that the 'combining' algorithm
would prefer our model's predictions over the CF predictions for the final recommendations,
the effect of this preference could only decrease the overall accuracy of the hybrid method; it
therefore provides our hybrid no advantage in the comparison of the prediction accuracy of
the different recommendation algorithms that will be given in Section 4.4 below. Finally,
although we admit that a separate holdout set, not employed in any calculation except the
comparison of the predictive accuracy of the hybridized methods, would be methodically
desirable, we have no other choice: Given the median of 13 ratings per user in the dataset of
MoviePilot (see Table 4.1), and bearing in mind that we need another holdout set for valida-
tion purposes, we would risk exhausting most of the data on holdout sets. In this case, the
users that only have a few ratings would be underrepresented
in our empirical study, which, in turn, would question the generalizability of its results. On the
other hand, in practical settings our proposed recommendation method would then require a
user to rate at least 24 items (18 for the three holdouts and 6 for the predictions) before being
able to recommend. Such an amount of 'warm-up' ratings may be impracticable for a consid-
erable number of users; again, as can be seen from Table 4.1, MoviePilot would not be able to
recommend anything to the majority of its actual users. Thus, in this tradeoff, we decide to
trade the potentially inferior accuracy of our hybrid for the benefit of the generalizability of
the results and a higher attractiveness of our method to practitioners.
To summarize the above: In our hybrid of the proposed algorithm and item-based CF,
we suggest generating predictions of future ratings by means of whichever of the two methods
performs best on the same holdout set that is used in the optimization procedure described in
Section 3.2.2. For determining the best performing method we propose utilizing Student's
t-test for paired samples. That is, the method that exhibits a significantly lower prediction
error on the holdout set is considered best and is used for future predictions. If, however, the
difference between the errors is not significant, we will use the predictions of model (3.7),
even if its error on the holdout set is greater than the error of item-based CF. Again, in doing
this, we trade formal accuracy for more effective explicability.
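To make the switching rule concrete, the per-user decision can be sketched as follows. This is an illustration only, not the thesis' implementation: the function name, the error inputs, and the hard-coded two-sided critical value t(.975, df = 5) ≈ 2.571 for a six-rating holdout are our assumptions.

```python
import math

# Two-sided critical value t(.975, df = 5) for a six-rating holdout;
# the constant 2.571 is an assumption chosen for this illustration.
T_CRIT = 2.571

def choose_method(errors_cb, errors_cf, t_crit=T_CRIT):
    """Switching hybrid: decide which method predicts for a given user.

    errors_cb / errors_cf: absolute prediction errors of the content-based
    model (3.7) and of item-based CF on the same per-user holdout ratings.
    Only a *significantly* more accurate CF switches the user to CF;
    otherwise the content-based model wins, preserving its pros-and-cons
    explanations.
    """
    n = len(errors_cb)
    diffs = [cb - cf for cb, cf in zip(errors_cb, errors_cf)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:                        # identical differences: no test possible
        return "content-based"
    t = mean_d / math.sqrt(var_d / n)     # paired t statistic
    return "item-based CF" if t > t_crit else "content-based"
```

Note the deliberate asymmetry: a non-significant difference defaults to the content-based model, mirroring the rule stated above.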
The discussion of this section closes the description of our proposed conceptual frame-
work of a hybrid recommender system that allows for effective explanations of recommenda-
tions. In the next chapter we present an empirical study which evaluates our proposed method
and compares it with key recommendation algorithms.
Chapter 4
Empirical Study
In the previous chapters, we built theoretical foundations and developed a recommenda-
tion method that achieves our objectives: a method that is capable of providing both (accu-
rately predicted) recommendations and actionable explanations of the reasoning behind them,
as well as aligning the recommendation process with the user's preferences. Whereas the
alignment of the recommendation process with user preferences is given by the design of the
method (see esp. Sections 2.2 and 3.1) and its ability to provide actionable explanations is
justified theoretically (see esp. Sections 2.1, 3.1 and 3.3.2), the claim that the predictions are
accurate still requires proof.
Indeed, our method requires estimating a considerable number of parameters, which in
many cases exceeds the number of data points available for the estimation procedure (see Sec-
tion 3.2). This fact may raise doubts as to whether the estimates produced are capable of reli-
ably predicting user preferences and are good enough in comparison with established recom-
mendation methods. That is, proof is needed that our method is applicable in recommender
systems practice and provides advantages to the latter.
On the other hand, by means of the hybridization of our proposed method, we 'secured'
that the hybrid's predictions are at least as accurate as the predictions of its item-based CF
component (see Section 3.3.2). Thus, another quantity of interest is the number of times our
proposed preference estimation method applies, relative to the number of times the CF com-
ponent is used for generating the final rating predictions. At the same time, this quantity char-
acterizes the relative number of times when explanations
are provided to the users in the most effective pros-and-cons style rather than in the second-
best keyword explanation style.
To answer these questions we conduct an empirical study that tests different recommen-
dation techniques on real-world rating data: the dataset of the German commercial movie rec-
ommendation system MoviePilot.com and the dataset of the US online DVD rental service
Netflix.com. By using two datasets for our tests, we 'secure' the generalizability of the com-
parison results to other cases of movie recommendation and demonstrate the potential for
porting our method to other recommendation domains.
We compare the accuracy of our proposed method with the accuracy of the key collabo-
rative recommendation techniques that were described in detail in Section 2.3, i.e. user-based
CF, item-based CF and the matrix factorization method. As matrix factorization is known to
provide one of the best predictive accuracies among 'pure', i.e. non-hybridized, recommenda-
tion algorithms (e.g. Funk 2006; Paterek 2007; Bell, Koren, and Volinsky 2007b, 2008; Koren
2009), we consider a comparison with this algorithm to be the most informative for judging
the prediction accuracy of other algorithms.
The comparison is made on the basis of holdout data which is not involved in the train-
ing procedure of any algorithm. The task of the algorithms therefore consists in predicting the
ratings of the holdout data. The difference between the predicted ratings and the actual ones
then serves to calculate the accuracy measures. Further details on the comparison procedure
and the data employed, as well as the results of the empirical study, are described in the suc-
ceeding sections.
4.1 Datasets and their Properties
As mentioned above, two real-world datasets are involved in our study. We chose the
Netflix dataset because it underlies the majority of recommender research done in recent
years, driven by the research community's interest in the Netflix competition, which promised
a prize of one million dollars to the individual or team who could top the prediction accuracy
of Netflix's own recommender by 10% with regard to RMSE. Hence, providing the accuracy
measures of our algorithm for this particular dataset makes our results comparable to a variety
of other recommendation methods discussed in the recent literature. MoviePilot's data was
used for the following reasons: The current research is performed within a research project
funded by the German Research Foundation (Deutsche Forschungsgemeinschaft; DFG) in
which MoviePilot acts as a cooperation partner. Through this cooperation, we gained full
access to information which could have influenced the rating data, e.g., changes in the scale
labeling, interface updates, etc. Such information is not available for the Netflix data: alt-
hough it is known that Netflix has altered its scale labels in the past (Koren 2009), no exact
details about the type of alteration or the date when it happened were ever published. From
his analysis of the Netflix data, Koren (2009) infers that the alteration might have happened in
early 2004, where the mean rating makes a sudden jump that would be hard to explain other-
wise. Furthermore, Netflix has made available only a subset of its rating data, stating this data
to be randomly drawn from the original rating dataset. However, as a commercial provider
funding a considerable prize, Netflix could have 'integrated' some artifacts into the published
dataset. For example, one of the users in the dataset has over 17,000 ratings. Assuming an
average movie runtime of one and a half hours, this person would have had to watch movies
without any breaks for almost three years; if this user spent only eight hours a day watching
movies, s/he would need more than eight years to watch them all. This seems rather unrealis-
tic. Netflix provided no comments on this or other artifacts that might have been introduced
into the data artificially. Contrary to Netflix's data, the dataset of MoviePilot we employ is
the complete set of the ratings provided to the recommender system by its users.
Each dataset represents a relational database with two tables. The first table contains the
four fields 'user_id', 'movie_id', 'timestamp' and 'rating', so that each row of the table as-
signs a rating to a concrete user and a concrete movie as well as to the exact date and time
when the rating was recorded by the system. The second table consists of two fields: 'mov-
ie_id' and 'movie_title'. To reduce ambiguity, movie titles are complemented with the mov-
ie's year of production.
The ratings in the Netflix dataset are represented on a 5-point scale with a 1-point step,
where 1 denotes the worst rating of a movie ("Hated It") and 5 indicates the best rating
("Really Liked It"). In the Netflix interface, the scale points correspond to the number of stars
that a user gave to a movie (see Figure 4.1a). MoviePilot presents its users an 11-point scale
varying from 0 ("Hated This Movie") to 10 ("My Favorite Movie") in .5-point steps; the rat-
ings are, however, saved in the database as values from 0 to 100, corresponding to a tenfold of
the rating that a user provides (i.e., a rating of 7.5 points is stored as 75). In the MoviePilot
interface, the ratings are surveyed from the users by means of a horizontal scale bar that sup-
ports a gradient fill effect and changes its caption text according to the currently selected
number of points (see Figure 4.1b).
Figure 4.1: Rating scales in the user interfaces of recommender systems.
(a) Netflix, captions from 1 to 5 stars: "hated it", "didn't like it", "liked it", "really liked it",
"loved it"; (b) MoviePilot, captions altering in 3.5-point intervals: "hated movie", "not inter-
ested", "average", "good", "my favorite movie".
Table 4.1 presents descriptive statistics for the raw datasets of MoviePilot and Netflix.
Table 4.1: Descriptive statistics of the raw rating datasets

                            MoviePilot          Netflix
General characteristics
  Number of ratings         1,389,749           100,480,507
  Number of users           14,528              480,189
  Number of movies          12,762              17,770
  Scale interval            0 – 10 (0 – 100)    1 – 5
  Scale step size           .5 (5)              1
  Time range                19-AUG-2006 –       11-APR-1999 –
                            04-APR-2008         31-DEC-2005
Ratings per user
  Min                       1                   1
  Max                       6,687               17,653
  Mean                      95                  209
  Median                    25                  96
  SD                        214.17              302.33
Ratings per movie
  Min                       1                   3
  Max                       6,546               232,944
  Mean                      108                 5,654
  Median                    13                  561
  SD                        345.62              16,909.67
Ratings per day
  Min                       1                   5
  Max                       78,164              737,570
  Mean                      2,583               46,049
  Median                    1,548               15,499
  SD                        4,498.33            58,558.61
However, in order to be able to perform our tests, both datasets were reduced as follows:
We set aside the six latest ratings per user as a holdout for out-of-sample predictions and the
computation of accuracy measures for the different recommender algorithms (in the follow-
ing, we refer to this holdout as the "validation set"). Another six ratings were drawn randomly
from each user's rating profile to build a holdout for operational purposes of our proposed
algorithm (in the following, the "operation holdout"; see Sections 3.2.2 and 3.3.2). Users for
whom there was not enough data to generate both holdouts were discarded. We further dis-
carded those users for whom fewer than six ratings remained after isolating both holdouts.
The descriptive statistics of the resulting datasets are summarized in
Table 4.2.
Table 4.2: Descriptive statistics of the datasets employed in the study

                        MoviePilot                          Netflix
                        training    operation  validation   training     operation  validation
                        set         holdout    set          set          holdout    set
General characteristics
  Number of ratings     1,140,577   47,610     47,610       93,170,314   2,570,310  2,570,310
  Number of users       7,935       7,935      7,935        428,385      428,385    428,385
  Number of movies      12,246      5,052      5,037        16,543       16,241     16,212
  Time range            19-AUG-06   20-AUG-06  20-AUG-06    11-NOV-99    06-JAN-00  05-JAN-00
                        04-APR-08   04-APR-08  04-APR-08    31-DEC-05    31-DEC-05  31-DEC-05
Ratings per user
  Min                   1           6          6            8            6          6
  Max                   6,535       6          6            16,419       6          6
  Mean                  143         6          6            217          6          6
  Median                59          6          6            101          6          6
  SD                    250.16      0          0            304.50       0          0
Ratings per movie
  Min                   1           1          1            2            1          1
  Max                   4,543       802        677          213,367      15,816     12,354
  Mean                  93          9          9            5,623        158        158
  Median                12          3          2            544          20         18
  SD                    262.90      36.23      35.7161      16,305.89    624.28     603.76
Ratings per day
  Min                   1           1          1            5            1          1
  Max                   56,194      3,413      3,629        703,924      27,936     17,202
  Mean                  2,120       106        104          42,631       1,283      1,242
  Median                1,293       50         52           15,167       38         61
  SD                    3,451.80    201.44     206.64       53,378.06    3,423.97   2,820.47
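The holdout construction described above can be sketched in code as follows. The function name, tuple layout, and the minimum-profile threshold of 18 ratings (6 + 6 held out plus at least 6 remaining for training) are our reading of the discarding rules, not the study's actual implementation.

```python
import random

def split_profile(ratings, rng=None):
    """ratings: one user's list of (timestamp, movie_id, value) tuples.

    Returns (training, operation_holdout, validation_set), or None if the
    user must be discarded for lack of data.
    """
    rng = rng or random.Random(0)
    if len(ratings) < 18:            # 6 + 6 held out, at least 6 left for training
        return None
    chrono = sorted(ratings)         # chronological order (timestamp first)
    validation = chrono[-6:]         # the six latest ratings
    rest = chrono[:-6]
    rng.shuffle(rest)
    operation = rest[:6]             # six randomly drawn ratings
    training = sorted(rest[6:])
    return training, operation, validation
```

Applied per user, this reproduces the constant "6" rows of the holdout columns in Table 4.2.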
The data about movie attributes (genres, acting stars, directors, writers, production
companies, budget, admissions, box office, year of production, country of origin; see Section
2.2.2 for the derivation and Appendix B for a detailed list of attributes) was obtained from
IMDb under a limited, non-commercial license46. The data is provided as a set of text files
that maintain connections between a particular movie title and a list of attributes of a specific
type (e.g. actors, countries of origin, etc.). We converted the text files to a database format
that is more convenient for our calculation purposes, so that each row of the data table repre-
sents a movie and each column represents a specific attribute. Non-metric, i.e. nominal, at-
tributes (such as actors, directors, country of origin, etc.) were coded as binary variables, with
1 denoting the presence of an attribute among a movie's characteristics and 0 otherwise. Met-
ric attributes (admissions, budget, box office, and year of production) were recoded as fol-
lows: Movie budgets and box office values were converted to a common currency (US dol-
lars) in order to unify the measurement units and thus to increase the consistency of the esti-
mation of the corresponding parameters in model (3.7). The movies' years of production were
recoded as the number of years from the current year. This rescales the corresponding varia-
ble's values by three orders of magnitude (e.g. the year 2009 is recoded as 2), which simpli-
fies comparisons of the production year's effect on the user's preference with the effects of
nominal parameters when inspecting parameter values 'manually'. It also alters the interpre-
tation of the parameter's values, so that negative values indicate a preference for newer mov-
ies, whereas positive values display a preference for older ones. That is, the meaning of the
parameter changes to "preference for older movies". Since this recoding represents an affine
transformation of the data (negating the values and adding a constant), it has no effect on ei-
ther the estimations or the predictions made with our algorithm and so serves only for the
convenience of visual inspection of the part-worth values. Since admissions are already
scaled in common measurement units (number of tickets sold at movie theaters), their values
were not modified.

46 Copyright message: "Information courtesy of The Internet Movie Database (http://www.imdb.com). Used
with permission". Licensing information can be obtained at http://www.imdb.com/licensing/ (for commercial
use) and http://www.imdb.com/help/show_leaf?usedatasoftware (for non-commercial and personal use).
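A sketch of this coding scheme may make it concrete. The dictionary layout, the helper name, and the reference year 2011 are our assumptions for illustration only:

```python
REFERENCE_YEAR = 2011  # assumption: the "current year" at data-preparation time

def encode_movie(movie, nominal_vocab):
    """movie: {'nominal': {attr: [values]}, 'budget_usd': ..., 'year': ...}.
    nominal_vocab: all 'attr:value' dummy columns seen in the full dataset."""
    row = {key: 0 for key in nominal_vocab}      # binary dummies, default absent
    for attr, values in movie["nominal"].items():
        for value in values:
            key = f"{attr}:{value}"
            if key in row:
                row[key] = 1                     # attribute present in this movie
    row["budget_usd"] = movie["budget_usd"]      # already converted to US dollars
    row["age"] = REFERENCE_YEAR - movie["year"]  # e.g. the year 2009 -> 2
    return row
```

A negative part-worth on `age` then reads directly as a preference for newer movies, matching the interpretation above.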
After the conversions described above, the IMDb data was merged with the datasets of
MoviePilot and Netflix by matching the movie titles and the corresponding years of produc-
tion contained in all the datasets considered. This step finalizes the preparation of the data for
the actual study and concludes the data description.
The next section introduces the measures that we employ for the comparison of the pre-
diction accuracy of the different recommender algorithms.
4.2 Measures of Prediction Accuracy
Prediction accuracy measures evaluate how close the ratings predicted by a recom-
mender algorithm are to the true user ratings (Herlocker et al. 2004). Two established accura-
cy measures that the majority of works in the research area of recommendation systems em-
ploy are the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). Formal-
ly, these measures are defined as follows:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{r}_i - r_i \right|   (4.1)

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{r}_i - r_i \right)^2}   (4.2)

where \hat{r}_i denotes a predicted rating, r_i the corresponding true rating, and N the number
of predictions evaluated.
Whereas MAE measures the average absolute deviation between a predicted rating
\hat{r}_i and a user's true rating r_i, RMSE puts more emphasis on large deviations by squar-
ing the single errors before summing them up. For instance, an error of one point increases
the error sum by one, while an error of two points increases the sum by four. Through empha-
sizing large errors, RMSE puts on par algorithms that constantly make moderate errors for all
ratings with those that predict ratings fairly well most of the time but make large errors in
some cases. As can be seen from equations (4.1) and (4.2), RMSE tends to be greater than
MAE and can never be smaller. RMSE equals MAE only in one specific case, namely when
all predictions contain an error of a constant magnitude, i.e. when |\hat{r}_i - r_i| = c for all i.
The meaning of MAE and RMSE can also be interpreted in statistical terms: Since
MAE is defined as the mean of the absolute errors, it represents the first absolute moment of
the error distribution, i.e. the expected magnitude of the error that an algorithm produces.
RMSE, according to its formal definition, is the square root of the second moment of the al-
gorithm's errors about zero. Hence, RMSE corresponds to the standard deviation of the errors
from the no-error point and therefore informs about the "width" of the error distribution. That
is, assuming a zero-mean normal distribution of the errors, about 68% of them lie in the inter-
val bounded by ±RMSE, about 95% of the errors are in the interval of ±2 RMSE, and the in-
terval of ±3 RMSE accounts for 99.7% of the errors. In other words, MAE and RMSE togeth-
er are informative about the distribution of the prediction errors. Hence, it seems sensible to
report both measures for the evaluation of the predictive accuracy of different algorithms.
Nevertheless, both MAE and RMSE depend on the scale on which the ratings are sur-
veyed from the users. That is, although these measures allow comparisons of different algo-
rithms with respect to their predictive accuracy, such comparisons remain informative only
when the algorithms are tested on the same dataset or when the datasets employ the same rat-
ing scale. The latter does not hold in our case, since MoviePilot and Netflix utilize different
rating scales (see Table 4.1). A typical approach to overcome this limitation and thus to make
the prediction runs on different datasets comparable is to normalize the measures with respect
to the range of rating values (Herlocker et al. 2004; Goldberg et al. 2001). The formal defini-
tions of the Normalized Mean Absolute Error (NMAE) and the Normalized Root Mean
Squared Error (NRMSE) are as follows:

\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}   (4.3)

\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{r_{\max} - r_{\min}}   (4.4)

where r_{\min} and r_{\max} denote respectively the minimum and the maximum ratings of a
recommender system's rating scale.
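Equations (4.1)–(4.4) translate directly into code; a small illustration (function names are ours):

```python
def mae(predicted, actual):
    """Mean Absolute Error, equation (4.1)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error, equation (4.2)."""
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

def normalize(measure, r_min, r_max):
    """Equations (4.3)/(4.4): divide by the rating-scale range."""
    return measure / (r_max - r_min)
```

For predictions [4, 3, 3] against true ratings [5, 3, 1], MAE is 1.0 while RMSE is √(5/3) ≈ 1.29, illustrating that RMSE exceeds MAE whenever the error magnitudes vary.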
In Section 4.4, which presents the results of our empirical study, we will report all four
accuracy measures introduced above. While the normalized measures allow us to compare the
predictive accuracy of different algorithms across different datasets consistently, the raw, i.e.
non-normalized, measures allow the reader to compare our results with those of other pub-
lished or unpublished works on recommendation systems.
Prior to presenting the results of our study, the next section provides some details on the
algorithms employed therein.
4.3 Employed Algorithms and Benchmarks
In order to provide an informative report on the predictive accuracy of our proposed
method, we ran a series of accuracy tests with some of the key recommender algorithms. Spe-
cifically, we used pure user-based and item-based collaborative filters, each in two variants
that differ with respect to the similarity measure employed, i.e. the Pearson correlation coeffi-
cient and cosine similarity (see Sections 2.3.1.1 and 2.3.1.2 for details). In these collaborative
filters, we used the neighborhood size that provided the best accuracy over all datasets in pre-
liminary test runs.
Another algorithm employed in our study is the Singular Value Decomposition-like ma-
trix factorization algorithm by Funk (2006), the foundation of all matrix factorization recom-
menders discussed in the recent literature. As matrix factorization is known to provide one of
the best predictive accuracies for a single algorithm, we consider a comparison with this basis
algorithm to be informative. However, a note should be made regarding the results of the em-
ployed algorithm: In our preliminary prediction runs it turned out that matrix factorization is
highly sensitive to its parameters, i.e. the number of iterations, the regularization parameter,
the learning rate and the number of factors47. The optimal values of these parameters, in turn,
depend on the underlying data, and thus should be determined individually for each dataset in
order to achieve optimal results. These assertions are also supported by recent research (e.g.
Paterek 2007; Koren 2009; Koren, Bell, and Volinsky 2009; Koren and Bell 2011). There-
fore, in our comparisons we used differently parameterized versions of Funk's algorithm and
correspondingly report the best results of the algorithm for the respective datasets. For the
comparisons on the MoviePilot dataset the factor model of the algorithm was learnt for the
following parameter values: , ,
and . For the Netflix dataset, the cor-
responding parameters are , ,
and .
47 See Section 2.3.1.3 for explanation of the parameters’ meaning.
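As a rough sketch of the kind of algorithm meant here (not the exact implementation used in the study; the default parameter values shown are arbitrary placeholders), Funk-style stochastic gradient descent updates the user and item factor vectors rating by rating:

```python
import random

def funk_sgd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=1):
    """Funk-style matrix factorization: learn user factors P and item
    factors Q by stochastic gradient descent over the observed ratings.

    ratings: list of (user, item, rating) triples with 0-based ids.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # L2-regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

The sensitivity noted above shows up directly here: the learning rate, regularization, factor count and epoch count all interact, which is why they must be retuned per dataset.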
To better judge the relative accuracy improvements of the different algorithms, we in-
troduce two 'benchmarks'. The first benchmark is a simple heuristic that 'predicts' the global
average of a dataset as the value of all future ratings for all users. This benchmark obviously
represents the bottom level of accuracy that a recommender system can provide to its users.
That is, if a recommender algorithm exhibits a lower level of prediction accuracy than the
'global average' method, it makes no sense for a recommender system to employ such an al-
gorithm, since a simple heuristic performs better. The second benchmark is the result of the
algorithm that won the Netflix One Million Dollar Prize. By achieving an RMSE of .8712
(Bell, Koren, and Volinsky 2008), it improved the RMSE of Netflix's own algorithm by 10%
and hence can be considered the most accurate recommendation algorithm in the recommend-
er domain. Therefore, we consider the comparison with this benchmark to be informative.
However, test runs of this algorithm on our data are impeded by the fact that it is, in essence,
the result of blending the predictions generated by more than 100 recommendation algorithms
(Bell, Koren, and Volinsky 2008). Testing this algorithm on our data would imply implement-
ing all of its composite parts and subsequently blending their results, which is not trivial. Not
only would it take a lot of time and resources, but it would also expose our results to errors
and to the criticism that they may be attributable to implementation mistakes. Thus, we sug-
gest using the reported RMSE and the corresponding NRMSE instead; the latter can easily be
calculated by means of equation (4.4) and amounts to .2178. In the tables that present the re-
sults of our study, i.e. in Table 4.3 and Table 4.4, these values are denoted as the "Netflix
Prize winner". Unfortunately, the authors do not report the MAE of their algorithm. Never-
theless, we consider comparisons of the NRMSE of different algorithms with this benchmark
to be informative for judging improvements in prediction accuracy.
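The bottom-level benchmark is trivial to state in code (a sketch; the naming is ours):

```python
def global_average_baseline(training_ratings):
    """Return a predictor that answers every query with the training mean."""
    mu = sum(training_ratings) / len(training_ratings)
    return lambda user_id, movie_id: mu  # ignores who asks and about which movie
```

Any algorithm whose holdout error exceeds that of this constant predictor is not worth deploying.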
Whereas the foregoing sections provided a conceptual description of the design of our
study and of the methods employed therein, details of the technical implementation and exe-
cution of our tests can be found in Appendix C. The next section presents the results.
4.4 Results
This section presents the results and discusses the findings of our empirical study. The
two main questions concerned herein are (i) how well our proposed method predicts future
user ratings and (ii) what proportion of the users receive the explanations behind their rec-
ommendations in the most effective pros-and-cons explanation style. Each of these questions
is addressed in a separate subsection in the following.
4.4.1 Comparison of Prediction Accuracy
The results of the prediction runs of the different algorithms are summarized in Table
4.3 (for the MoviePilot dataset) and Table 4.4 (for the Netflix dataset). In these tables we re-
port the accuracy of our proposed method in three rows: Firstly, the row "Estimation step"
provides the results of the predictions made by using model (3.7) with the parameter values
obtained in the estimation step of our algorithm (see Section 3.2.1). Secondly, the row "Opti-
mization step" reports the accuracy of the predictions of the same model initialized with the
optimized parameter values (see Section 3.2.2). Finally, the row labeled "Hybrid" provides
the accuracy measures obtained through the hybridization of our optimized solution and item-
based CF, as described in Section 3.3.2.
The columns of the tables report the four accuracy measures introduced in Section 4.2,
as well as the percentage improvement achieved by a particular algorithm with respect to the
global average and Netflix Prize winner benchmarks. For the reasons explained in the forego-
ing section, the latter improvement is only reported for the RMSE and NRMSE measures. To
simplify the comparison, two additional columns display the rank order that the compared
algorithms achieve on the corresponding measure. In these columns, lower ranks correspond
to better accuracy.
Table 4.3: Comparison of the prediction accuracy of different algorithms for the MoviePilot dataset

Algorithm              MAE       NMAE     Improvement    Rank # of   RMSE      NRMSE    Improvement    Improvement     Rank # of
                                          w.r.t. global  MAE &                          w.r.t. global  w.r.t. Netflix  RMSE &
                                          average (%)    NMAE                           average (%)    Prize winner(%) NRMSE
Benchmark methods
Global average         21.55641  0.21555    0.0           9          26.34466  0.26345    0.0           -21.0          10
Netflix Prize winner   n/a       n/a        n/a           n/a        n/a       0.21780   17.3             0.0           2
Collaborative filtering methods
User-based, Pearson    16.92135  0.16921   21.5           3          22.15778  0.22158   15.9            -1.7           3
User-based, Cosine     17.37269  0.17373   19.4           5          22.55114  0.22551   14.4            -3.5           6
Item-based, Pearson    16.80697  0.16807   22.0           2          22.17444  0.22174   15.8            -1.8           4
Item-based, Cosine     17.21427  0.17214   20.1           4          22.52112  0.22521   14.5            -3.4           5
Matrix factorization   17.56650  0.17567   18.5           6          22.65087  0.22651   14.0            -4.0           7
Proposed method
Estimation step        18.18684  0.18187   15.6           8          24.28305  0.24283    7.8           -11.5           9
Optimization step      18.16394  0.18164   15.7           7          24.17754  0.24178    8.2           -11.0           8
Hybrid                 16.19231  0.16192   24.9           1          20.66475  0.20665   21.6             5.1           1
Table 4.4: Comparison of the prediction accuracy of different algorithms for the Netflix dataset

Algorithm              MAE       NMAE     Improvement    Rank # of   RMSE      NRMSE    Improvement    Improvement     Rank # of
                                          w.r.t. global  MAE &                          w.r.t. global  w.r.t. Netflix  RMSE &
                                          average (%)    NMAE                           average (%)    Prize winner(%) NRMSE
Benchmark methods
Global average         0.93609   0.22815    0.0           9          1.10899   0.27725    0.0           -27.3          10
Netflix Prize winner   n/a       n/a        n/a           n/a        0.87120   0.21780   21.4             0.0           2
Collaborative filtering methods
User-based, Pearson    0.67940   0.16985   25.6           4          0.87921   0.21980   20.7            -0.9           4
User-based, Cosine     0.69548   0.17387   23.8           5          0.89502   0.22376   19.3            -2.7           6
Item-based, Pearson    0.67726   0.16932   25.8           2          0.87714   0.21929   20.9            -0.7           3
Item-based, Cosine     0.67911   0.16978   25.6           3          0.87975   0.21994   20.7            -1.0           5
Matrix factorization   0.70521   0.17630   22.7           6          0.90324   0.22581   18.6            -3.7           7
Proposed method
Estimation step        0.70718   0.17680   22.5           8          0.91189   0.22797   17.8            -4.7           9
Optimization step      0.70610   0.17653   22.6           7          0.90760   0.22690   18.2            -4.2           8
Hybrid                 0.64053   0.16013   29.8           1          0.82220   0.20555   25.9             5.6           1
First of all, it can be seen that differences between the MoviePilot and the Netflix
datasets impact the magnitudes of the accuracy measures. The normalized accuracy measures
(NMAE and NRMSE) are all greater for the Netflix dataset. The accuracy of the global
average predictions, i.e. our bottom-level benchmark, is impacted most. This is, however,
concordant with Koren's observation of a sudden jump of the mean rating in the Netflix
dataset that happened in early 2004 and may be attributed to the alteration of Netflix's
rating scale labels (Koren 2009). Indeed, such an increase of the mean rating should cause
both an increase of the mean error and an increase of the error variance, which is reflected
in the higher NMAE and NRMSE values of the prediction runs on the Netflix dataset. Whereas
the global average's predictions are impacted by definition48, the other compared methods
exhibit a surprising robustness to the altered meaning of the scale points: The difference
in NMAE between the predictions on the two datasets becomes noticeable only in the fourth
decimal place for the majority of the compared methods. The NRMSE values, however, are
affected one order of magnitude more strongly, so that the difference between the prediction
runs can be seen already in the third decimal place. Consequently, the percentage of accuracy
improvement with respect to our benchmarks is also impacted by these issues, which makes the
corresponding values less consistent across the considered datasets and less informative for
comparison purposes.
Another source of the higher values of the normalized accuracy measures on the Netflix
dataset may simply be its larger size: with more predictions to make, an algorithm has more
opportunities to commit large errors. Nonetheless, irrespective of how much each of these
issues contributes to the higher error measures, it should be recognized that the differences
between the datasets do impact the accuracy measures. Hence, comparisons of accuracy
improvement should be made with care and account for the circumstances described above.
Even so, it can be noted that the accuracy measures of our proposed method are impacted
considerably less by the difference between the datasets. We attribute this to the fact that,
contrary to the other methods, our preference model incorporates temporal effects.
48 Recall that the global average is defined as the mean rating of a dataset.
Hence, it was able to capture the time-varying component of the rating variance to a greater extent than the
competing methods did. This underlines the importance of accounting for temporal changes
within recommendation algorithms.
A more important observation is, nevertheless, that the rank order of the different methods
with respect to their accuracy remains largely consistent across both datasets employed. We
interpret this fact as an indicator of the generalizability of the result summaries provided
in Table 4.3 and Table 4.4, at least concerning the rank order of the algorithms. That is, we
assert that the results obtained are descriptive of the algorithms' performance on the
accuracy measures and that they are generalizable to other datasets. Further discussion of
the results concerning the accuracy of our proposed method is based on these assertions:
It can be seen that our method's predictions, though exhibiting significant accuracy
improvements (over 15%) with respect to the bottom-line benchmark (i.e. the global average),
evidently do not belong to the leaders of the table. Moreover, the results of the optimization
step do not differ substantially from the results of the estimation step. The former achieves
only a marginal improvement (less than 1%) over the latter with respect to both MAE and RMSE,
although the RMSE improvement is about five times larger than the MAE improvement. A yet more
interesting fact is that the proposed hybridization of our method with the item-based CF
produces a sudden jump in accuracy that makes the aggregated method outperform all of its
competitors, even the Netflix Prize winning algorithm. These observations lead us to the two
following conclusions:
Firstly, the superior accuracy of our hybrid over both of its components indicates that
our model (3.7) does not capture all of the user rating variance. That is, the attribute-based
model of user preferences fails to describe the preference formation of some of the users
contained in the datasets. For these users, the item-based CF produces predictions that are
nearer to their true ratings than the predictions of our method. Hence, item-based CF captures
some movie characteristics that are 'hidden' from the attribute-based preference model and go
beyond formal attributes. Such characteristics may be, for example, the depth of character
development, an enthralling storyline or the overall atmosphere of a movie. In other words,
for users who base their preferences on such hard-to-formalize movie characteristics, the
analysis of item similarities is capable of revealing the relationships between movies that
are due to these characteristics. On the other hand, there is also a substantial number of
users whose preference structures are described better by our attribute-based preference
model. Therefore, providing both groups of users with predictions based on the individual method that better
describes their corresponding preferences results in the superior performance of our hybrid
(see Section 3.3.2). It follows that the inferior performance of our preference model is
mostly not due to calculation errors but is rather caused by the model's inability to capture
movie preferences for a certain group of users.
Secondly, connecting this fact with the observation of a merely moderate improvement of
prediction accuracy from the estimation step to the optimization step of our algorithm allows
us to conclude that the former already provides fairly good estimates of the model parameters.
That is, if our model's predictions outperform the predictions of item-based CF for a
substantial group of users, if the lower accuracy for the remaining users is caused by the
model's inadequacy for describing the preference formation of these users, and if the
combination of the pure predictions of the two hybrid components leads to superior overall
results, then this indicates that the model parameters were estimated reliably enough to
substantially reduce the overall prediction error. The five times larger improvement of RMSE
(as compared to the improvement of MAE) indicates that the optimization procedure mainly
reduces the error variance rather than the error's expected value. This, again, testifies
that the adjustment of the point estimates in the optimization step results in a slightly
better fit of the model to user preferences, whereas the model bias (i.e. the expected value
of the error) remains nearly constant. That is, the initial interval estimates of the
estimation step were already obtained reliably.
Now, let us take a closer look at the sudden accuracy improvement caused by the
hybridization of our preference model and the item-based CF. Consider Table 4.5 that
summarizes the distribution parameters of the absolute error after the optimization step.
Table 4.5: Distribution parameters of the absolute prediction error of the optimization step

Dataset      Min   Max   Mean    SD      Mode   Kurtosis   SE of kurtosis   25th percentile   50th percentile   75th percentile
MoviePilot   0     100   18.19   16.36   0      2.434      .022             6.03              13.60             25.48
Netflix      0     5     .706    .624    0      2.527      .018             .2315             .5368             1.44
It can be seen that for both datasets, our algorithm exhibits relatively high positive
kurtosis values (over 2) and a relatively low standard deviation (as compared to the mean
error value). Both facts indicate that the error distribution is highly peaked, i.e. most
error values are concentrated around a particular point rather than being spread over a wide
interval. The analysis of the quantiles (25th, 50th and 75th percentiles) shows that the
error distribution is, in addition, positively skewed, i.e. the distribution's peak is
situated nearer to the point of zero error than to the error mean. The distribution's
peakedness and positive skewness are also confirmed by the fact that the error's standard
deviation around the mean (SD) is lower than the value of RMSE49. Further, it can be seen
that the absolute prediction error mostly stays below the value of the standard deviation
and exceeds it in only about 30% of the cases (see the 50th and the 75th percentiles). This
means that the error measures are mainly constituted by a low number of points with large
deviations rather than by a large number of points with nearly equal deviations. Altogether,
these facts provide evidence that most of the time our model predicts user ratings fairly
accurately and that it fails to do so only in a relatively small number of cases (about 30%).
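The relation between SD and RMSE invoked here can be made precise via the moment decomposition E[e^2] = (E[e])^2 + Var(e), so that RMSE = sqrt(mean^2 + SD^2). A short numerical check of this identity on synthetic absolute errors (illustrative data, not the thesis's prediction errors):

```python
import numpy as np

# For the absolute error e >= 0, E[e^2] = (E[e])^2 + Var(e), hence
# RMSE = sqrt(mean(e)^2 + SD(e)^2): the SD in Table 4.5 lies below the
# corresponding RMSE whenever the mean error is nonzero.
rng = np.random.default_rng(0)
e = np.abs(rng.normal(0.0, 1.0, 10_000))   # synthetic absolute errors

rmse = np.sqrt(np.mean(e ** 2))
mean, sd = e.mean(), e.std()               # population SD (ddof=0)
assert np.isclose(rmse, np.sqrt(mean ** 2 + sd ** 2))
assert sd < rmse
```

Note that the decomposition holds exactly only for the population standard deviation (`ddof=0`); with the sample SD the two sides agree up to a factor of n/(n-1) on the variance term.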
In the latter cases, however, the magnitude of the error is substantially large, ranging
from about 25% to 100% of the RS's rating scale interval. One possible explanation for this
is a systematic nature of the large errors. The source of such systematic errors, in turn,
may be attributed to a range of factors such as the model's quality, calculation errors, or
user and item rating patterns. In order to test this assumption and, where applicable, to
identify the source of the systematic errors, we inspected the ratings for which our
algorithm produced large errors. Indeed, we found that the large errors belong to the same
group of users. This supports our assumption of a systematic nature of the error and allows
us to attribute it to the users. However, we were unable to find patterns that would allow
an a priori identification of users with high prediction errors. That is, these groups of
users do not exhibit any noticeable regularities with respect to the source data, such as a
low number of ratings or a specific rating distribution, that would allow us to discern them
from the users for whom our algorithm produces lower errors. The only sensible explanation
is that the 'problematic' users form their movie preferences on the basis of information
which is not captured by the preference function shown in equation (3.7). This observation
supports our previously stated suggestion that the attribute-based preference model is unable
to capture movie preferences for a certain group of users, which motivates hybridization.
49 Recall that RMSE designates the standard deviation of the error distribution around zero (see Section 4.2).
To further justify the hybridization of our method with item-based CF, we performed
the Kolmogorov-Smirnov test for the equality of distribution functions. The results showed
that the error distribution of the item-based CF significantly differs from the one produced
by the attribute-based preference model (for both the MoviePilot and Netflix datasets).
Consistent with this, both approaches produced unequal errors for most users at the
single-user level (in Student's t-test for equality of means). Again, these results confirm
that the two approaches capture different 'kinds' of variance in the user ratings, each of
them suited to describing the preference formation of a different 'kind' of user. Hence, the
hybridization of the individual predictions of both approaches, as described in Section
3.3.2, is sensible and results in a substantial improvement of the hybrid's predictive accuracy.
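The statistic underlying the Kolmogorov-Smirnov test, the maximum vertical gap between the two empirical distribution functions, can be sketched in plain Python; this is a minimal illustration, and the significance evaluation used in the thesis is omitted.

```python
import bisect

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical gap
    between the empirical distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # fraction of sample values <= x
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

# Identical error samples give 0; fully separated samples give 1.
assert ks_two_sample([1, 2, 3], [1, 2, 3]) == 0.0
assert ks_two_sample([0.1, 0.2], [0.8, 0.9]) == 1.0
```

A large statistic on the two components' error samples is what rejects equality of the error distributions and thus supports combining the components.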
Table 4.6 provides a summary of the accuracy improvement of our proposed hybrid method
in comparison with its components (i.e. the individual predictions of the optimization step
and the item-based CF) as well as with the benchmark methods (i.e. the global average and
the Netflix Prize winner algorithm).
It can be seen that the improvements are substantial and consistent with respect to both
(N)MAE and (N)RMSE on both MoviePilot and Netflix datasets.
Table 4.6: Accuracy improvement of the hybrid method
The values indicate the percentage of accuracy improvement of the hybrid method relative to other methods
Algorithm                 MoviePilot            Netflix
                          (N)MAE    (N)RMSE     (N)MAE    (N)RMSE
Global average            24.88%    21.56%      29.81%    25.86%
Optimization step         10.85%    14.52%      9.28%     9.41%
Item-based CF, Pearson    3.65%     6.80%       5.42%     6.26%
Netflix Prize winner      n/a       5.12%       n/a       5.62%
Moreover, our proposed hybrid method outperforms all compared algorithms, even the most
accurate one – the Netflix Prize winning algorithm. This finding allows us to state that one of
our initial objectives, i.e. the development of an accurate recommendation algorithm, is
achieved.
4.4.2 Provided Explanation Style
In the previous section, our hybrid method was shown to outperform all other compared
methods with respect to predictive accuracy. However, the predictions of the hybrid are
produced as a combination of the individual predictions of the hybrid's components, each of
which provides its own explanation style, and these styles differ in their effectiveness for
the user's decision making (see Section 3.3.2): Whereas predictions based on the attribute
preference model allow for the most effective pros-and-cons explanation style, the item-based
predictions provide the less effective influence explanation style. Since the item-based CF
component has substantially contributed to the outstanding predictive accuracy of the hybrid
method, the question arises in what proportion of cases the final recommendations of the
hybrid are produced by means of our user preference model. That is, how many users receive
recommendations explained in the most effective pros-and-cons explanation style?
Table 4.7 answers this question by providing a summary of the number of users for
whom each of the individual explanation styles applies. The numbers for the pros-and-cons
explanation style correspond to the number of cases in which the final recommendations were
produced by means of the user preference model (3.7). The numbers for the influence
explanation style reflect the number of cases in which predictions of the item-based CF
component of the hybrid were used as final recommendations.
Table 4.7: Provided explanation style
                     MoviePilot                         Netflix
Explanation style    Number of users   Percentage      Number of users   Percentage
Pros-and-cons        5,194             65.31%          290,146           67.73%
Influence            2,759             34.69%          138,239           32.27%
Total                7,953             100%            428,385           100%
We can see that the results are consistent across both employed datasets, i.e. they do
not exhibit substantial differences between the datasets. Item-based CF and its inherent
influence explanation style were used for only about 34% of the users. The majority of the
users (about 66%) received explanations for the recommended items in the most detailed and
most effective pros-and-cons explanation style.
Although our hybrid method could not ensure the provision of explanations in the most
detailed explanation style for all users, all users were provided explanations in one of the
most effective styles (see Sections 2.1.2 and 2.1.3). We can therefore state that our second
objective, i.e. the provision of effective and actionable explanations, was achieved: Recall
the discussion of the previous section, where we found that the attribute preference model
cannot capture the preference structure of some users because those users form their
preferences based on factors other than movie attributes. Hence, explanations of
recommendations in terms of movie attributes would not be informative for these users and
thus would not increase the effectiveness of their choice making, simply because they rely
on other kinds of information when making choices. Since the item-based component of our
hybrid substantially increases the predictive accuracy for these users, it appears to capture
the 'right' part of the rating variance for them. Hence, the influence-based explanation
style that highlights the similarity between movies is more informative, and thus more
effective, for users whose preferences are better described by the item similarity model of
item-based CF.
At the same time, about two thirds of the user base is provided with detailed explanations
of the recommendations based on the attribute preference model that effectively captures the
preferences of the corresponding users. In other words, our hybrid method provides each group
of users with explanations of the recommendations in the style that is most effective for
the respective user group.
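As an illustration of how a pros-and-cons explanation could be assembled from estimated attribute preferences, consider the following sketch. The attribute names, the weight encoding and the top-n selection rule are purely hypothetical and are not taken from the thesis.

```python
def pros_and_cons(attribute_weights, movie_attributes, top_n=3):
    """Sketch of a pros-and-cons explanation: among the attributes a movie
    actually has, the user's most positively weighted ones become 'pros'
    and the most negatively weighted ones 'cons'.

    attribute_weights: dict attribute -> estimated user preference weight
    movie_attributes:  set of attributes present in the movie's profile"""
    present = [(a, attribute_weights[a]) for a in movie_attributes
               if a in attribute_weights]
    present.sort(key=lambda kv: kv[1], reverse=True)
    pros = [a for a, w in present[:top_n] if w > 0]
    cons = [a for a, w in present[-top_n:] if w < 0]
    return pros, cons

# Hypothetical weights: this user likes comedies and dislikes horror.
weights = {"genre:comedy": 0.8, "actor:X": 0.5, "genre:horror": -0.9}
pros, cons = pros_and_cons(weights, {"genre:comedy", "genre:horror"})
# pros == ["genre:comedy"], cons == ["genre:horror"]
```

An influence-style explanation, by contrast, would list the already-rated movies most similar to the recommended one rather than its attributes.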
The latter assertion is supported by the research of Bilgic and Mooney (2005) and
Symeonidis, Nanopoulos, and Manolopoulos (2008), or, strictly speaking, by the contrast
between their findings: Among other things, both studies compare the effectiveness of and
the user satisfaction with the influence and the keyword explanation styles within an
experimental framework based on a single recommendation algorithm. Whereas in Bilgic and
Mooney's study the keyword explanation style dominated the influence style, the study of
Symeonidis, Nanopoulos, and Manolopoulos reveals the opposite (see Section 2.1.2). In both
studies, however, the difference between the two explanation styles is not significant.
Recall now that our pros-and-cons explanation style derives from the keyword explanation
style and represents an extension of the latter. In view of our findings, which reveal the
existence of two user groups that form their preferences differently, the contradiction
between the results of the two studies discussed above becomes explainable: Since both user
groups are substantially large, the users of both groups had sufficiently high chances of being assigned to
either of the experimental groups, i.e. the 'keyword' and the 'influence' groups. Hence, the
chances are very high that both experimental groups contained a substantial number of users
of both types. Nevertheless, the exact proportions in which the two user types were
represented in the different experimental groups could differ slightly. This leads to a
difference in the 'mean' judgments of the experimental groups but could not entail a
significant difference in means, because the groups were composed of the two user types in
comparable numbers, though in different proportions. Although this explanation requires
proof, we leave it to future research. At this point, we suggest that this explanation is
convincing and concordant with our findings.
Summarizing the above, we argue that in our empirical study our proposed method
confirmed its ability to provide actionable recommendations that increase the effectiveness
of recommendations. The consistency of the results on two different datasets indicates their
generalizability. This allows us to assert that the second objective of the current thesis
is achieved and thereby to conclude the development of our proposals.
The next section provides a brief summary of the findings of the empirical study.
4.5 Summary
The purpose of the current chapter was to test our theoretically developed algorithm for
providing recommendations and explanations thereof in an empirical setting, i.e. to prove
the portability of our proposition to the real-world operating environment of recommender
systems as well as the compliance of the proposed method with the declared objectives of the
current thesis.
To this end, we conducted an empirical study that employs datasets of user ratings for
movies from two real-world recommender systems. Using these datasets, our proposed
recommendation method was compared with the key recommendation algorithms with respect to
prediction accuracy, i.e. the ability to generate reliable recommendations. Further, the
ability to provide effective and actionable explanations to the users was examined.
The results show that our proposed hybrid recommendation method outperforms collab-
orative filtering approaches and even the state-of-the-art Netflix Prize winning algorithm in
terms of predictive accuracy, while providing all users with explanations of the reasoning
behind recommendations.
The majority of the user base (about two thirds) received explanations in the
pros-and-cons explanation style, which provides detailed, easy-to-understand and actionable
explanations that increase the efficiency of the user's choice. For the smaller fraction of
users (about one third of the user base), however, the explanations are given at a lower
level of detail than the pros-and-cons explanation style can provide. This is due to the
fact that our theoretically founded multi-attribute preference model does not capture the
variance in the ratings of these users, which indicates that they base their preferences on
factors other than the information contained in the formalizable movie attributes. The
item-based CF method, however, was capable of producing reliable rating predictions for such
users. This, in turn, indicates that the similarity between movies can serve as a reliable
descriptor of the preference formation of such users. Hence, to be effective, the
explanations for these users should also be provided in the style that better suits their
preference function, i.e. in the influence explanation style. Thus, both user groups received
explanations that effectively support them in their choice making.
Since different user groups received recommendations provided by an algorithm that
better suits the users‟ preference functions, it can be argued that each user received recom-
mendations generated by a recommendation process that aligns with the user‟s preferences.
Hence, we can assert that the third aspect of our objectives is also achieved.
That is, in our empirical study, our proposed recommendation method has proven capable
of providing both accurately predicted recommendations and actionable explanations of the
reasoning behind them, as well as of aligning the recommendation process with the user
preferences.
The results are consistent for both datasets employed and do not exhibit significant
variations between them. Since both datasets underlying the study exhibit unique
characteristics, the consistency of the results obtained on them indicates the
generalizability of the findings to the domain of movie recommendations as well as to
recommender systems as a whole.
Chapter 5
Conclusions and Future Work
5 Conclusions and Future Work
In this chapter, we summarize our research and its findings, discuss the implications
of the latter, and provide suggestions for future work. The first subsection provides a
brief recapitulation of the course of our analysis and the development of our algorithm and
summarizes our contributions to research. The second subsection highlights the main
implications of our findings for recommender system providers and developers. Finally, the
third subsection concludes our thesis with a discussion of ways to improve our proposed
recommendation method and of avenues for future research.
5.1 Research Summary, Findings and Contributions
The aim of the current thesis was to develop a recommendation method which is capable
of providing both accurately predicted recommendations and actionable explanations of the
reasoning behind them, as well as of aligning the recommendation process with the user
preferences.
In order to provide foundations for our developments, we began with a theoretical dis-
cussion addressing the questions why explanations of the recommendations should be an inte-
gral part of recommender systems and how they should be provided to the users. Prior re-
search has shown evidence that explanations of the reasoning behind recommendations are
capable of establishing the users' acceptance of and trust in recommender systems as well
as of increasing the users' loyalty thereto. Moreover, explanations can further extend the
users' decision effectiveness when choosing among recommended items and raise their
satisfaction with the choice. However, these advantages only pay off if explanations are
understandable and actionable to the users. This not only requires that users comprehend
the explanations but also implies that the explanations are provided in terms relevant to
the users' decision making, i.e. that they comprise the characteristics users actually
employ for judging choice alternatives. On the other hand, the recommendation algorithm
should also reflect the user's way of thinking while producing recommendations. That is, an
algorithm should likewise operate in terms of the characteristics that users employ for
judging choice alternatives. This not only allows an algorithm to provide actionable
explanations to the users but also ensures that the recommendations produced are effective,
i.e. indeed reflect the user's optimal choice. However, for the latter to hold, the
algorithm must be aligned with the users' preference weights for the relevant
characteristics. That is, the importance weights of the characteristics employed in the
algorithm should be similar to the user's actual weights.
These considerations led us to employ a multi-attribute utility (MAU) model as the basis
for our recommendation algorithm. The MAU model 'decomposes' the utility of a choice
alternative into a sum of preferences for the attributes the alternative consists of and is
hence suitable for modeling user preferences that are based on item characteristics.
Further, we suggested using the weighted additive decision rule (WADD), which considers all
attributes of an alternative for the identification of an optimal choice. Although research
provides evidence that in many real-world situations (such as stress, time pressure, etc.)
people rely on simplifying decision procedures and evaluate only a fraction of an item's
attributes, WADD was shown to lead to the most effective choices. Moreover, the choice of
WADD for our algorithm is supported by the work of Aksoy et al. (2006), who provided
evidence that it is sufficient for a recommendation algorithm to maintain the similarity of
its internal representation of the user attribute weights to the 'true' ones, while the
decision strategy issue can be ignored.
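The WADD rule itself is compact; the following minimal sketch uses illustrative attribute names and weights that are not taken from the thesis.

```python
def wadd_utility(weights, attribute_values):
    """Weighted additive (WADD) rule: the utility of an alternative is the
    weighted sum over all of its attribute values, U(x) = sum_j w_j * x_j,
    evaluated over every attribute rather than a simplifying subset."""
    return sum(weights[a] * v for a, v in attribute_values.items())

def best_choice(weights, alternatives):
    """Pick the alternative with the highest WADD utility."""
    return max(alternatives,
               key=lambda name: wadd_utility(weights, alternatives[name]))

# Hypothetical example: two movies described by two attributes.
movies = {
    "A": {"comedy": 1, "star_power": 0.2},
    "B": {"comedy": 0, "star_power": 0.9},
}
w = {"comedy": 0.7, "star_power": 0.3}
# U(A) = 0.7*1 + 0.3*0.2 = 0.76;  U(B) = 0.3*0.9 = 0.27
assert best_choice(w, movies) == "A"
```

Simplifying heuristics such as lexicographic choice would inspect only the most important attribute; WADD evaluates them all, which is what makes it the most effective but also the most demanding rule.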
The choice of MAU and WADD as the basis for our recommendation algorithm made it
necessary to determine a list of attributes to be considered by the algorithm we develop.
Given the domain of motion pictures for our recommendations, we analyzed the movie
research literature in order to derive preference-relevant movie attributes. However, we
faced a lack of research on this particular topic: The existing theoretical discussion of
the preference relevance of movie attributes is neither empirically validated nor does it
claim to provide a complete list of the preference-relevant movie attributes. Hence, we
adopted the existing findings and extended our list with movie attributes that are employed
in the research on movie success factors. The latter research field, however, treats movie
attributes as part of the superordinate concept of 'success factors' and considers them on
an aggregate level, i.e. the relevance of movie attributes for the preference formation of
individual consumers is not analyzed. However, it can be argued that if a factor is found to
be relevant for explaining the choices of a whole population of consumers, it should also be
relevant on the individual level. Following this argumentation, we provided a discussion of
the suitability of the movie success factors for describing the preferences of individuals,
which resulted in a list of 318 movie attributes to consider in our recommendation algorithm.
Further, to ensure the novelty of our approach, we provided an overview of the key
recommendation algorithms as well as insights into the principles of their algorithmic
functioning, their problems and trade-offs. Specifically, we discussed the family of
collaborative recommendation techniques (including user-based CF, item-based CF and matrix
factorization approaches) as well as content-based filtering. We also discussed hybrid
methods, which combine different recommendation techniques in one approach in order to
mitigate the disadvantages of the constituent methods and to utilize their respective
strengths. The discussion also shed light on the mathematical issues of the different
recommendation approaches. In particular, it was shown that the application of content-based
techniques in the domain of multimedia items, such as movies, is impeded by two factors: On
the one hand, the ability of contemporary content processing algorithms to extract
meaningful features from multimedia content is limited, which makes it impossible to compile
a list of preference-relevant attributes automatically, i.e. without involving additional
personnel on the side of the recommender system. This increases the costs of producing
recommendations, which, in most cases, radically reduces the attractiveness of content-based
algorithms for recommender system providers due to economic considerations. On the other
hand, even if a list of preference-relevant item characteristics can be obtained from a
third-party provider, such as IMDb, only a fraction of the attributes can be utilized for
the provision of recommendations: Most users of a recommender system only have a limited
number of ratings in their
user profiles, which (from the algebraic point of view) 'naturally' limits the number of
attributes for which user preference weights can be estimated to the number of the user's
ratings.
Consequently, such a content-based recommender algorithm would not be able to capture a
substantial part of the variance in a user‟s ratings and thus to produce reliable recommenda-
tions for the majority of the recommender system‟s user base.
The latter considerations motivate us to utilize statistical techniques in our approach to estimating users' attribute preferences. Hence, we account for this aspect already in the early
stages of the development of a conceptual framework for our recommendation algorithm:
Firstly, based on the findings of the preceding discussions we develop a model of user movie
preferences in a regression analysis manner. Essentially, our model incorporates four types of
effects:
(i) The very basic effects of movie-user interaction, i.e. preferences of a user to-
wards each of the movie attributes;
Two kinds of effects that are beyond the user-item interaction and due to either users or
items:
(ii) 'Raw' user effects caused, e.g., by a user's perception and handling of the rating scale or a user's reaction to mainstream trends;
(iii) 'Raw' item effects caused, e.g., by the differing popularity of different movies that is not conditioned on the presence of a certain characteristic in the movie's profile;
And, finally,
(iv) temporal changes in the three kinds of effects presented above.
This leads us to a model of user preferences that contains 643 parameters, each of them to be
estimated individually for each user.
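As an illustrative sketch only (all names are hypothetical and the temporal component is simplified to linear drift; this is not the thesis's exact specification), the four effect types can be combined into a single prediction function:

```python
import numpy as np

def predict_rating(item_attrs, attr_weights, user_bias, item_bias,
                   t, attr_drift, user_drift, item_drift):
    """Sketch of the four-effect preference model (names illustrative).

    item_attrs  : vector encoding the movie's attributes (e.g. 318 entries)
    attr_weights: (i) the user's preference weight per movie attribute
    user_bias   : (ii) 'raw' user effect, e.g. rating-scale usage
    item_bias   : (iii) 'raw' item effect, e.g. baseline popularity
    t, *_drift  : (iv) temporal change, here simplified to linear drift
    """
    static = item_attrs @ attr_weights + user_bias + item_bias
    temporal = t * (item_attrs @ attr_drift + user_drift + item_drift)
    return static + temporal

# e.g. a movie with the 1st and 3rd attribute present, 10 time units in
r = predict_rating(np.array([1.0, 0.0, 1.0]),
                   np.array([0.5, 0.2, 0.3]),   # attribute part-worths
                   user_bias=3.0, item_bias=0.2,
                   t=10.0,
                   attr_drift=np.array([0.0, 0.0, 0.01]),
                   user_drift=0.0, item_drift=0.0)
```

In the thesis's full model, every one of these parameters is estimated individually per user, which is what produces the large parameter count.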
Since an estimation of this number of parameters cannot be done in a 'traditional' way
for the majority of users who simply do not have a sufficient number of ratings in their user
profiles, we propose a two-step algorithm that accomplishes the estimation task by means of
statistical techniques. The first step of the algorithm provides interval estimates of the model
parameters, i.e. the estimates of the parameter values and of their confidence limits. These are
obtained in auxiliary regressions that are performed for each model parameter individually,
i.e. separately from other parameters. Such estimation is subject to the so-called omitted vari-
able bias and ignores correlations that may be present between the parameters of the model.
Thus, the estimates of such auxiliary regressions are theoretically unreliable and result in er-
roneous predictions of user ratings. Hence, to reduce the bias and to recover the reliability of
the estimates as well as their validity for predictions we propose a procedure that corrects the
initially obtained estimates for both the omitted variable bias and multicollinearity. The se-
cond step of the algorithm then optimizes the bias-corrected estimates to further increase the
data fit and to reduce prediction errors. The optimization is done by means of conjugate gra-
dient descent method, which was modified so that the parameter values are only allowed to
vary inside their respective confidence intervals. Looking ahead, let us point out that in our empirical study this novel procedure of parameter estimation for an underdetermined regression model proved to provide reliable estimates. Hence, we see this procedure itself as one of the most notable contributions of the current thesis to research.
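The idea of the second step can be sketched as follows. This is a simplified illustration under stated assumptions: plain projected gradient descent stands in for the modified conjugate gradient method described above, and all data and names are hypothetical.

```python
import numpy as np

def refine_within_ci(X, y, w0, ci_lo, ci_hi, lr=0.01, steps=500):
    """Optimize initial estimates w0 for data fit while keeping every
    parameter inside its confidence interval [ci_lo, ci_hi]."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad                           # gradient step
        w = np.clip(w, ci_lo, ci_hi)             # project back into the box
    return w

# toy system: true parameters (1, 2) lie inside the confidence box
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([1.0, 2.0])
w = refine_within_ci(X, y, w0=np.array([0.5, 1.5]),
                     ci_lo=np.array([0.0, 0.0]), ci_hi=np.array([3.0, 3.0]))
```

The clipping step is what keeps the optimized values consistent with the interval estimates of the first step.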
In addition to presenting our model and the estimation procedure, we suggested hybridizing our method with item-based CF. We motivated this hybridization by the following
concerns: Although our model covers 318 movie attributes that potentially can capture the
preferences of the majority of users, the hedonic nature of motion pictures may lead some users to judge movies by criteria other than movie attributes, e.g., a less well-defined overall impression or entertainment value. Hence, our model would not be able to capture the preferences of such users to a full extent. On the other hand, collaborative filtering techniques, which
are not concerned with movie attributes and base their recommendations on more general rating patterns, may be better at revealing relations between movies for such "hedonically
oriented” users, and hence, in producing recommendations for them. Taking our objective to
provide users with actionable explanations along with recommendations themselves into ac-
count, we suggested hybridizing our method with item-based CF, because the latter approach
provides the second best explanation style with respect to its potential of increasing user
choice effectiveness. Hence, the hybrid method provides all users with one of the most effec-
tive explanations of recommendations. To ensure the ability to provide explanations, we do
not combine individual predictions of the component methods algebraically (e.g., by means of averaging or weighting). Instead, we choose to use the 'raw' prediction of the component that performs best on withheld data as the final recommendation.
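The per-user selection of the better component can be sketched as follows; the function names and the use of mean absolute error as the selection criterion are illustrative assumptions, not the thesis's exact procedure:

```python
def choose_component(holdout, components):
    """Pick the component method whose raw predictions have the lowest
    mean absolute error on the user's withheld (holdout) ratings.

    holdout    : list of (item_id, true_rating) pairs withheld for this user
    components : dict mapping component name -> predict(item_id) function
    """
    def mae(predict):
        return sum(abs(predict(i) - r) for i, r in holdout) / len(holdout)
    return min(components, key=lambda name: mae(components[name]))

# toy example: the content model tracks this user's withheld ratings better
holdout = [("m1", 4.0), ("m2", 2.0)]
best = choose_component(holdout, {
    "content": lambda i: {"m1": 4.2, "m2": 2.1}[i],
    "item_cf": lambda i: {"m1": 3.0, "m2": 3.5}[i],
})
```

Because the winning component's prediction is used as-is, the explanation style tied to that component remains available for every recommendation.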
To test our proposed method and to locate its place within the family of contemporary
recommendation algorithms, we conducted an empirical study, which involved rating datasets
of two real-world recommender systems, each with its own inherent properties. The study compared the predictive accuracy of the key recommendation methods and also reported results of the most accurate Netflix Prize winning algorithm. The results are consistent for both
datasets, which indicates the generalizability of the findings for the domain of motion pictures
as well as for the domain of recommendation systems as a whole. It was shown that two groups of users indeed exist: The first and larger group (about two thirds of the user base) can be described well by our proposed multi-attribute model of user preferences. Consequently, for these users, the explicit preference modeling outperforms the CF component of our hybrid method, thus providing more precise rating predictions and the most effective pros-and-cons explanation style. The second, smaller group of users (about one third) seem to form their movie preferences based on factors other than movie attributes. For this group of users, item-based CF provides substantially more reliable rating predictions, i.e. item similarity is more descriptive of these users' preferences. The latter also indicates that highlighting the similarity of recommended movies to previously seen ones is more informative for these users as an explanation of recommendations. That is, each group of users received the most precisely predicted recommendations, supported by the most effective explanations.
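As a hedged sketch of how a pros-and-cons explanation might be assembled from a user's estimated attribute part-worths (the attribute names and the simple top-k selection heuristic are our own illustration, not the thesis's exact procedure):

```python
def pros_and_cons(item_attrs, part_worths, k=3):
    """List the item's attributes with the most positive (pros) and most
    negative (cons) estimated part-worths for the active user."""
    present = [(a, part_worths[a]) for a in item_attrs if a in part_worths]
    present.sort(key=lambda aw: aw[1], reverse=True)
    pros = [a for a, w in present[:k] if w > 0]
    cons = [a for a, w in present[-k:] if w < 0]
    return pros, cons

# toy profile: likes westerns and Eastwood, dislikes long running times
pros, cons = pros_and_cons(
    ["western", "long_runtime", "clint_eastwood"],
    {"western": 0.9, "long_runtime": -0.5, "clint_eastwood": 0.4})
```

Reporting both lists is what distinguishes the pros-and-cons style from a purely positive keyword explanation.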
All in all, our content-based hybrid method was shown to outperform collaborative filtering techniques with respect to predictive accuracy, while inherently ensuring the provision
of explanations behind recommendations for each user in the most effective explanation style.
Notably, our hybrid method also outperformed the Netflix Prize winning algorithm, which ranks as the most accurate among published recommendation algorithms but possesses no inherent capability to provide explanations behind recommendations.
This finding constitutes the main contribution of the current thesis to research in the field of
recommendation systems.
Recapitulating the above, our results and contributions to research can be briefly summarized as follows:
(i) We extended the keyword explanation style by integrating negative cues into it and established theoretically that the resulting pros-and-cons explanation
style increases the effectiveness of recommendations for the user‟s decision
making.
(ii) We developed a content based recommendation algorithm for the domain of
multimedia products, i.e. for recommendations of motion pictures. This algo-
rithm outperforms the key recommendation algorithms for the majority of users
and is capable of providing them with explanations of recommendations that ef-
fectively support the users‟ decision making.
(iii) We developed a novel statistical approach for the estimation of highly underde-
termined regression models. The approach employs a set of auxiliary regressions
that estimate one regression parameter at a time. The initial estimates are then corrected for the omitted variable bias and multicollinearity and subsequently optimized for a further reduction of prediction errors.
(iv) We have shown the existence of two substantially large user groups of movie
recommender systems who form their preferences differently. Providing rec-
ommendations for each group of users by means of a method that captures the
preferences of a correspondent user group better leads to a substantial increase in
the prediction accuracy of a recommender system.
(v) We showed that a carefully designed content based hybrid recommendation
method can outperform collaborative filtering algorithms with respect to predic-
tion accuracy.
(vi) We provided empirical support for the findings of previous research arguing that "[recommendation] agents should think like the people they are attempting to help" (Aksoy et al. 2006, p. 310).
The next section discusses implications of our findings.
5.2 Discussion and Implications
Even the most accurate recommendation algorithm is subject to prediction errors.
Hence, recommendation systems that aim at helping users to make better choices should account for factors beyond the rating predictions as such and broaden their scope to encompass aspects of the recommendation process as well as facilities that further increase users' choice efficiency:
On the one hand, recommender system providers should make efforts to increase their understanding of the criteria users base their decisions on and to build into their algorithms an ability to align the generation of recommendations with these criteria: Since different users base their choices on different criteria, a recommender system should employ different recommendation processes that match individual user decision making and incorporate the user's underlying choice criteria into a personalized recommendation
process. That is, rather than employing one algorithm that performs best overall, a recommendation system should handle its users individually: It should be a hybrid of several recommendation methods, each aligned with the choice making criteria of a specific user group, and provide recommendations to a user by means of the component method that reflects the user's choice making best.
On the other hand, efforts should be made to increase the user‟s understanding of rec-
ommendations. That is, an explanation facility should be made an integral part of recom-
mender systems. This facility, however, should be tightly coupled with the recommender al-
gorithm: Provided that the recommendations for different users are produced differently (see
above), the explanations should also reflect the underlying process of producing recommenda-
tions and highlight the aspects of recommendations that are relevant for the user‟s choice
making. This increases the user's choice effectiveness and compensates for algorithmic prediction errors by allowing users to assess the quality and suitability of recommendations before completing their choice. Furthermore, the provision of explanations as additional
decision supporting information allows users to better address the context in which the decision is made, as well as other subtle aspects of the decision's implications. In other words, explanations can make aspects addressable that are hardly addressable by an automated recommendation agent. For instance, if western fan Thorsten chooses a movie to watch after dining out with his spouse Claudia, he is unlikely to choose a protracted Clint Eastwood classic
for this occasion. Instead, he will appreciate a recommendation that hints at an entertaining or
love-story component of a western movie, which will allow him to choose a movie that suits
his decision context best, i.e. a movie that is worth watching for both him and his wife. Not
only can an alignment of explanations with the user's decision relevant characteristics increase the user's confidence in recommendations and choice efficiency, but the provision of explanations that are understandable and actionable also increases his or her trust in and acceptance of a recommendation system as a whole, which in turn increases the user's loyalty to the recommender.
The next section discusses ways to improve the proposals made in the current thesis and outlines directions for future work.
5.3 Future Research
No research publication can ever completely cover a topic with all its facets and nuances. No research project is free of limitations, and ours is no exception. In the following, we discuss the limitations of our research and show ways for its improvement and extension.
In the current thesis, we developed a recommendation algorithm that is capable of
providing explanations alongside with recommendations. Although the proposed explanation
style and its effectiveness for user choice making as well as the ability of the algorithm to
provide such explanations were proven theoretically in previous chapters, we cannot quantify
the degree to which the explanations presented in our proposed style actually increase the
choice effectiveness. This improvement can be substantial or only marginal. Likewise, it can be argued that the effectiveness of the pros-and-cons explanation style may depend on the nuances of the formulation of an explanation. These nuances include the questions of the optimal number of attributes an explanation should report, the valence and the
balance between the positive and negative cues included in an explanation, the wording and
the length of explanations, their place and design in the user interface of a recommender sys-
tem, etc. These topics were not addressed within the current thesis and require additional user
studies, which would provide empirical tests of our theoretically founded propositions and increase the understanding of effective explanations in recommender systems research.
By providing consistent results on real-world datasets with different properties, our proposed recommendation method was shown to generalize to the domain of movie recommendations. We encourage further studies that test the suitability, applicability
and effectiveness of our method in other real-world applications as well as for recommenda-
tions of other types of items and products.
Research directions we would also like to explore concern the modeling side of our
method, such as extending the list of item attributes, adding interaction effects, accounting for
non-linear attribute preference functions and non-linear temporal changes of the preferences. Accounting for these factors potentially increases the explanatory power of the multi-attribute preference model, which in turn can improve the prediction accuracy of our algorithm and allow it to capture the preferences of a greater share of users, i.e. also of users whose preferences were not adequately captured by our model in our study.
Improvements can also be made to the proposed algorithm itself. For example, similarity-based techniques can be employed to enrich the representation of user profiles through the imputation of part-worths. Such an imputation, again, potentially increases the number of users
for whom our algorithm can provide reliable rating predictions by uncovering attribute preferences that are initially 'hidden' from the algorithm. Furthermore, the number of items that can potentially be recommended to a user also increases. For example, if a user who likes
both action movies and westerns has only rated westerns, our content-based algorithm would
not be able to deduce the user's preference for action movies due to the lack of the corresponding data. Hence, such a user would never receive a recommendation of an action movie. In this situation, user-based CF could determine that other users with similar ratings also rate action movies highly. This information could then be used to impute the part-worths for the genre 'action' as well as for other attributes contained in the 'source' users' profiles (e.g. actors, directors, budgets, etc.) into the incomplete profile of the active user. Obviously, such imputation requires great care, so that the enriched user profile remains descriptive of the user's preferences and balanced with respect to the relative importance of different attributes. A possible approach to ensure this is rescaling the part-worths to be imputed based on the values of the part-worths of the known attributes. Another possible approach to imputation can be based on the similarity or correlations of the known part-worths between profiles of different users.
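A minimal sketch of the rescaling idea, with hypothetical user profiles represented as attribute-to-part-worth dictionaries (the scale factor based on mean absolute shared part-worths is our own illustrative choice):

```python
import numpy as np

def impute_part_worths(active, donor):
    """Copy part-worths absent from the active user's profile from a similar
    'donor' user, rescaled so they stay commensurate with the part-worths
    the two profiles share."""
    shared = [a for a in active if a in donor]
    scale = (np.mean([abs(active[a]) for a in shared])
             / np.mean([abs(donor[a]) for a in shared]))
    enriched = dict(active)
    for attr, w in donor.items():
        enriched.setdefault(attr, w * scale)  # only fill missing attributes
    return enriched

# the active user has only rated westerns; a similar user reveals 'action'
profile = impute_part_worths({"western": 0.8},
                             {"western": 0.4, "action": 0.6})
```

The rescaling keeps the imputed weights balanced against the weights already known for the active user, as argued above.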
Further, our empirical study revealed the existence of two substantially large user
groups that form their preferences differently. Whereas the first and larger group could be
reasonably well described by our multi-attribute preference model and thus received recom-
mendations predicted by the model, we used predictions of item-based CF for all users of the
second group. Although the predictions of item-based CF substantially improved the prediction accuracy of our hybrid method, this method does not necessarily provide the
best description of the underlying preference structures for all users of the second group. It is
also possible that the users of this group can be further differentiated with respect to the crite-
ria they base their movie choices on or with respect to a method that predicts their ratings bet-
ter. We argue that further analysis of the users of the second group and application of a rec-
ommendation method that captures the preferences of each user better may be fruitful and
increase both the prediction accuracy and the effectiveness of explanations. Hence, we invite researchers to examine this issue more deeply and encourage recommender system providers to combine several recommendation techniques in their recommendation systems, rather than building a system around the one algorithm that performs best overall.
Finally, a 'by-product' of the current thesis is the mathematical core of our algorithm – a method to estimate the parameters of underdetermined regression models. Recall that already the parameters obtained in the estimation step of the algorithm provided reasonably accurate predictions of user ratings. It is worth mentioning that in many cases the estimation of 636 parameters was done on the basis of as few as 6 data points. Utilizing a further 6 data points as a holdout in the optimization step improved the prediction accuracy by 1% and 5% with respect to MAE and RMSE, respectively. We suggest that these results are notable
and that the estimation method itself deserves attention from other research fields that deal
with the need to estimate many parameters based on a small number of data points. Hence, we
are eager to expand the application of our estimation method to the solution of other types of
problems than recommending items and see this as a potentially fruitful research field for our
future work.
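For reference, the two accuracy measures mentioned above follow their standard definitions (not specific to the thesis):

```python
import math

def mae(preds, truths):
    """Mean absolute error of rating predictions."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(preds)

def rmse(preds, truths):
    """Root mean squared error; penalizes large errors more strongly."""
    return math.sqrt(sum((p - t) ** 2
                         for p, t in zip(preds, truths)) / len(preds))
```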
Bibliography
Adomavicius, Gediminas and Alexander Tuzhilin (2005). "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", in IEEE Transactions on Knowledge and Data Engineering, pp. 734-749.
Adomavicius, Gediminas, Ramesh Sankaranarayanan, Shahana Sen, and Alexander Tuzhilin
(2005). "Incorporating contextual information in recommender systems using a mul-
tidimensional approach." in ACM Transactions on Information Systems (TOIS), Vol.
23, Issue 1, pp. 103-145.
Adomavicius, Gediminas and Alexander Tuzhilin (2008). "Context-Aware Recommender
Systems", in Proceedings of the 2008 ACM conference on Recommender systems -
RecSys ’08, pp. 335-336.
Aksoy, Lerzan, Paul N. Bloom, Nicholas H. Lurie, and Bruce Cooil (2006). "Should
Recommendation Agents Think Like People?" in Journal of Service Research Vol. 8,
No. 4, pp. 297-315.
Aksoy, Lerzan, Bruce Cooil, and Nicholas H. Lurie (2011). "Decision Quality Measures in
Recommendation Agents Research" in Journal of Interactive Marketing Vol. 25
(2011), pp. 110-122.
Allan, James, Jaime Carbonell, George Doddington, Jonathan Yarmon, and Yiming Yang
(1998), "Topic Detection and Tracking Pilot Study Final Report", in Proceedings of
the DARPA Broadcast News Transcription and Understanding Workshop, pp. 194-
218.
Alspector, Joshua, Aleksander Kolcz, and Nachimuthu Karunanithi (1998), "Comparing Feature-Based and Clique-Based User Models for Movie Selection", in Proceedings of the Third ACM Conference on Digital Libraries, Pittsburgh, PA, pp. 11-18.
Anand, Sarabjot S. and Bamshad Mobasher (2005), “Intelligent Techniques for Web Person-
alization”, in Mobasher, Bamshad and Sarabjot Anand [eds.] “Intelligent Techniques
for Web Personalization”, Lecture Notes in Computer Science, Vol. 3169, Springer,
Heidelberg, Berlin, pp. 1-36.
Andersen, Stig K., Kristian G. Olesen, and Finn V. Jensen (1990). "HUGIN - A Shell for
Building Bayesian Belief Universes for Expert Systems", Morgan Kaufmann Pub-
lishers Inc., San Francisco, CA, USA.
Anderson, Chris (2004). "The Long Tail", in Wired, Issue 12.10, pp. 170-177.
Ansari, Asim, Skander Essegaier, and Rajeev Kohli (2000), "Internet Recommendation Sys-
tems", Journal of Marketing Research 37 (August): 363-376.
Ariely, Dan (2000). "Controlling the information flow: Effects on Consumers' Decision Mak-
ing and Preferences", in Journal of Consumer Research Vol. 27(2), pp. 233-248.
Ariely, Dan, John G. Lynch Jr, Manuel Aparicio IV (2004). "Learning by collaborative and
individual-based recommendation agents", in Journal of Consumer Psychology Vol.
14(1&2), pp. 81–95.
Augistin, Vernon E. (1927), "Motion Pictures Preferences", in Journal of Delinquency Vol 7,
pp. 206-209.
Austin, Bruce A. (1981), "Film Attendance: Why College Students Chose to See Their Most
Recent Film", in Journal of Popular Film and Television, Vol 9, pp. 43-49.
Austin, Bruce A. (1989), "A Factor Analysis Study of Attitudes Toward Motion Pictures", in Journal of Social Psychology, Issue 117, pp. 211-217.
Austin, Bruce A. (1989), "Immediate Seating: A Look at Movie Audiences", Wadsworth, Inc.
Avery, Christopher and Richard Zeckhauser (1997), “Recommender Systems for Evaluating
Computer Messages”, in Communications of the ACM, Vol. 40, Issue 3, pp. 88-89.
Baeza-Yates, Ricardo and Berthier Ribeiro-Neto (1999), "Modern Information Retrieval",
Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
Balabanovic, Marko and Yoav Shoham (1997), “Fab: Content-Based, Collaborative Recom-
mendation”, in Communications of the ACM, Vol. 40, No. 3, pp. 66-72.
Baltrunas, Linas (2008). "Exploiting Contextual Information in Recommender Systems", in
Proceedings of the 2008 ACM conference on Recommender systems - RecSys ’08,
pp. 295-298
Baltrunas, Linas and Francesco Ricci (2009). "Context-Dependent Items Generation in Col-
laborative Filtering", in ACM Workshop on Context-aware Recommender Systems
(CARS 2009), pp. 295-298
Basu, Chumki, Haym Hirsh, and William Cohen (1998), "Recommendation as Classification:
Using Social and Content-based Information in Recommendation", in AAAI '98/IAAI
'98 Proceedings of the fifteenth national/tenth conference on Artificial intelli-
gence/Innovative applications of artificial intelligence, pp. 714–720.
Baudisch, Patrick (1999), “Joining Collaborative and Content-based Filtering”, in Proceed-
ings of the ACM Conference on Human Factors in Computing Systems, pp. 1-5.
Bell, Robert and Yehuda Koren (2007) “Scalable Collaborative Filtering with Jointly Derived
Neighborhood Interpolation Weights”, in Proceedings of the 2007 Seventh IEEE In-
ternational Conference on Data Mining (ICDM'07), pp. 43-52.
Bell, Robert, Yehuda Koren, and Chris Volinsky (2007), "Modeling Relationships at multiple
Scales to Improve Accuracy of Large Recommender Systems", in Proceedings of the
13th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD '07), pp. 95-104.
Bell, Robert, Yehuda Koren, and Chris Volinsky (2007b), "The BellKor Solution to the Netflix Prize", http://www2.research.att.com/~volinsky/netflix/ProgressPrize2007BellKorSolution.pdf, [retrieved on 20.06.2011]
Bell, Robert, Yehuda Koren, and Chris Volinsky (2008), "The BellKor 2008 Solution to the
Netflix Prize", http://www2.research.att.com/~volinsky/netflix/Bellkor2008.pdf, [re-
trieved on 20.06.2011]
Bennett, James and Stan Lanning (2007), "The Netflix Prize", in Proceedings of KDD Cup and Workshop, August 12, 2007. www.netflixprize.com
Bettman, James R., Eric J. Johnson, and John W. Payne (1991). “Consumer Decision Mak-
ing.” In Thomas S. Robertson and Harold H. Kassarjian (Eds.) "Handbook of Con-
sumer Behavior", Prentice Hall, pp. 50–84.
Bilgic, Mustafa and Raymond J. Mooney (2005). "Explaining recommendations: Satisfaction
vs. Promotion", in Proceedings of Beyond Personalization 2005: the Workshop on
the Next Stage of Recommender Systems Research at the 2005 International Confer-
ence on Intelligent User Interfaces (IUI'05), pp. 1-6.
Billsus, Daniel and Michael J. Pazzani (1999), "A Personal News Agent that Talks, Learns and Explains", in Proceedings of the 3rd ACM Annual Conference on Autonomous Agents (AGENTS'99), pp. 268-275.
Billsus, Daniel and Michael J. Pazzani (2000), "User Modeling for Adaptive News Access",
in User-Modeling and User-Adapted Interaction Vol. 10(2-3), pp. 147-180.
Billsus, Daniel, Michael J. Pazzani, and James Chen (2000), "A Learning Agent for Wireless News Access", in Proceedings of the 5th ACM International Conference on Intelligent User Interfaces (IUI'00), pp. 33-36.
Bodapati, Anand V. (2008). "Recommendation Systems with Purchase Data", Journal of
Marketing Research, 45 (1), 77-93.
Breese, John S., David Heckerman, and Carl Kadie (1998), “Empirical Analysis of Predictive
Algorithms for Collaborative Filtering”, in Proceedings of the 14th Conference on
Uncertainty in Artificial Intelligence (UAI-98), San Francisco, July 24-26, pp. 43-52.
Brézillon, Patrick J. and Jean-Charles Pomerol (1996). "Misuse and Nonuse of Knowledge-based Systems: The Past Experiences Revisited", in Patrick Humphreys, Liam Bannon, Andrew McCosh, Piero Migliarese and Jean-Charles Pomerol (eds.), "Implementing Systems for Supporting Management Decisions", Chapman and Hall, pp. 44-60.
Buchanan, Bruce G. and Edward H. Shortliffe (1984). "Rule-based Expert Systems: The
MYCIN Experiments of Stanford Heuristic Programming Project", Addison-Wesley,
Reading, MA.
Burke, Robin (2002), "Hybrid recommender systems: Survey and experiments", in User
Modeling and User Adapted Interaction (2002) Vol. 12, Issue: 4, pp. 331–370.
Canny, John (2002), "Collaborative Filtering with Privacy via Factor Analysis", in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'02), pp. 238-245.
Carroll, J. Douglas and Paul E. Green (1995). "Psychometric Methods in Marketing Research:
Part I, Conjoint Analysis", in Journal of Marketing Research, Vol. 32 (4), pp. 385-
391.
Chakrabarti, Soumen (2002), "Mining the Web: Discovering Knowledge from Hypertext Da-
ta", 1st edition, Morgan Kaufmann Publishers, San Francisco.
Chakravarti, Dipankar and John G. Lynch (1983). “A Framework for Examining Context Ef-
fects on Consumer Judgment and Choice”. In R. P. Bagozzi and Alice M. Tybout
(Eds.), “Advances in Consumer Research”, Vol. 10. Ann Arbor, MI: Association of
Consumer Research, pp. 289-297.
Chen, Li (2009). "Adaptive Tradeoff Explanations in Conversational Recommenders", Pro-
ceedings of the third ACM conference on Recommender systems, ACM 225–228.
Claypool, Mark, Anuja Gokhale, Tim Miranda, Pavel Murnikov, Dmitry Netes, and Matthew
Sartin (1999), “Combining Content-Based and Collaborative Filters in an Online
Newspaper”, in Proceedings of ACM SIGIR’99 Workshop on Recommender Systems:
Algorithms and Evaluation, pp. 1-8.
Cooke, Alan D. J., Harish Sujan, Mita Sujan, and Barton A. Weitz (2002). "Marketing the Unfamiliar: The Role of Context and Item-Specific Information in Electronic Agent Recommendations", in Journal of Marketing Research, Vol. 39, pp. 488-497.
Cooper-Martin, Elizabeth (1991), “Consumers and Movies: Some Findings on Experiential
Products”, in Advances in Consumer Research 18, pp. 372-378.
Cooper-Martin, Elizabeth (1992), “Consumers and Movies: Information Sources for Experi-
ential Products”, in Advances in Consumer Research 19, pp. 756-761.
Corner, James L. and Craig W. Kirkwood (1991), "Decision Analysis Applications in the Operations Research Literature, 1970-1989", in Operations Research, Vol. 39, Issue 2, pp.
206–219.
Cramer, Henriette, Vanessa Evers, Satyan Ramlal, Maarten Someren, Lloyd Rutledge, Natalia
Stash, Lora Aroyo, Bob Wielinga (2008). "The Effects of Transparency on Trust in
and Acceptance of a Content-based Art Recommender", in User Modeling and User-
Adapted Interaction 18, 5, pp. 455-496.
Das, Abhinandan S., Mayur Datar, Ashutosh Garg, and Shyam Rajaram (2007), "Google
news personalization: scalable online collaborative filtering”, in Proceedings of the
16th international conference on World Wide Web (WWW’07), ACM, New York, pp.
271-280.
Delgado, Joaquin and Naohiro Ishii (1999), “Memory-Based Weighted-Majority Prediction
for Recommender Systems”, in Proceedings of the ACM SIGIR’99, Workshop Rec-
ommender Systems: Algorithms and Evaluation, pp. 1-5.
De Vany, Arthur and David Walls (1999), "Uncertainty in the Movie Industry: Does Star
Power Reduce the Terror of the Box Office?”, in Journal of Cultural Economics, Vol
23, pp. 285-318.
Diehl, Kristin, Laura J. Kornish, and John G. Lynch Jr. (2003), “Smart Agents: When Lower
Search Costs for Quality Information Increase Price Sensitivity,” Journal of Con-
sumer Research, 30 (June), pp. 56-71.
Dick, Alan S. and Kunal Basu (1994), "Customer Loyalty: Toward an Integrated Conceptual Framework", in Journal of the Academy of Marketing Science, Vol. 22, Issue 2, pp. 99-113.
Ding, Yi and Xue Li (2005), "Time Weight Collaborative Filtering", in Proceedings of the
14th ACM International Conference on Information and Knowledge Management,
pp. 485-492.
Doyle, Dónal, Alexey Tsymbal, and Pádraig Cunningham (2003). "A Review of Explanation
and Explanation in Case-based Reasoning", Technical Report, Department of Com-
puter Science, Trinity College, Dublin.
Edwards, Ward (1954), "The theory of decision making", in Psychological Bulletin, Vol. 51,
380-417.
Edwards, Ward (1961), "Behavioral decision theory", in Annual Review of Psychology, Vol.
12, pp. 473–498.
El Helou, Sandy, Christophe Salzmann, Stéphane Sire, and Denis Gillet (2009), "The 3A
contextual ranking system: simultaneously recommending actors, assets, and group
activities", in RecSys '09 Proceedings of the third ACM conference on Recommender
systems, pp. 373-376.
Fishburn, Peter C. (1967), "Methods of Estimating Additive Utilities", in Management
Science, Vol. 13 (7), pp. 435-453.
Fishburn, Peter C. (1968), "Utility Theory", in Management Science, Vol. 14 (5), pp. 335-378.
Fishburn, Peter C. (1970), "Utility Theory for Decision Making", Wiley, New York.
Fishburn, Peter C. (1988), "Nonlinear Preference and Utility Theory", Johns Hopkins
University Press, Baltimore.
Fitzsimons, Gavan J. and Donald R. Lehmann (2004), "Reactance to Recommendations:
When Unsolicited Advice Yields Contrary Responses", in Marketing Science, Vol. 23
(1), pp. 82-94.
Funk, Simon (2006), "Netflix Update: Try this at Home", retrieved at http://sifter.org/~simon/
journal/20061211.html, on 04.06.2011.
Gershoff, Andrew D., Ashesh Mukherjee, and Anirban Mukhopadhyay (2003), "Consumer
acceptance of online agent advice: Extremity and positivity effects", in Journal of
Consumer Psychology, Vol. 13, pp. 161-170.
Gigerenzer, Gerd, Peter M. Todd, and the ABC Research Group (1999), "Simple heuristics
that make us smart", New York: Oxford University Press.
Goldberg, David, David Nichols, Brian M. Oki, and Douglas Terry (1992). "Using
collaborative filtering to weave an information tapestry", in Communications of the
ACM, Vol. 35 (12), pp. 61-70.
Goldberg, Ken, Theresa Roeder, Dhruv Gupta, and Chris Perkins (2001), “Eigentaste: A Con-
stant Time Collaborative Filtering algorithm”, in Information Retrieval, Vol. 4, No. 2,
pp. 133-151.
Golub, Gene H. and William Kahan (1965), "Calculating the Singular Values and Pseudo-
inverse of a Matrix", in Journal of the Society for Industrial and Applied Mathematics,
Series B: Numerical Analysis, Vol. 2, No. 2, pp. 205-224.
Green, Paul E., Yoram Wind, and Arun K. Jain (1972), "Preference Measurement of Item
Collections", in Journal of Marketing Research, Vol. 9, pp. 371-377.
Green, Paul E. and Yoram Wind (1973), "Multiattribute Decisions in Marketing: A Measure-
ment Approach", Hinsdale, IL.
Green, Paul E. and V. Srinivasan (1990), "Conjoint Analysis in Marketing: New Developments
With Implications for Research and Practice", in Journal of Marketing, October
1990, pp. 3-15.
Grudin, Jonathan (1988), "Why CSCW Applications Fail: Problems in the Design and Eval-
uation of Organizational Interfaces", in Proceedings of the 1988 ACM Conference on
Computer-Supported Cooperative Work (CSCW '88), pp. 85-93.
Gunawardana, Asela and Christopher Meek (2009), “A Unified Approach to Building Hybrid
Recommender Systems”, in RecSys '09 Proceedings of the third ACM conference on
Recommender systems, pp. 117-124.
Hennig-Thurau, Thorsten, Christian Friege, Sonja Gensler, Lara Lobschat, Arvind
Rangaswamy, and Bernd Skiera (2010). "The Impact of New Media on Customer
Relationships", in Journal of Service Research, August 11, 2010 Vol. 13, No. 3, pp.
311-330.
Hennig-Thurau, Thorsten, Mark B. Houston, and Gianfranco Walsh (2006), "Differing Roles of
Success Drivers Across Sequential Channels: An Application to the Motion Picture
Industry", in Journal of the Academy of Marketing Science, Vol. 34 (4), pp. 559-575.
Hennig-Thurau, Thorsten, Mark B. Houston, and Gianfranco Walsh (2007), "Determinants of
Motion Picture Box Office and Profitability: an interrelationship approach", in Re-
view of Managerial Science, Vol. 1(1), pp. 65-92.
Hennig-Thurau, Thorsten and Alexander Klee (1997), "The Impact of Customer Satisfaction
and Relationship Quality on Customer Retention: A Critical Reassessment and Mod-
el Development", in Psychology & Marketing, Vol. 14, pp. 737-764.
Hennig-Thurau, Thorsten, André Marchand, and Paul Marx (2011), "Can Automated Recom-
mender Systems Lead to Better Group Decisions?", AMA Winter Educators' Con-
ference, Track 10.
Hennig-Thurau, Thorsten, Gianfranco Walsh, and Oliver Wruck (2001), "An Investigation
into the Success Factors of Motion Pictures”, in Academy of Marketing Science Re-
view, (at amsreview.org/amsrev/theory/hennig06-01.html).
Herlocker, Jonathan, Joseph A. Konstan, Al Borchers, and John T. Riedl (1999), "An algo-
rithmic framework for performing collaborative filtering", in SIGIR '99: Proceedings
of the 22nd Annual International ACM SIGIR Conference on Research and De-
velopment in Information Retrieval, pp. 230-237.
Herlocker, Jonathan L., Joseph A. Konstan, and John T. Riedl (2000), "Explaining Collabo-
rative Filtering Recommendations", in Proceedings of the 2000 ACM conference on
Computer supported cooperative work, ACM New York, NY, USA, pp. 241–250.
Herlocker, Jonathan L., Joseph A. Konstan, and John T. Riedl (2002), "An Empirical Analy-
sis of Design Choices in Neighborhood-based Collaborative Filtering Algorithms", in
Information Retrieval, Vol 5, No. 4, pp. 287–310.
Herlocker, Jonathan L., Joseph A. Konstan, Loren G. Terveen, and John T. Riedl (2004),
"Evaluating Collaborative Filtering Recommender Systems", in ACM Transactions on
Information Systems, Vol. 22 (1), pp. 5-53.
Hill, Will, Larry Stead, Mark Rosenstein, and George Furnas (1995), "Recommending and
Evaluating Choices in a Virtual Community of Use", in Proceedings of ACM CHI’95
Conference on Human Factors in Computing Systems, pp.194–201.
Hirschman, Elizabeth C. and Morris B. Holbrook (1982), "Hedonic Consumption: Emerging
Concepts, Methods and Propositions", in Journal of Marketing, Vol. 46 (Summer),
pp. 92-101.
Holbrook, Morris B. and Hirschman, Elizabeth C. (1982) “The Experiential Aspects of Con-
sumption: Consumer Fantasies, Feelings, and Fun”, in Journal of Consumer Re-
search, Vol. 9 (September), pp. 132-140.
Horvitz, Eric, John Breese, and Max Henrion (1988). "Decision Theory in Expert Systems
and Artificial Intelligence", in International Journal of Approximate Reasoning, Spe-
cial Issue on Uncertainty in Artificial Intelligence, 2 (3), pp. 247-302. Also, Stanford
CS Technical Report KSL-88-13.
Ito, Tiffany A., Jeff T. Larsen, N. Kyle Smith, and John T. Cacioppo (1998). "Negative In-
formation Weighs More Heavily on the Brain: The Negativity Bias in Evaluative
Categorizations", in Journal of Personality and Social Psychology, Vol. 75, No. 4,
pp. 887-900.
Jacoby, Jacob, Donald E. Speller, and Carol Kohn Berning (1974). "Brand choice behavior as
a function of information load: Replication and extension", in Journal of Consumer
Research, 1, 33–42.
Jannach, Dietmar, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich (2011),
"Recommender Systems: An Introduction", Cambridge University Press, New York,
2011.
Johnson, Harry and Peter Johnson (1993). "Explanation Facilities and Interactive Systems", in
IUI '93 Proceedings of the 1st international conference on Intelligent user interfaces,
ACM New York, NY, USA, pp. 159-166.
Johnston, Jack and John DiNardo (1997), "Econometric Methods", 4th edition, McGraw-Hill,
New York.
Kahneman, Daniel and Amos Tversky (1984). "Choices, values, and frames", in American
Psychologist, Vol. 39, pp. 341–350.
Kmenta, Jan (1971), "Elements of Econometrics", Macmillan, New York.
Kanouse, David E. and Reid L. Hanson (1972), "Negativity in Evaluations", in Attribution:
Perceiving the Causes of Behavior, eds. Edward E. Jones and David E. Kanouse,
Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., pp. 47-62.
Keefer, Donald L., Craig W. Kirkwood, and James L. Corner (2002), "Summary of Decision
Analysis Applications in the Operations Research Literature 1990-2001", Technical
Report, Department of Supply Chain Management, Arizona State University
(retrieved at http://www.informs.org/content/download/14833/178547/file/DAAppsSummaryTechReport.pdf, 30.06.2011).
Kim, Dohyun and Bong-Jin Yum (2005), “Collaborative Filtering Based on Iterative Principal
Component Analysis”, in Expert Systems with Applications, Vol. 28, pp. 823-830.
Klein, Noreen M. and Manjit S. Yadav (1989), “Context Effects on Effort and Accuracy in
Choice: An Inquiry into Adaptive Decision Making.” In Journal of Consumer Re-
search, Vol. 15 (4), pp. 411–421.
Komarek, Paul (2004), "Logistic Regression for Data Mining and High-dimensional Classifi-
cation", Doctoral Dissertation, Carnegie Mellon University, Pittsburgh, PA, USA [re-
trieved at http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:
lr_thesis.pdf, on 05.07.2011].
Konstan, Joseph A., Bradley N. Miller, David Maltz, Jonathan L. Herlocker, Lee R. Gordon,
and John Riedl (1997), "GroupLens: Applying Collaborative Filtering to Usenet
News", in Communications of the ACM, Vol. 40, No. 3, pp. 77-87.
Konstan, Joseph A., John Riedl, Al Borchers, and Jonathan L. Herlocker (1998) “Recom-
mender Systems: A GroupLens Perspective”, in Recommender Systems: Papers from
the 1998 Workshop (AAAI Technical Report WS-98), Vol. 8, pp. 60-64.
Koren, Yehuda (2008), "Factorization Meets the Neighborhood: A Multifaceted Collaborative
Filtering Model", in Proceeding of the 14th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 426-434.
Koren, Yehuda (2009), "Collaborative Filtering with Temporal Dynamics", in Proceedings of
the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 447-456.
Koren, Yehuda (2010), "Factor in the Neighbors: Scalable and Accurate Collaborative Filter-
ing", in ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 4, No.
1, pp. 1-24.
Koren, Yehuda, Robert Bell, and Chris Volinsky (2009), "Matrix Factorization Techniques for
Recommender Systems", in IEEE Computer, Vol. 42, Issue 8, pp. 42-49.
Koren, Yehuda, and Robert Bell (2011), “Advances in Collaborative Filtering”, in Ricci,
Francesco, Lior Rokach, Bracha Shapira, Paul B. Kantor [eds.] (2011). "Recom-
mender Systems Handbook", Springer Science + Business Media LLC, pp. 145 -
186.
Lacave, Carmen and Francisco J. Díez (2004), "A review of explanation methods for heuristic
expert systems", in The Knowledge Engineering Review, Vol. 19, pp. 133-146.
Lange, Kenneth (2010), "Optimization (Springer Texts in Statistics)", Springer Verlag New
York LLC.
Lam, Shyong K. and John Riedl (2004), “Shilling Recommender Systems for Fun and Profit”,
in Proceedings of the 13th international conference on World Wide Web, WWW’04,
pp. 393–402.
Linden, Greg, Brent Smith, and Jeremy York (2003), “Amazon.com Recommendations: Item-
to-item Collaborative Filtering”, in Internet Computing, IEEE, pp. 76-80.
Lops, Pasquale, Marco de Gemmis, and Giovanni Semeraro (2011), "Content-based Recom-
mender Systems: State of the Art and Trends", in Ricci, Francesco, Lior Rokach,
Bracha Shapira, Paul B. Kantor [eds.] (2011). "Recommender Systems Handbook",
Springer Science + Business Media LLC, pp. 73 - 105.
Luce, R. Duncan (1992), "Where does subjective expected utility fail descriptively?", in
Journal of Risk and Uncertainty, Vol. 5, pp. 5-27.
Lutz, Richard J. (1975), "Changing Brand Attitudes through Modification of Cognitive Struc-
ture", in Journal of Consumer Research, 1 (March), pp. 49 - 59.
Maimon, Oded and Lior Rokach (eds.) (2005), "The Data Mining and Knowledge Discovery
Handbook", Springer Science+Business Media Inc.
Majchrzak, Ann and Les Gasser (1991). “On using Artificial intelligence to integrate the de-
sign of organizational and process change in US manufacturing”, AI and society,
Vol. 5, pp 321-338.
McNee, Sean M., Shyong K. Lam, Joseph A. Konstan, and John Riedl (2003), "Interfaces for
Eliciting New User Preferences in Recommender Systems", in The 9th International
Conference on User Modeling (UM'2003), pp. 178–187.
McSherry, David (2005), "Explanation in recommender systems", in Artificial Intelligence
Review, Vol. 24, Issue 2, pp. 179 – 197.
Mehta, Bhaskar, Thomas Hofmann, and Wolfgang Nejdl (2007), "Robust Collaborative Filtering",
in Proceedings of the 2007 ACM conference on Recommender Systems, pp. 49-56.
Melville, Prem, Raymond J. Mooney, and Ramadass Nagarajan (2002), "Content-boosted
Collaborative Filtering", in Proceedings of the 18th National Conference on Artificial
Intelligence (AAAI-2002), pp. 187-192.
Mild, Andreas and Martin Natter (2002). "Collaborative Filtering or Regression Models for
Internet Recommendation Systems?", in Journal of Targeting, Measurement and
Analysis for Marketing, Vol. 10, Issue 4, pp. 304-313.
Miller, Christopher A. and Raymond Larson (1992). "An Explanatory and "Argumentative"
Interface for a Model-based Diagnostic System", in Proceedings of the 5th annual
ACM symposium on User interface software and technology (UIST'92), ACM, pp.
43-52.
Mladenic, Dunja (1999), “Text-learning and Related Intelligent Agents: A Survey”, in IEEE
Intelligent Systems, Vol. 14, No. 4, pp. 44-54.
Mobasher, Bamshad, Robin Burke, Runa Bhaumik, and Chad Williams (2007), "Towards
Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm
Robustness", in ACM Transactions on Internet Technology, Vol. 7, No. 2, pp.23-60.
Moon, Sangkil, Paul K. Bergey, and Dawn Iacobucci (2010), "Dynamic Effects Among Mov-
ie Ratings, Movie Revenues, and Viewer Satisfaction", in Journal of Marketing, Vol.
74, pp. 108-121.
Mooney, Raymond J. and Loriene Roy (1999), "Content-based Book Recommending Using
Learning for Text Categorization", in Proceedings of the ACM SIGIR'99 Workshop on
Recommender Systems: Algorithms and Evaluation.
Mooney, Raymond J. and Loriene Roy (2000), "Content-based Book Recommending Using
Learning for Text Categorization", in Proceedings of the Fifth ACM Conference on
Digital Libraries, San Antonio, TX, pp. 195-204.
Moore, Johanna D. and William R. Swartout (1988). "Explanation in expert systems: A sur-
vey", Research Report RR-88-228, University of Southern California, Marina Del
Rey, CA, 1988.
Moore, Carolyn A., David Bednall, and Stewart Adam (2005), "Genre, Gender and Interpre-
tation of Movie Trailers: An Exploratory Study", in ANZMAC 2005: Broadening the
boundaries, conference proceedings, ANZMAC, Dunedin, N.Z., pp. 124-130.
Myers, James H. (1996), "Segmentation and Positioning for Strategic Marketing Decisions",
American Marketing Association, Chicago, IL USA, 1996.
Nakamura, Atsuyoshi and Naoki Abe (1998), “Collaborative Filtering Using Weighted Ma-
jority Prediction Algorithms”, in ICML '98: Proceedings of the 15th International
Conference on Machine Learning, pp. 395-403.
Neumann, Andreas W. (2009). "Recommender Systems for Information Providers", Physica-
Verlag Heidelberg
O'Donovan, John and Barry Smyth (2005). "Trust in Recommender Systems", in IUI'05 Pro-
ceedings of the 10th international conference on Intelligent user interfaces, ACM
New York, NY, USA, pp. 167-174.
O'Sullivan, Derry, Barry Smyth, and David C. Wilson (2004), "Preserving Recommender
Accuracy and Diversity in Sparse Datasets", in International Journal on Artificial In-
telligence Tools, Vol. 13, Issue 1, pp. 219-236.
O'Sullivan, Derry, Barry Smyth, David C. Wilson, Kieran McDonald, and Alan Smeaton
(2004), "Improving the quality of the personalized electronic program guide", in Us-
er Modeling and User-Adapted Interaction, Vol. 14, Issue 1, pp. 5-36.
Park, Seung-Taek and Wei Chu (2009), “Pairwise preference regression for cold-start recom-
mendation”, in RecSys '09 Proceedings of the third ACM conference on Recom-
mender systems, pp. 21-28.
Paterek, Arkadiusz (2007), "Improving Regularized Singular Value Decomposition for Col-
laborative Filtering", in Proceedings of KDD Cup Workshop at SIGKDD'07, 13th
ACM International Conference on Knowledge Discovery and Data Mining, pp. 39-
42.
Payne, John W., James R. Bettman, and Eric Johnson (1988), "Adaptive Strategy Selection in
Decision Making", in Journal of Experimental Psychology: Learning, Memory and
Cognition, Vol. 14 (July), pp. 534-552.
Payne, John W., James R. Bettman, and Eric Johnson (1993), The Adaptive Decision Maker.
Cambridge, UK: Cambridge University Press.
Pazzani, Michael J. (1999), "A Framework for Collaborative, Content-Based, and Demo-
graphic Filtering", in Artificial Intelligence Review - Special issue on data mining on
the Internet, pp. 393-408.
Pazzani, Michael J. and Daniel Billsus (2007), "Content-based Recommendation Systems", in
The Adaptive Web, pages 325-341.
Prag, Jay and James Casavant (1994) “An Empirical Study of the Determinants of Revenues
and Marketing Expenditures in the Motion Picture Industry”. in Journal of Cultural
Economics, Vol. 18, pp. 217-235.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery (2007), "Nu-
merical Recipes: The Art of Scientific Computing", Cambridge University Press, 3rd
edition.
Rashid, Al Mamunur, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A.
Konstan, and John Riedl (2002), “Getting to Know You: Learning New User Prefer-
ences in Recommender Systems”, in Proceedings of the International Conference on
Intelligent User Interfaces, pp. 127–134.
Resnick, Paul, Neophytos Iakovou, Mitesh Sushak, Peter Bergstrom, and John Riedl (1994),
“GroupLens: An Open Architecture for Collaborative Filtering of Netnews”, in Pro-
ceedings of ACM CSCW’94 Conference on Computer Supported Cooperative Work,
pp. 175-186.
Resnick, Paul and Rahul Sami (2007), "The Influence Limiter: Provably Manipulation-resistant
Recommender Systems", in Proceedings of the 2007 ACM conference on Recom-
mender systems RecSys'07, pp. 25-32.
Ricci, Francesco, Lior Rokach, Bracha Shapira (2011), "Introduction to Recommender Sys-
tems Handbook", in Ricci, Francesco, Lior Rokach, Bracha Shapira, Paul B. Kantor
[eds.] (2011). "Recommender Systems Handbook", Springer Science + Business
Media LLC, pp. 1 - 35.
Rutkowski, Anne-Francoise, Alea Fairchild, and John B. Rijsman (2004), "Group Decision Sup-
port Systems and Patterns of Interpersonal Communication to Improve Ethical Nego-
tiation in Dyads", in European Journal of Social Psychology, Vol. 9, pp. 11-30.
Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton (2007), "Restricted Boltzmann
Machines for Collaborative Filtering", in Proceedings of the 24th International Con-
ference on Machine Learning, pp. 791-798.
Salton, Gerard, Anita Wong, and Chung-Shu Yang (1975), "A Vector Space Model for Infor-
mation Retrieval", in Journal of the American Society for Information Science, Vol.
18, No. 11, pp. 613-620.
Salton, Gerard and Christopher Buckley (1988), "Term-weighting Approaches in Automatic
Text Retrieval", in Information Processing and Management, Vol. 25, No. 5, pp.
513-523.
Sandvig, J. J., Bamshad Mobasher, and Robin Burke (2007), "Robustness of Collaborative
Recommendation Based on Association Rule Mining", in Proceedings of the 2007
ACM conference on Recommender systems, pp. 105-111.
Sarwar, Badrul M., George Karypis, Joseph A. Konstan, and John T. Riedl (2000), "Applica-
tion of Dimensionality Reduction in Recommender System - A Case Study", in
ACM WebKDD 2000 Web Mining for E-Commerce Workshop, pp. 285-289.
Sarwar, Badrul M., George Karypis, Joseph Konstan, and John T. Riedl (2001), "Item-Based
Collaborative Filtering Recommendation Algorithms”, in WWW '01 Proceedings of
the 10th International Conference on World Wide Web ACM New York, NY, USA,
pp. 285-295.
Sarwar, Badrul M., George Karypis, Joseph Konstan, and John T. Riedl (2002), "Incremental
Singular Value Decomposition algorithms for Highly Scalable Recommender Sys-
tems”, in ICCIT '02 Proceedings of the 5th International Conference on Computer
and Information Technology, pp. 399-404.
Sawhney, Mohanbir S. and Jehoshua Eliashberg (1996) “A Parsimonious Model of Forecast-
ing Gross Box-Office Revenues of Motion Pictures”, in Marketing Science, Vol. 15,
Issue 2, pp. 113-131.
Seyerlehner, Klaus, Arthur Flexer, and Gerhard Widmer (2009), “On the Limitations of
Browsing Top-N Recommender Systems”, in Proceedings of the third ACM confer-
ence on Recommender systems, pp. 321-324.
Shardanand, Upendra and Patti Maes (1995), "Social Information Filtering: Algorithms for
Automating 'Word of Mouth'", in Proceedings of ACM CHI'95 Conference on Hu-
man Factors in Computing Systems, pp. 210-217.
Schafer, Ben J., Joseph A. Konstan, and John Riedl (1999). "Recommender Systems in E-
Commerce", Proceedings of the First ACM Conference on Electronic Commerce,
Denver, CO, 158-166.
Schafer, Ben J., Joseph A. Konstan, and John Riedl (2001). "E-Commerce Recommendation
Applications", Data mining and Knowledge Discovery. 5 (1-2), 115-153.
Schafer, Joseph L. and John W. Graham (2002), "Missing Data: Our View of the State of the
Art", in Psychological Methods Vol. 7, No. 2, pp. 147-177.
Schwab, Ingo, Alfred Kobsa, and Ivan Koychev (2001), “Learning User Interests through
Positive Examples Using Content Analysis and Collaborative Filtering”, in User
Modeling and User-Adapted Interaction.
Senecal, Sylvain and Jacques Nantel (2004), "The Influence of Online Product Recommenda-
tions on Consumers' Online Choices", in Journal of Retailing, Vol. 80 (2), pp. 159-169.
Simon, Herbert A. (1982), "Models of bounded rationality", Cambridge, MA: MIT Press.
Sinha, Rashmi and Kirsten Swearingen (2002), "The Role of Transparency in Recommender
Systems", in Conference on Human Factors in Computing Systems, ACM New York,
NY, USA, pp. 830-831.
Shortliffe, Edward H. and Bruce G. Buchanan (1975). "A model of inexact reasoning in med-
icine". Mathematical Biosciences Vol. 23 (3-4), pp. 351–379.
Soboroff, Ian M. and Charles Nicholas (1999), “Combining Content and Collaboration in
Text Filtering”, in Proceedings of the IJCAI-99 Workshop on Machine Learning for
Information Filtering, Vol. 99, pp. 86-91.
Sørmo, Frode, Jörg Cassens, and Agnar Aamodt (2005). "Explanation in Case-Based Reason-
ing: Perspectives and Goals", in Artificial Intelligence Review, Volume 24 Issue 2,
Kluwer Academic Publishers, pp. 145-161.
Symeonidis, Panagiotis, Alexandros Nanopoulos, and Yannis Manolopoulos (2007), "Feature-
weighted User Model for Recommender Systems", in UM '07 Proceedings of the
11th international conference on User Modeling, pp. 97-106.
Symeonidis, Panagiotis, Alexandros Nanopoulos, and Yannis Manolopoulos (2008), "Providing
Justifications in Recommender Systems", IEEE Transactions on Systems, MAN, and
Cybernetics, Vol. 38, No. 6, pp. 1262-1272.
Symeonidis, Panagiotis, Alexandros Nanopoulos, and Yannis Manolopoulos (2009), "MoviEx-
plain: a recommender system with explanations", in Proceedings of the third ACM con-
ference on Recommender systems, pp. 317-320.
Takács, Gábor, István Pilászy, Bottyán Németh, and Domonkos Tikk (2007), "Major Compo-
nents of the Gravity Recommendation System", in SIGKDD Explorations, Vol. 9,
No. 2., pp. 80-84.
Tang, Tiffany Ya, Pinata Winoto, and Keith C. C. Chan (2003) "On the Temporal Analysis
for Improved Hybrid Recommendations", in WI '03 Proceedings of the 2003
IEEE/WIC International Conference on Web Intelligence. pp. 214-220.
Terveen, Loren, Jessica McMackin, Brian Amento, and Will Hill (2002), "Specifying prefer-
ences based on user history", in Proceedings of the Conference on Human Factors in
Computing Systems, pp. 315-322.
Tintarev, Nava (2007), "Explanations of recommendations", in Proceedings of the 2007 ACM
Conference on Recommender Systems (RecSys'07), Minneapolis, MN, pp. 203-206.
Tintarev, Nava and Judith Masthoff (2007), "Effective Explanations of Recommendations:
User-Centered Design", in Proceedings of the 2007 ACM conference on Recom-
mender systems, ACM New York, NY, USA 153–156.
Tintarev, Nava and Judith Masthoff (2008), "The Effectiveness of Personalized Movie Ex-
planations: An Experiment Using Commercial Meta-data", in AH '08 Proceedings of
the 5th international conference on Adaptive Hypermedia and Adaptive Web-Based
Systems, pp. 204-213.
Tintarev, Nava and Judith Masthoff (2011), "Designing and Evaluating Explanations for
Recommender Systems", in Ricci, Francesco, Lior Rokach, Bracha Shapira, Paul B.
Kantor [eds.] (2011). "Recommender Systems Handbook", Springer Science + Busi-
ness Media LLC, pp. 479-510.
Thompson, Clive (2008), "If You Liked This, You're Sure to Love That", in The New York Times,
November 23, 2008, http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html.
Tran, Thomas and Robin Cohen (2000), “Hybrid Recommender Systems for Electronic
Commerce”, in Knowledge-Based Electronic Markets, Papers from the AAAI Work-
shop, AAAI Technical Report WS-00-04, AAAI Press, pp. 78-83.
Tran-Le, Esther (2010), "NYC Pandora Listener Meet Up", blog entry, March 22, 2010,
http://esthertranle.com/wordpress/2010/03/23/nyc-pandora-listener-meet-up, re-
trieved on 15.06.2011.
Tsymbal, Alexey (2004), “The Problem of Concept Drift: Definitions and Related Work”,
Technical Report TCD-CS-2004-15, Trinity College Dublin.
Tversky, Amos (1967), "Additivity, Utility, and Subjective Probability", in Journal of Math-
ematical Psychology, Vol 4, pp. 175-201.
Uchyigit, Gulden and Matthew Y. Ma [Eds.] (2008), "Personalization Techniques and Rec-
ommender Systems: Series in Machine Perception and Artificial Intelligence - Vol.
70", World Scientific Publishing Co. Pte. Ltd. 2008
von Winterfeldt, Detlof and Ward Edwards (1986), "Decision analysis and behavioral re-
search", New York: Cambridge University Press.
Wei, Chang-Ping, Michael J. Shaw, and Robert F. Easley (2002), "A Survey of Recommenda-
tion Systems in Electronic Commerce", in Roland T. Rust and P.K. Kannan [eds.], "e-
Service: New Directions in Theory and Practice" (2002), M.E. Sharpe, Armonk,
New York, London, England.
Weiss, Jie W., David J. Weiss, and Ward Edwards (2009), "A Descriptive Multi-attribute
Utility Model for Everyday Decisions", in Theory and Decision, Vol. 68, Issues (1-
2), pp. 101-114.
Wright, Peter (1974), "The Harassed Decision Maker: Time Pressures, Distractions, and the
Use of Evidence", in Journal of Applied Psychology, 59 (October), pp. 555-561.
Ying, Yuanping, Fred Feinberg, Michel Wedel (2006). "Leveraging Missing Ratings to Im-
prove Online Recommendation Systems", in Journal of Marketing Research, Vol.
XLIII, August, pp. 355-365.
Zanker, Markus, Sergiu Gordea, Markus Jessenitschnig, and Michael Schnabl (2006), "A Hy-
brid Similarity Concept for Browsing Semi-structured Product Items", in Proceed-
ings of 7th International Conference on Electronic Commerce and Web Technologies
(EC-Web), Springer 2006 (LNCS, 4082), pp. 21-30.
Zaslow, Jeffrey (2002), "If TiVo Thinks You Are Gay, Here's How to Set It Straight", in Wall
Street Journal - Eastern Edition, 11/26/2002, Vol. 240, Issue 105, p. A1.
Zhan, Sinan, Fengrong Gao, Chunxiao Xing, and Lizhu Zhou (2006), "Addressing Concept
Drift Problem in Collaborative Filtering Systems", in Proceedings of the 17th Euro-
pean Conference on Artificial Intelligence, pp. 34-39.
Zhang, Yi, Jamie Callan, and Thomas Minka (2002), “Novelty and Redundancy Detection in
Adaptive Filtering”, in Proceedings of the 25th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval SIGIR '02, pp.
81-88.
Zhang, Mi (2009), "Enhancing Diversity in Top-N Recommendation", in Proceedings of the
third ACM conference on Recommender systems, pp. 397-400.
Zhao, Yangchang, Chengqi Zhang, and Shichao Zhang (2005), "A Recent-biased Dimension
Reduction Technique for Time Series Data", in ACM Proceedings of the 9th Pacific-
Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05), pp. 751-
758.
Ziegler, Cai-Nicolas, Sean M. McNee, Joseph A. Konstan, and Georg Lausen (2005), "Im-
proving Recommendation Lists Through Topic Diversification", in Proceedings of
the International World Wide Web Conference WWW'05, pp. 22-32.
Appendix A: Sources of Error in Recommender Systems
Automated recommender systems are in essence stochastic processes that infer their
recommendations from heuristic approximations of human decision processes by means of
numerical algorithms, and their computations are performed on extremely sparse and
incomplete data. Together, these two conditions yield recommendations that are often correct
and reliable but occasionally very wrong; that is, the suggestions generated by RS are subject
to errors. According to Herlocker, Konstan, and Riedl (2000), the sources of error can be
roughly grouped into two categories: model/process errors and data errors. We agree with this
classification and extend its understanding below.
MODEL/PROCESS ERRORS
Model or process errors occur when the computational process employed by the RS for
generating recommendations does not appropriately reflect the user's intrinsic decision pro-
cess and thus does not match his or her requirements. This can happen, for example, due to:
Multiattribute preferences. Multiattribute utility (MAU) models have a long history in
the research fields of decision making and marketing (e.g., Edwards 1954; Tversky 1967;
Green, Wind, and Jain 1972; Green and Wind 1973; Luce 1992). According to MAU theory,
people make choices using an intrinsic utility function that sums the attribute-related
preferences for the items under consideration, i.e., those contained in the evoked set of choice
alternatives. The item with the highest utility for a given consumer has the highest probability
of being chosen. Although research on motion picture success factors has shown that movie
attributes such as actors, directors, genres, budgets, country of origin, and awards
significantly influence a movie's success as an expression of consumer preferences (Hennig-
Thurau, Houston, and Walsh 2006), contemporary movie recommender systems still fail to
adequately incorporate such attribute characteristics and to account for attribute-related
consumer preferences within the recommendation process. The reason is the limited ability of
information-processing algorithms to automatically extract meaningful attributes descriptive
of multimedia content (Wei, Shaw, and Easley 2002; Pazzani and Billsus 1997; Lops, de
Gemmis, and Semeraro 2011). Where preferences for movie attributes have been used in
extant work, the choice of attributes was either based on information availability rather than
on a thorough study of relevant attributes (e.g., Ying, Feinberg, and Wedel 2006), or the
attributes were used only for post-processing of the generated recommendations (e.g.,
Symeonidis, Nanopoulos, and Manolopoulos 2009). It follows that RS fail to model users'
attribute-related preferences to the full extent, and the recommendation process can thus lead
to erroneous recommendations.
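To make the additive utility idea concrete, the following sketch scores a small evoked set of movies against one consumer's attribute part-worths. It is an illustrative toy, not the dissertation's model; all attribute levels, part-worth values, and movie data are hypothetical.

```python
# Minimal illustrative sketch of an additive multiattribute utility (MAU)
# choice rule. All attribute levels and part-worth values are hypothetical.

# One consumer's attribute-related preferences (part-worths).
PART_WORTHS = {
    "genre": {"drama": 0.8, "comedy": 0.3, "horror": -0.5},
    "director": {"Spielberg": 0.6},
}

def additive_utility(attributes, part_worths):
    """Sum the consumer's part-worths over the item's attribute levels;
    unknown attributes or levels contribute nothing."""
    return sum(
        part_worths.get(attr, {}).get(level, 0.0)
        for attr, level in attributes.items()
    )

def choose(evoked_set, part_worths):
    """Pick the alternative with the highest additive utility."""
    return max(evoked_set, key=lambda m: additive_utility(m["attributes"], part_worths))

evoked_set = [
    {"title": "A", "attributes": {"genre": "drama", "director": "Spielberg"}},
    {"title": "B", "attributes": {"genre": "horror", "director": "Bay"}},
]
best = choose(evoked_set, PART_WORTHS)  # A: 0.8 + 0.6 = 1.4; B: -0.5
```

Here, movie A accumulates a utility of 1.4 and is chosen over B. A real RS would have to estimate such part-worths from behavioral data rather than assume them, which is precisely the capability the text above finds lacking.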
Concept or interest drift. It is not uncommon for people to change their interests. This is especially evident in the domain of movie recommendations: movies go in and out of fashion, and users may adopt new views on actors, genres, directors, etc. In the RS literature, this phenomenon is referred to as "concept drift" (Billsus and Pazzani 2000) or "interest drift" (Burke 2002). Traditional RS, however, do not consider user interest drift and thus cannot reflect changes in user preferences (Zhan et al. 2006). To our knowledge, only a few studies have focused on this problem. Tang, Winoto, and Chan (2003), for example, suggested that a movie's production year reflects the situational environment in which the movie was filmed and thus might significantly affect users' feature preferences. For this reason, they propose discounting user preferences for earlier movies while boosting those for newer ones in the recommendation process, i.e., assigning higher weights to user ratings for newer movies. Similarly, other works suggest using the date on which the ratings were collected as a basis for the weight assignment. Accordingly, greater weights are assigned to recent data, while older data is either decayed or completely removed from the computational process (Terveen et al. 2002; Zhao, Zhang, and Zhang 2005; Ding and Li 2005). Zhan et al. (2006) proposed an iterative data weighting method that can also capture recurring user interests. However, these weighting methods remain open to model and process errors caused by interest drift, as they rely solely on time as a descriptor of interest drift at an aggregated level and do not consider changes in attribute preferences.
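The time-based weighting schemes discussed above can be illustrated with a minimal sketch. Python is used here purely for illustration (our own implementation, as noted in Appendix C, is in C#), and the half-life of 180 days is an arbitrary assumption, not a parameter taken from the cited works:

```python
def decay_weight(age_days, half_life_days=180.0):
    """Exponential time-decay weight: a rating loses half of its
    influence every half_life_days days (illustrative choice)."""
    return 0.5 ** (age_days / half_life_days)

def weighted_mean_rating(ratings):
    """Predict an item's rating as the decay-weighted mean of
    (rating, age_in_days) pairs, so that recent ratings dominate."""
    num = sum(r * decay_weight(age) for r, age in ratings)
    den = sum(decay_weight(age) for _, age in ratings)
    return num / den

# A recent 5-star rating outweighs a two-year-old 1-star rating:
print(weighted_mean_rating([(5.0, 0), (1.0, 720)]))  # ≈ 4.76, not 3.0
```

With such a scheme, recently observed preferences dominate the prediction, which is the intended behavior under interest drift; recurring interests, as targeted by Zhan et al. (2006), would require a more elaborate, non-monotone weighting.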
Contextual factors. Traditional RS assume homogeneity of context, i.e., the decision on what to recommend does not depend on when the recommendation is requested (Adomavicius et al. 2005). Behavioral research in marketing, however, has shown that consumer decision making, rather than being invariant, depends on the context in which the decision is made: one and the same consumer may prefer different products or brands and even employ different decision-making strategies in different contexts (Chakravarti and Lynch 1983; Klein and Yadav 1989; Bettman, Johnson, and Payne 1991). Because of the huge variety of imaginable user contexts, however, it seems impossible for RS to collect all the data needed to suitably account for them. Implicit collection of context information, though possible, is constrained to the information that is automatically available to the system or can be queried from real-time databases, e.g., daytime, season, weather conditions, the traffic situation, or the user's GPS coordinates. Actively querying users for additional information about their contexts would contradict one of the main principles of RS, namely simplifying the users' choice making by minimizing the amount of user-system interaction rather than overwhelming them with long questionnaires.50 Although concepts of RS that incorporate contextual information have been elaborated in the recent RS literature, they either do not go beyond the concept level (e.g., Adomavicius et al. 2005; Adomavicius and Tuzhilin 2008) or employ only a very limited amount of contextual information (e.g., Baltrunas 2008; Baltrunas and Ricci 2008; El Helou et al. 2009). Many of the contextual factors that influence the decision-making process, such as motives, the anticipated complexity of the decision task, the need to justify the decision to others or to account for somebody else's preferences, time pressure, and prior knowledge (Bettman, Johnson, and Payne 1991), can hardly be formalized for either explicit or implicit data collection and thus cannot be properly accounted for in the models underlying the recommendations. Consequently, the computational process of an RS fails to fully reflect the user context, leaving room for errors, especially in cases where contextual factors dominate over user preferences.
Scale granularity. RS typically make use of discrete integer-valued rating scales for collecting user preferences towards the items, e.g., movies, contained in the catalog, or they utilize a binary 0/1 scale for the implicit collection of purchase acts or other events (such as clicking on a hyperlink or reading an article) that represent meaningful data input for the recommendation process in the corresponding item domains. This raises two problems that may lead to errors in the computational process. Firstly, it cannot be guaranteed that all users perceive the scale points identically and express a given amount of preference on a given scale equally. For instance, if two persons find a certain movie equally good, one of them may rate it with 5 of 5 points, while the other may give it only 4 of 5 points. In such a situation, a recommendation process may not be able to determine that the same amount of preference was meant in both cases and would thus treat the assigned scores differently, in accordance with its internal representation of the meanings of the scale points. Hence, the difference in the ratings received from the considered users introduces an error into the recommendation process. Secondly, as described in Chapter 2.3, the algorithms employed in RS typically operate on rational numbers. For this reason, the results of averaging or weighting that may be employed within the recommendation process will often also be rational numbers, which, however, have to be represented as integers at least for the evaluation of the prediction accuracy. This either introduces potential errors caused by rounding or makes the accuracy evaluation per se error-prone.

50 This thesis is supported by early studies in the research area of CSCW, which revealed that people are not ready to explicitly express their preferences and priorities and perceive such actions as extrinsic to their actual task and as requiring extra effort (e.g., Grudin 1988).
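A common remedy for the first problem, the unequal perception of scale points, is to normalize each user's ratings before processing, e.g., by z-scoring them. The following sketch is purely illustrative and is not the normalization used in this thesis:

```python
def zscore_normalize(ratings):
    """Map one user's raw ratings to z-scores, so that users who
    use the rating scale differently become comparable
    (an illustrative remedy for the scale-perception problem)."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / n
    std = var ** 0.5
    if std == 0:          # user rates every item identically
        return [0.0] * n
    return [(r - mean) / std for r in ratings]

# Two users who "mean the same" but use the scale differently:
generous = [5, 5, 4, 3]   # rates the same movies one point higher
strict   = [4, 4, 3, 2]
print(zscore_normalize(generous) == zscore_normalize(strict))  # True
```

After normalization, a "generous" and a "strict" rater who rank the same movies identically become directly comparable; the rounding problem, by contrast, is inherent to integer scales and can only be controlled, not eliminated.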
Algorithmic processing errors. Finally, the computational procedure itself represents a potential source of errors. Even with a perfect model of user choice making, numeric algorithms will still be error-prone due to the possibilities of overfitting, rounding errors, and other types of miscalculation. Not least, the quality of the data determines the outcome of the calculations.
DATA ERRORS
Data errors result from inadequacies of the data employed in the calculation of recommendations. These inadequacies usually fall into three classes: not enough data, poor or bad data, and high-variance data (Herlocker, Konstan, and Riedl 2000).
Not enough data. RS base their computations on extremely sparse and incomplete data. Indeed, if the data were complete, there would be no reason for RS to predict the missing data points. The estimation of missing data itself is known to raise computational challenges and to be prone to errors (e.g., Schafer and Graham 2002). In the context of RS, the latter problem is further aggravated for items and users that have recently entered the system – an issue we addressed earlier in this chapter as the new item and new user problems.
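The severity of the sparsity problem can be quantified by the share of the user-item rating matrix that is empty. The sketch below uses the proportions of the public MovieLens 100K data set as an example; the figures are not those of our own data:

```python
def sparsity(num_users, num_items, num_ratings):
    """Fraction of the user-item rating matrix with no observation."""
    return 1.0 - num_ratings / (num_users * num_items)

# MovieLens-100K-like proportions: 943 users, 1682 items,
# 100,000 observed ratings.
print(f"{sparsity(943, 1682, 100_000):.4f}")  # prints 0.9370
```

In other words, even in a classic benchmark data set, roughly 94 percent of all user-item combinations are unobserved and have to be predicted.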
Poor or bad data. Even in cases where considerable amounts of data about users and items are available, some of the data may still contain errors. These errors may result from accidentally erroneous user inputs or may even be fraudulently generated through shilling attacks by malicious web robots that favor or disfavor a particular item (Mobasher et al. 2007; Sandvig, Mobasher, and Burke 2007). Another part of the inconsistent data points is produced by natural variability in users' perception of the scale points, i.e., when users provide different ratings for the same item at different times (Hill et al. 1995; Herlocker et al. 2004) or when different users associate different ratings with the same amount of preference.
High variance data. High-variance data is not necessarily bad data for recommendation algorithms. However, it can cause recommendation errors (Herlocker, Konstan, and Riedl 2000). Especially for interest-polarizing items, such as the comedy movie "Napoleon Dynamite," which can "be either loved or despised" (Thompson 2008), it can be hard to predict the preference rating for a given user. In such cases, a proper prediction is probably not an average rating, although an average is typically what an RS will predict (Herlocker, Konstan, and Riedl 2000).
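The difficulty that polarizing items pose for mean-based predictors can be made concrete with a small, purely illustrative sketch:

```python
def mean_and_spread(ratings):
    """Mean rating and mean absolute deviation from that mean."""
    m = sum(ratings) / len(ratings)
    mad = sum(abs(r - m) for r in ratings) / len(ratings)
    return m, mad

# A polarizing item ("loved or despised") vs. a consensus item:
polarizing = [1, 1, 1, 5, 5, 5]
consensus  = [3, 3, 3, 3, 3, 3]
print(mean_and_spread(polarizing))  # (3.0, 2.0): mean matches nobody
print(mean_and_spread(consensus))   # (3.0, 0.0): mean matches everybody
```

For the polarizing item, the mean rating of 3 lies two points away from every single observed rating, so predicting it is wrong for every user, even though both items share the same mean.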
As we have shown above, many factors can cause misleading recommendations. The chance of receiving an erroneous recommendation impairs users' acceptance of and trust in RS. Explanations of the reasoning behind the recommendations provide users with indications of when to trust a recommendation and when to doubt one. This provides an instrument for handling errors in recommendations and helps recover users' trust in and acceptance of RS (Herlocker, Konstan, and Riedl 2000).
Appendix B: List of Preference Relevant Attributes
Genres (26)
Action
Adult
Adventure
Animation
Biography
Comedy
Crime
Documentary
Drama
Family
Fantasy
Film-Noir
History
Horror
Music
Musical
Mystery
News
Reality-TV
Romance
Sci-Fi
Short
Sport
Thriller
War
Western
Actors (87)
Affleck, Ben
Allen, Tim
Bale, Christian
Banderas, Antonio
Black, Jack
Bleibtreu, Moritz
Bloom, Orlando
Broderick, Matthew
Cage, Nicolas
Caine, Michael
Carrey, Jim
Chan, Jackie
Clooney, George
Connery, Sean
Costner, Kevin
Craig, Daniel
Crowe, Russell
Cruise, Tom
Cusack, John
Damon, Matt
De Niro, Robert
Depp, Johnny
DiCaprio, Leonardo
Diesel, Vin
Douglas, Michael
Downey Jr., Robert
Dreyfuss, Richard
Eastwood, Clint
Farrell, Colin
Ford, Harrison
Foxx, Jamie
Fraser, Brendan
Freeman, Morgan
Gere, Richard
Gibson, Mel
Grant, Hugh
Gyllenhaal, Jake
Hanks, Tom
Hartnett, Josh
Hoffman, Dustin
Hopkins, Anthony
Ice Cube
Jackman, Hugh
Jackson, Samuel L.
Kutcher, Ashton
LaBeouf, Shia
Law, Jude
Lawrence, Martin
Ledger, Heath
Maguire, Tobey
Marsden, James
Martin, Steve
McConaughey, Matthew
McGregor, Ewan
McKellen, Ian
Murphy, Eddie
Murray, Bill
Myers, Mike
Newman, Paul
Nicholson, Jack
Norton, Edward
Owen, Clive
Pacino, Al
Phoenix, Joaquin
Pitt, Brad
Quaid, Dennis
Redford, Robert
Reeves, Keanu
Reynolds, Ryan
Russell, Kurt
Sandler, Adam
Schwarzenegger, Arnold
Schweiger, Til
Scott, Seann William
Smith, Will
Snipes, Wesley
Stallone, Sylvester
Statham, Jason
Stiller, Ben
Travolta, John
Tucker, Chris
Waalkes, Otto
Wahlberg, Mark
Washington, Denzel
Williams, Robin
Willis, Bruce
Wilson, Owen
Wood, Elijah
Actresses (46)
Adams, Amy
Aniston, Jennifer
Barrymore, Drew
Berry, Halle
Blanchett, Cate
Bullock, Sandra
Curtis, Jamie Lee
Diaz, Cameron
Dunst, Kirsten
Fonda, Jane
Foster, Jodie
Hathaway, Anne
Hawn, Goldie
Hewitt, Jennifer Love
Hudson, Kate
Hunt, Helen
Johansson, Scarlett
Jolie, Angelina
Keaton, Diane
Kidman, Nicole
Knightley, Keira
Lopez, Jennifer
Moore, Demi
Moore, Julianne
Paltrow, Gwyneth
Pfeiffer, Michelle
Portman, Natalie
Potente, Franka
Riemann, Katja
Roberts, Julia
Russo, Rene
Ryan, Meg
Ryder, Winona
Sarandon, Susan
Stiles, Julia
Streep, Meryl
Streisand, Barbra
Swank, Hilary
Theron, Charlize
Thurman, Uma
Weaver, Sigourney
Weisz, Rachel
Winslet, Kate
Witherspoon, Reese
Zellweger, Renée
Zeta-Jones, Catherine
Directors (106)
Abrahams, Jim
Allen, Woody
Amiel, Jon
Anderson, Paul W. S.
Annaud, Jean-Jacques
Apted, Michael
Bay, Michael
Besson, Luc
Boyle, Danny
Brest, Martin
Brooks, James L.
Burton, Tim
Cameron, James
Campbell, Martin
Carpenter, John
Coen, Joel
Cohen, Rob
Columbus, Chris
Coppola, Francis Ford
Craven, Wes
Crowe, Cameron
Dante, Joe
Davis, Andrew
de Bont, Jan
Demme, Jonathan
del Toro, Guillermo
De Palma, Brian
DeVito, Danny
Dörrie, Doris
Donner, Richard
Dugan, Dennis
Eastwood, Clint
Emmerich, Roland
Ephron, Nora
Farrelly, Peter
Farrelly, Bobby
Fincher, David
Forster, Marc
Gilliam, Terry
Gosnell, Raja
Gray, F. Gary
Hallström, Lasse
Hanson, Curtis
Harlin, Renny
Herek, Stephen
Hoblit, Gregory
Howard, Ron
Jackson, Peter
Johnston, Joe
Lee, Ang
Lee, Spike
Levant, Brian
Levinson, Barry
Levy, Shawn
Lucas, George
Lyne, Adrian
Mann, Michael
Marshall, Garry
Marshall, Penny
McTiernan, John
Miller, George
Newell, Mike
Nichols, Mike
Nolan, Christopher
Noyce, Phillip
Oz, Frank
Petersen, Wolfgang
Pollack, Sydney
Raimi, Sam
Ramis, Harold
Ratner, Brett
Reiner, Rob
Reitman, Ivan
Reynolds, Kevin
Roach, Jay
Rodriguez, Robert
Russell, Chuck
Schumacher, Joel
Scorsese, Martin
Scott, Ridley
Scott, Tony
Segal, Peter
Shadyac, Tom
Shankman, Adam
Shyamalan, M. Night
Singer, Bryan
Singleton, John
Smith, Kevin
Soderbergh, Steven
Sommers, Stephen
Sonnenfeld, Barry
Spielberg, Steven
Stone, Oliver
Tarantino, Quentin
Thomas, Betty
Turteltaub, Jon
Tykwer, Tom
Verbinski, Gore
Vilsmaier, Joseph
Weir, Peter
Woo, John
Wortmann, Sönke
Zemeckis, Robert
Zucker, David
Zucker, Jerry
Zwick, Edward
Producers (4)
Apatow, Judd
Bruckheimer, Jerry
Rudin, Scott
Silver, Joel
Writers (5)
Crichton, Michael
Curtis, Richard
Dick, Philip K.
Grisham, John
King, Stephen
Production Firms (6)
Imagine
Nickelodeon
Pixar
Revolution
Section
Spyglass
Countries of Origin (38)
Australia
Austria
Argentina
Belgium
Brazil
Canada
China
Czech Republic
Czechoslovakia
Denmark
East Germany
France
Finland
Germany
Hong Kong
Iceland
India
Ireland
Israel
Italy
Japan
Mexico
Netherlands
New Zealand
Norway
Poland
Russia
South Africa
South Korea
Soviet Union
Spain
Sweden
Switzerland
Thailand
Turkey
UK
USA
West Germany
Appendix C: Technical Details of Prediction Accuracy Tests
Whereas in Chapter 4 of this thesis we describe our tests of predictive accuracy conceptually, in this appendix we provide insights into the details of the technical implementation and execution of the tests. By providing this information, we ensure that a critical reader can verify the methodological correctness of the process by which the results were obtained, understand our course of action more deeply, and, if necessary, replicate our results as well as use our procedure for his or her own studies and for building his or her own recommender system.
For the calculation of the prediction accuracy of the global average as well as of all variants of the user-based and item-based collaborative filtering algorithms, we utilized the open source library of recommender system algorithms "MyMediaLite".51 This library was recommended for use in real-world recommender systems as well as for research purposes at the 4th ACM Conference on Recommender Systems, RecSys 2010 (personal communication with Francesco Ricci, Gediminas Adomavicius, and Xavier Amatriain).52
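For orientation, the global average baseline and the standard accuracy measures used in such tests can be sketched in a few lines. This is an illustrative re-implementation in Python, not MyMediaLite's API and not our C# code:

```python
import math

def global_average(train_ratings):
    """Baseline predictor: every unseen rating is predicted as the
    mean of all ratings observed in the training data."""
    return sum(train_ratings) / len(train_ratings)

def mae_rmse(predictions, actuals):
    """Mean absolute error and root mean squared error, the two
    standard prediction-accuracy metrics."""
    errors = [p - a for p, a in zip(predictions, actuals)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

pred = global_average([4, 5, 3, 4])         # 4.0
mae, rmse = mae_rmse([pred, pred], [5, 3])  # hold-out set of two ratings
print(mae, rmse)  # 1.0 1.0
```

Every personalized algorithm under test has to beat this trivial baseline to justify its additional complexity.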
The matrix factorization algorithm was implemented based on Simon Funk's (2006) description53 of the approach that brought him to the fourth position on the Netflix Prize leaderboard in the fall of 2006. The surprising performance of Funk's algorithm attracted enormous attention from the Netflix Prize community, which made the matrix factorization approach popular in recommender system research. Although Funk's approach was never published in an academic journal, his blog entry describing his stochastic gradient descent method for matrix factorization has been widely cited in the recent literature and serves as the basis for all published matrix factorization approaches (e.g., Paterek 2007; Koren 2009; Linden 2009; Koren and Bell 2011).
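The core of Funk's stochastic gradient descent method can be sketched as follows. This is a minimal Python illustration of the technique, not our Chapter 4 implementation; the latent dimensionality, learning rate, regularization weight, and epoch count are arbitrary assumptions chosen for the toy example:

```python
import random

def funk_sgd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000):
    """Minimal Funk-style matrix factorization: learn user factors P
    and item factors Q by stochastic gradient descent on the observed
    (user, item, rating) triples only."""
    random.seed(0)
    P = [[random.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # regularized step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Toy data: user 0 loves item 0 and dislikes item 1; user 1 the inverse.
data = [(0, 0, 5), (0, 1, 1), (1, 0, 1), (1, 1, 5)]
P, Q = funk_sgd(data, n_users=2, n_items=2)
print(round(predict(P, Q, 0, 0), 1))
```

On the toy data, the learned factors reproduce the observed ratings closely; on the Netflix data, Funk additionally trained the factors one at a time with early stopping, as described in his blog entry.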
The program for our approach, described in Chapter 4, was implemented using source code snippets from Press et al. (2007), a widely acknowledged source of numerical methods for scientific computing. Table C.1 provides an overview of the employed procedures, their short descriptions, and information about their roles in our algorithm.
51 http://www.ismll.uni-hildesheim.de/mymedialite/index.html
52 http://recsys.acm.org/2010/
53 http://sifter.org/~simon/journal/20061211.html
All algorithms employed in our study are implemented in the programming language C#. The tests were performed on an Intel® Core™2 Quad Q9400 2.67 GHz machine with 8 GB RAM running 64-bit Windows Server® 2008 Standard Edition with Service Pack 2.
Table C.1: Overview of the employed source code snippets from Press et al. 2007
(Method or function name – Description – Role for the algorithm)

Fitab – Object for fitting a straight line to a set of points, with or without available errors. – Solving regression problems for one regression parameter, Section 3.2.1.

invxlogx; Erf; Normaldist:Erf; Lognormaldist:Erf; Gauleg18; Beta:Gauleg18; Gamma:Gauleg18; Studenttdist:Beta; Fdist:Beta – Classes and functions providing distributional statistics and statistical tests for the Beta, Gamma, Gauss, logarithmic, Student-t, and F-distributions. – Performing tests for significance, Section 3.2.1.

SVD – Object for the singular value decomposition of a matrix. – Correction for omitted variable bias; solving equation system (3.22), Section 3.2.1.3.

SVD::solve – Solves an equation system for a vector using the pseudoinverse of the matrix. – Correction for omitted variable bias; solving equation system (3.22), Section 3.2.1.3.

Bracketmethod – Base class for one-dimensional minimization routines; provides a routine to bracket a minimum and several utility functions. – Optimizing initial parameter values, Section 3.2.2.

Brent:Bracketmethod – Isolates the minimum using Brent's method. – Optimizing initial parameter values, Section 3.2.2.

F1dim – Performs one-dimensional minimization. – Optimizing initial parameter values, Section 3.2.2.

Linemethod – Base class for line minimization algorithms. – Optimizing initial parameter values, Section 3.2.2.

Frprmn:Linemethod – Multidimensional minimization by the Fletcher-Reeves-Polak-Ribiere method. – Optimizing initial parameter values, Section 3.2.2.
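The bracketing-based one-dimensional minimization routines in Table C.1 can be illustrated with golden-section search, a simpler relative of Brent's method. This sketch is not the Press et al. code, merely an illustration of minimizing a function on a bracketing interval:

```python
def golden_section_minimize(f, a, b, tol=1e-8):
    """One-dimensional minimization of f on a bracketing interval
    [a, b] by golden-section search: repeatedly shrink the interval
    around the smaller of two interior probe points."""
    invphi = (5 ** 0.5 - 1) / 2               # 1/phi ≈ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):                        # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                                  # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# The minimum of (x - 2)^2 + 1 on [0, 5] lies at x = 2.
x_min = golden_section_minimize(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0)
print(round(x_min, 6))  # 2.0
```

Brent's method, as used in our implementation via Press et al.'s routines, accelerates this scheme with parabolic interpolation steps where they are safe.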
Declaration of Honor (Ehrenwörtliche Erklärung)

I hereby declare on my honor that I have prepared the present work without the impermissible help of third parties and without the use of any aids other than those stated. Data and concepts taken directly or indirectly from other sources are marked with a reference to the source.

In the selection and evaluation of the following material, the persons listed below helped me, with or without remuneration, in the manner described in each case: none.

No further persons were involved in the substantive preparation of the present work. In particular, I have not made use of the paid assistance of placement or consulting services (doctoral consultants or other persons). Nobody has received from me, directly or indirectly, any monetary benefits for work connected with the content of the submitted dissertation.

The work has not previously been submitted, in the same or a similar form, to any other examination authority in Germany or abroad.

I affirm that, to the best of my knowledge, I have told the pure truth and have concealed nothing.

Langenhagen, 29.07.2011