fuzzy modeling for item recommender systemsnorcio/papers/submitted...fuzzy modeling for item...
Post on 22-Sep-2020
3 Views
Preview:
TRANSCRIPT
Fuzzy Modeling for Item Recommender Systems Or
A Fuzzy Theoretic Method for Recommender Systems
Azene Zenebe, Anthony F. Norcio
Abstract—Representation of features of items and user feedbacks that are subjective, incomplete, imprecise and vague, and reasoning about their relationships are major problems in recommender systems. The paper presents a Fuzzy Theoretic Method (FTM) for recommender systems that handles the non-stochastic uncertainty induced from subjectivity, vagueness and imprecision in the data, and the domain knowledge and task under consideration. FTM further advances methods of fuzzy modeling in recommender systems as well as empirically evaluates the methods’ performance through simulations using a benchmark movie data. FTM is comprised of representation method for items’ feature and user feedbacks on these items, and a content-based algorithm based on various fuzzy theoretic similarity measures such as the fuzzy theoretic extensions of the Jaccard Index, Cosine, Proximity or Correlation similarity measures, and recommendation strategies such as the maximum-minimum or weighted-sum fuzzy theoretic recommendation strategies. Compared to the baseline Crisp Set based method (CSM), simulation runs of the FTM using the movie data show an improvement in precision with comparable recall and F1-measure. The improvement in the performance of FTM is attributed to the use of fuzzy modeling. Also, lower model size and recommendation size produce satisfactory recommendation accuracy. Moreover, depending on a task, a guideline for recommender systems designers that will help them choose a combination of one of the fuzzy theoretic recommendation strategies and the fuzzy theoretic similarity measures is provided. From the results we can conclude that modeling uncertainty using fuzzy set and logic improves the performance of content-based recommender systems.
Index Terms—recommendation algorithm, uncertainty, fuzzy theory, similarity measure ( if resubmitted in
IEEE Cybernetics …)
Or
Index Terms - Fuzzy system models; Fuzzy inference systems, Recommender Systems, Learning under Uncertainty (if submitted in Journal of Fuzzy Sets and Systems @ http://www.elsevier.com/wps/find/journaldescription.cws_home/505545/authorinstructions)
I. INTRODUCTION
Recommender systems are systems that provide users with recommendation of items and
information that help them to decide which items to procure or look based on the individual
customer preferences [1]. Recommendation systems are generally consisting of background data
such as historical data consisting of ratings from users of items before the recommendation
2
begins; input data such as features of items or a user’s ratings of items in order to initiate a
recommendation; and an algorithm to combine the two and generates a recommendation[2].
There are many advances in recommender systems research since their beginning. However,
there are various venues in which current recommender systems need improvements. As result,
the performance and acceptance of the state-of-the-art recommender systems is far from
satisfactory. Adomavicius and Tuzhilin [3] suggest various areas of improvements for current
recommender systems: better methods for representing user behavior and information about
items; more advanced recommendation modeling methods; incorporation of contextual
information into recommendation process; utilization of multi-criteria ratings; development of
less intrusive and more flexible recommendation methods; and development of recommender
systems effectiveness measures.
A. The Problem
Representation on and reasoning about the behavior of customer and features of items raised a
number of challenging issues. Attributes of items and user behavior are subjective, vague and
imprecise. These in turn induce uncertainty in representing and reasoning on items’ features,
users’ behavior, and their relationships. This uncertainty is non-stochastic that is induced from
subjectivity, vagueness and imprecision in the data, and the domain knowledge and task under
consideration.
In relation to the items, the uncertainty is associated to what extent (e.g. low to high) the items
have some features. For instance, given a movie to what extent does the movie have drama
content or is highly drama? In relation to user behavior such as interest, the uncertainty is
associated to how to measure and to represent user interest as precisely as possible. In relation to
3
the domain task (recommendation of items), the uncertainty is associated with the types of
relationship that exist between the user behavior and item features, among users in terms of their
characteristics, and among items in terms of their features; as well as how to model these
relationships. This non-stochastic uncertainty that inherently exists is not studied in previous
recommender systems research[1, 4-8]. Ignoring this type of uncertainty puts the modeling in a
less realistic setting, and the resulting model does not precisely represent the reality.
B. The Proposed Solution
Fuzzy set theory and logic provide a way to quantify the non-stochastic uncertainty that is
induced from subjectivity, vagueness and imprecision [9]. Compared with traditional statistical
methods such as the Bayesian method that handles uncertainty due to randomness, using fuzzy
theory that handles uncertainty due to subjectivity, imprecision and vagueness, has the following
benefits [10, 11]: (i) the membership function in fuzzy theory is deliberately designed to treat the
vagueness and imprecision in the context of the application. Therefore, it is more reliable and
accurate to use fuzzy theory to model subjectivity and vagueness in the attributes; (ii) the
membership function can be continuous, which are more accurate in representing the attributes
of items and user feedbacks; and (iii) the fuzzy mathematical method is easy to perform once the
membership functions of attribute have been defined. These properties promise to provide the
framework for addressing the representation and inference challenges faced in recommender
systems research.
Our research is motivated by the “reclusive methods” proposed by Yager [12] as
complementary methods to conventional collaborative methods within fuzzy set framework.
Yager [12] presents a methodology consisting of a collection of justifications and rules for the
4
recommendations based on fuzzy set and fuzzy logic. However, he assumes the availability of a
representation scheme for each object that allows the development of an appropriate tool for
calculating the similarity relationship over the set of objects. He also assumes the availability of
similarity index between two objects in the range 0 to 1. Furthermore, there is no empirical study
to support or refute the effectiveness of using fuzzy modeling for recommender systems.
This research further develops and empirically evaluates a Fuzzy Theoretic Method (FTM) for
recommender systems using the problem of movie recommendation as the domain. Particularly,
the proposed approach is content-based that uses features of items as background data and a
user’s feedbacks such as ratings of items of the same kind as input. One of the limitations of
current content-based approach is it requires sufficient data about the behavior of a user and
features of items; and its performance also depends on these data and how these data are used –
represented and reasoned.
Within the framework of fuzzy modeling theory, first we advance the representational method,
recommendation strategies, and similarity measures for items recommender systems by
accounting for the induced uncertainty in item features, and user feedback as well as by
combining users’ feedback and items’ features. That is, we, are presenting an alternative
representation scheme, inference mechanisms and an item-item similarity based recommendation
algorithm constituting FTM that would improve the performance of recommender systems by
modeling the non-stochastic uncertainty discussed. Second, we empirically assess the effect of
the FTM on the performance of a movie recommender system. Third, we empirically assess the
effect of the variants inference mechanisms – combination of the recommendation strategies and
similarity measures - on the performance of a movie recommender system that use FTM. Forth,
we study the effect of the training or model size, recommendation size and test size on the
5
performance of the movie recommender system.
C. The Results
Using benchmark movie data from the MovieLens recommender system, first, the mean
precision, recall, and F1-measure of 55.18, 37.93%, and 38.08% are obtained, respectively. For
recommender systems where precision is the most important metric[13], compared to the baseline
Crisp Set -based method (CSM) with mean precision, recall and F1-measure 48.74%, 39.2%, and
38.32 respectively, FTM provides greater precision. Second, the algorithm is scalable with linear
time complexity proportionality to the size of distinct values of features of item which are very
small (e.g. 20 for genres); and empirically a mean model time in range 0.083 to 0.12 seconds for
the different similarity measures are found, and recommendation time approximately 0.021
seconds per movie is obtained.
Third, model size (number of items initially required for which a user need to provide
feedback) of 5 to 15, and recommendation size (number of items recommended at one time) of 3
to 10 produces satisfactory recommendation accuracy. Fourth, the analysis shows that testing
size has significant effect on performance __ precision increases as testing size increases, where
as recall decreases as testing size increases. A greater performance is shown when the testing
size is between 41 and 60. Fifth, the results of analysis of variance show that there are significant
different among the different alternative combination of fuzzy theoretic similarity measures and
recommendation strategies in their recommendation accuracy and latency.
D. Conclusion and Contributions
From the results we can conclude that modeling uncertainty using fuzzy set and logic improves
6
the performance of content-based recommender systems. The contributions of this paper are
many folds:
a. The research provides a representation framework for features of items and users’
feedback based on fuzzy theory to represent and reason on uncertainty due to
subjectivity, imprecision and vagueness. The research empirically shows that modeling
non-stochastic uncertainty using fuzzy theory improves the performance.
b. The research provides new algorithms for content-based item recommender systems.
c. Results of an exploratory study using various alternatives Fuzzy Set Theoretic similarity
measures and Fuzzy recommendation strategies for recommender systems are presented.
d. The research provides a guideline for recommender systems designers that will help
him/her to choose a combination of one of the fuzzy theoretic recommendation strategies
and the fuzzy theoretic similarity measures. Moreover, the research provides results of a
detailed empirical study on the effect of these two factors as well as the other factors
model size, recommendation size, and testing size.
The remainder of the paper is organized as follows. Section II presents a review of related
literature1. Section III presents the representation methods, inference methods, similarity
measures and algorithms for recommendation of items. Section IV describes the dataset,
experimental settings, and evaluation metrics. Section V reports the results of the empirical
experiment and followed by the discussion in VI. Finally, conclusion and future research
directions are presented in Section VII.
1 Throughout the paper, we use the term item as a synonym to product, the term user as a synonym to customer and client, and the term feature as a synonym to attribute, characteristic, and variable.
7
II. RELATED LITERATURE
A. Recommender Systems
There are various classifications of recommendation methods. The three most widely used
methods of recommendation are content-based, collaborative filtering and hybrid methods. In
content-based recommendation, an item is recommended to a user mainly based on his/her
personal past actions like purchases, queries, and ratings and the characteristics of the item. In
collaborative recommendation, an item is recommended to a user based on other similar users’
actions like interests, preferences and ratings. This employs a learning technique that is called
collaborative filtering [4, 14-16]. Due to the limitations of the content-based and collaborative
filtering based recommendations, a hybrid approach is studied [2]. Collaborative filtering cannot
be applied to new items that have not been rated, and to a user who has not been assigned to a
group (a new user problem or the “cold start” problem). Content-based recommender systems
cannot predict user’s new behavior like having new interest because they are limited by what a
user has seen in the past; and it also requires careful features selection, representation, and
inference [1, 3, 17].
Based on the sources of data and how these data are used for recommendation, Burke [2] has
classified recommendation methods into: collaborative, content-based, demographic, utility-
based, and knowledge-based. There also various variants of hybrid methods that combines the
single methods. These methods are discussed in detail along with their pros and cons in [2].
Moreover, a recent survey of state-of-the-art recommender systems along with suggestions for
improvements is found in [3].
[Insert, here content-based on movie recommender systems] – see dissertation
8
In content-based recommendation, standard machine learning techniques such as
clustering, Bayesian networks and induction learning are applied in forming attribute-based
models. However, Bayesian network found to be suitable for environments in which user
preference models must not be updated rapidly or frequently; and clustering and decision
(association) rule techniques usually produce less-personal recommendation and results with
lower accuracy compared to the nearest neighbor algorithms [15]. Examples of application
systems that employ action-to-items based methods are Amazon Eyes and eBay Personal
Shopper.
Several numbers of studies on CF algorithms exists [6, 15, 18-21]. Deshpande and Karypis [6]
develop item-based Top-N recommendation algorithms that are collaborative type and faster
than traditional user-user collaborative algorithms with comparable recommendation recall.
Moreover, results of evaluation of these CF algorithms for recommender systems using the
MovieLens dataset are reported in terms of precision and F1 measure in [15, 19].
Research efforts that address the representation of and reasoning on user behavior and
information about items under uncertainty due to their subjectivity, incompleteness, imprecision,
and vagueness are rare, e.g. [22-25]. These researches focus on theoretical benefits of fuzzy set
and logic in data mining and on fuzzy clustering techniques such as [22, 25-31] ???. They are
also less applicable for recommender systems???. Yager [12] presents a theoretical framework
for constructing recommender systems using fuzzy modeling methods. This paper extends
Yager’s framework, and empirically evaluates the methods for movie recommendation. Nasraoui
and Petenes [24] have empirically compared the performance of their Web pages
recommendations using fuzzy inference engine with that of collaborative filtering and nearest-
profile based Web pages recommendation. They concluded that fuzzy methods have provided
9
higher coverage with low computational cost, are faster and require much lower main memory,
and are intuitive in dealing with the natural lack of clear boundaries in user interests.
B. Fuzzy Set and Fuzzy Logic
Fuzzy set theory consists of mathematical approaches that are flexible and well-suited to
handle incomplete information, the un-sharpness of classes of objects or situations, or the
gradualness of preference profiles [32]. Fuzzy set theory and logic provide a way to quantify the
uncertainty due to vagueness and imprecision [9]. Membership functions, a building block of
fuzzy sets, have possibilistic interpretation, which assumes the presence of a property and
compares its strength in relation to other members of the set.
A fuzzy set A in X is characterized by its membership, which is defined as [33]:
]1,0[:)( →∈ XxxAµ , where X is a domain space or universe of discourse. Alternatively, A can
be characterized by a set of pairs: })),(,{( XxxxA A ∈µ= . According to the context in which X is
used and the concept to be represented, the fuzzy membership function, )(xAµ , can have
different interpretations [34]. As a degree of similarity, it represents the proximity between
different pieces of information. For example, movie x in the fuzzy set of "drama movies " can be
estimated by the degree of similarity. As degree of preference, it represents the intensity of
preference in favor of x, or the feasibility of selecting x as a value of X. For instance, a movie
rating of 4 out of 5 indicates the degree of a user's satisfaction or liking with x based on certain
criteria, say movie attributes such as content-intensity of action, drama, and humor. These two
interpretations are used in this research.
• About Fuzzy Logic and how it is used here (one paragraph).
10
III. FORMALISM OF THE REPRESENTATION AND INFERENCE METHODS
The proposed fuzzy theoretic content-based approach is based on a user’s previous feedbacks,
and features of the new items and features of the set of items for which the user has provided
feedback. The rationale of this method is users are more likely to have interest in item like movie
that is similar to the items like movies they have experienced and liked. This approach is useful
for new item like movie with no or few user ratings. It is solely based on one user’s previous
interest expressed by ratings. The representation scheme, inference engine consisting of
recommendation strategies and similarity measures, and the algorithm of the proposed method
are presented in this section.
A. Items Representation using Fuzzy Set
For an item described with multiple attributes, more than one attribute can be used for
recommendation. Moreover, some attributes can be multi-valued involving overlapping or non-
mutually exclusive possible values. For example, movies are multi-genres and multi-actors [35].
These values of multi-valued attributes in an item can be represented more accurately with in a
fuzzy set framework than with in a crisp set framework.
Let an item Ij (j = 1 … M) be defined in the space of an attribute X ={x1, x2, x3, …. xL}, then Ij
can take multiple values such as x1, x2, …, and xL. If these values of X can be sorted in the
decreasing order of their presence in the item Ij expressed by degrees of membership, then the
membership function of item Ij to value xk (k = 1 …L), denoted by )(Iµ jxk, can be obtained
heuristically. Hence, a vector Xj={( xk, )(Iµ jxk), k= 1… L} is formed for Ij. )(Iµ jxk
can be
interpreted as the degree of similarity of Ij to a hypothetical (or prototype) pure xk type of the
item; or as the degree of presence of value xk in item Ij.
11
We use movie as the domain and movie genre as the attribute to formalize and apply the
heuristic. According to the work by Cooper-Martin [36] in movies marketing application, most
movies are selected for pleasure and expenditures of time. And users choose movies based on
what they like and enjoy. Furthermore, users use subjective movies features such as “funny”,
“romantic” and “scary” (all are a kind of movies genres) to select movies more than objective
features such as the director, theatre location and admission price which are useful but are less
important.
The reasons why genre is considered as the major attribute for the representation of movies,
and similarity computation are: movie genres describe the content of movies, and movies are
multi-genres [35]; and analysis of descriptions of main film genres shows that genre g1 of movies
(e.g. action) and genre g2 of movies (e.g. adventure) are overlapping in terms of their subject
matter and other movie’s attributes. As result of the findings in [36] movies highly liked by users
(rated 4 and above) can be grouped into similar categories by subjective features of movies such
as genre and MPAA rating.
Given the definition of a movie in the space of genre (G), a movie can have one major genre
denoted by x1 and one or more minor genres x2, x3, and so on, in the decreasing order of their
degrees of presence in a movie. The degree of membership of movie Ij (j = 1 …M) to genre xk (k
= 1 …N) is denoted by )(Iµ jxk, which can be obtained heuristically from domain analysis.
Hence, for Ij, we can form a vector Gj={( xk, )(Iµ jxk), k= 1… N}. Since membership function
in fuzzy set theory is deliberately designed to treat the vagueness and imprecision in the context
of the application [10], to determine the degree of genre presence in movies, we take the
following heuristic approach:
12
Step 1: Sort xk in descending order of )(Iµ jxk. In IMDB (www.imdb.com2) the genres of
movie Ij are presented in the order of significance. For example, movie ‘King Kong (2005)’ has
Action as a major genre, and Adventure as the 1st minor, Drama the 2nd minor, Fantasy as the 3rd
minor, and Thriller as 4th minor genres.
Step 2: Assign higher degrees of membership value to more important genres of a movie. For
instance,
If Ij has only one genre, then )(Iµ jxk= 1 and )(Iµ jxk
=0 for all k=2 to N.
If Ij has two genres, then )(Iµ jxk= 1, )(Iµ jxk
= 0.7 and )(Iµ jxk=0 for all k=3 to N.
If Ij has three genres, then )(Iµ jxk= 1.0 and )(Iµ jxk
= 0.50, )(Iµ jxk=0.20 and )(Iµ jxk
=0
for all k=4 to N.
Based on the heuristics illustrated for a movie, the possibility for item Ij to take different values
of X varies, and the membership function should meet the following three criteria: 1) assigning
higher degree of membership to major values than minor values; 2) assigning 0 to values that are
not associated with the item; 3) degrees of membership should be normalized to the range of
[0,1]; and 4) the same value of X at same rank positions between different items should have
varying degrees of membership values if the number of values of X associated with the items are
different. We represent this type of heuristic with a Gaussian-like fuzzy set membership
function, as shown in (1).
)1(|L|*x
j
k2/)( −= kr
kj rI αµ (1)
2 “IMDb History," vol. 2004: Internet Movie Database Inc., n.d.http://www.imdb.com/
13
where N=|Lj| is the number of values of X associated with Ij and rk (1≤rk ≤|Lj|) is the rank
position of value xk, and α > 1 is a parameter used as a threshold to control the difference
between consecutive values of X in Ij. Moreover, α is the only parameter that needs to be
determined for items under consideration.
The reciprocal function representation defined as 1/k for k=1 to N is also investigated. It does
not consider the total number of different genres exist in a movie leading to the same degree of
membership value for same genre at same rank position with different number of genres. We
also consider a uniform distribution membership of genres in a movie: assuming a movie with
multiple genres has equal degree of genre presence for all occurring genres represented by 1 or 0.
This is the baseline Crisp Set representation of movies in the space of genres used in the
empirical evaluation.
The type of function that is suitable dependents on the application context, however in
certain cases the meaning captured by fuzzy sets is not too sensitive to the variations in the
shape. In practice, triangular, trapezoid, Gaussian function, S-function, function−Γ and
exponential-like function are the most commonly used membership functions. Moreover, in
practice, suitable membership function's shape is assumed a priori and its parameters are
determined by domain expert or using machine learning techniques[37]. Therefore, the
membership function (1) is used. Other membership with similar characteristics as (1) can also
be considered.
For example, with α set to 1.2 (after various trails), the movie ‘King Kong (2005)’ is
represented in terms of genres: for |Lj| =5 and xj = {(Action, 1), (Adventure, 0.366), (Drama,
0.272), (Fantasy, 0.211), (Thriller, 0.168)}. Furthermore, for a user 7, Table 1 shows the
14
representation of sample of the rated movies. The validity and reliability of the representation
and inference mechanism are further investigated in section V using the movie dataset.
Table 1: Membership degree of movie to genres for a user 7 Gj (vector for jth movie)
Movie (Ij)Rating xj1 xj2 xj3 … xj15 xj16
56 4 0.6831.0000.438 … 0.000 0.000
79 5 1.0000.0000.000 … 0.467 0.000
89 3 0.6830.0000.000 … 0.000 1.000
.. .. .. .. … .. ..
254 2 0.4380.0001.000 … 0.000 0.0001016 1 0.0000.0001.000 … 0.000 0.000
The representation scheme can be extended to recommender systems based on a combination
of multiple attributes. For example, one can use movie genre describing the content of movies as
the first attribute and actresses/actors as the second attribute. The actors in a movie can be
represented in a vector A={a1, a2, … ak} for K actors. The degree of role or importance of an
actor or actress ak in a movie mi can be represented by degree of membership associated with the
fuzzy set degree of role or importance. That is, Aj ={(ak, )j(mkaµ ), for k=1 to K}, where
)j(mkaµ can be determine heuristically. A single attribute of an item and a user’s feedback on the
items are considered in this paper.
The representation scheme presented for movie can be generalized and applied to any item
with similar characteristics as movie, i.e. multi-valued attributes that can be ordered. A few
examples are music, TV shows, and Book. The heuristic that leads to the membership function in
(1) is developed based on our analysis of the movies dataset, and literature on movies[36] and a
limited experiment on α. This heuristic, for instance, assumes that two genres will not have equal
degree of presence in a movie. This assumption requires further research. Moreover, in future
15
research, studying various membership functions and finding optimal α through evolutionary
computing is needed.
B. User Feedback Representation using Fuzzy Set
User rating is the most widely used user feedback in recommender systems. It is a proxy
variable used for measuring user degrees of interest in an item. In the popular collaborative
algorithm (e.g. [15],[6]) user ratings are represented and interpreted as binary values-those liked
or disliked. In 5-scale ratings, above 3 are considered as liked. However, user rating is
intrinsically imprecise as user may give different ratings to same item at different time and
situation due to the difficulty to make a distinction between rating 4 and 5, and 1 and 2 by the
users. Moreover, the same rating say 4 on a scale of 5 given by two users do not necessarily
imply equal degrees of interest in an item. For pessimist users, a rating of 4 may mean strongly
liked but for optimist rater it may mean somewhat liked. Is the difference between ratings 3 and
4 same as the difference between 4 and 5? These all contribute to fuzziness that arises from
human thinking processes instead of randomness associated with experiments.
Therefore, user interest based on user rating is treated as fuzzy variable and its uncertainty is
represented using possibility distribution function (п). п is defined to be numerically equal to
membership function [10]. For the fuzzy variable degrees of interest in an item (DI) consisting
of Strongly Liked (SL), Liked(L), Indifferent(I), Disliked (D), and Strongly Disliked (SD) fuzzy
values, and associated with user rating (R) expressed in continuum from Minimum value (Min)
to Maximum value (Max). Then, the proposition ‘a User has strongly liked an item I’ has the
possibility distribution function пR(I)=µSL(R=r) , for r between Min and Max. Under this
interpretation or semantic of the fuzzy variable DI, user rating is represented and reasoned using
possibility distribution function by treating the rating as fuzzy number. For instance a rating 4 on
16
5 a scale which refers to strongly liked is represented in terms of its possibility distribution
function values={µSL(R=4)/4, µL(R=4)/4,…,µSD(R=4)/4}. Without losing generalization, a half
triangular fuzzy number, which is the simplest models of uncertain quantity, is used to represent
the degree of positive experience a user has in relation to an item. Equation 2 defines the half
triangular fuzzy number membership function; for user rating r on Ii є [Min to Max] and A is a
fuzzy set on DI
µA(Ii) =(r - Min)/(Max – Min) (2)
As result, a set of items liked by a user denoted by E is defined as: {Ii : µA(Ii) > 0.5, i.e., Ii : r >
(Min+Max)/2}. Due to current practice and available datasets, the range of rating is restricted to
1 to 5. Extending the possible range of rating, e.g., from -100 to 100, and representing within
possibility theory framework are expected to produce more accurate measurement and
representation of user interest in an item. These effects in turn are expected improve the accuracy
of recommender systems that need further study.
C. Inference Engine and Algorithm
Based on the representation scheme defined for items and user feedback to these items, the
inference engine consisting of the recommendation strategies and the similarity measures are
defined.
1) Recommendation Strategies There are different alternatives for inference and making recommendation within fuzzy and
possibility theory framework [12]. The two alternatives that regard both user feedbacks such as
previous user ratings and similarity between previously liked items and a new item are
considered.
17
a) Weighted-Sum recommendation method
For each targeted item Ij, calculate the weighted sum recommendation confidence as:
),()()(IRkIj1 jkE kE IISI∑ ∈
= µ (3)
E is a set of positively experienced (liked) items by users, and )( kE Iµ is the membership of
the item Ik to the fuzzy set E. Furthermore, ),( jk IIS is the similarity between Ij and Ik computed
using the techniques defined next. A normalized evaluation score for each jI ’s recommendation
confidence is obtained using {[ ])(IR/)(IR)(INR k1j1j1k
Max=
b) Maximum-Min or Maximum-Product recommendation methods
For each targeted item Ij, calculate the max-min recommendation confidence as:
{ ( )][ )(),()(2 kEkjEI
j IIISMaxIRk
µ∧=∈
Where ∧ is substituted by min operator, and results in:
{[ {( ) ])(),,()(2 kEkjEIEi
j IIISMinMaxIRk
µ∈∈
= (4)
A normalized evaluation score can be obtained by dividing each jI ’s recommendation
confidence by the maximum of recommendation confidence scores. Finally, the results of
recommendation confidence score )( jk IR is a degree of support to recommend jI , and can be
interpreted as how much the recommender system assumes this user will like the item from the
set of items that are not experienced by the user.
2) Fuzzy Theoretic Similarity Measures
One of the most important issues in recommender systems research is computing similarity
between users, and between objects (items, events, etc.). This in turns highly depends on the
appropriateness and accuracy of the methods of representation.
18
In fuzzy set and possibility framework, similarity of users or items is computed based on the
membership functions of the fuzzy sets associated to the users or items features. Similarity is
studied and applied in taxonomy, psychology and statistics. Similarity is subjective and context
dependent. The set-theoretic, proximity-based and logic-based are the three classes of measures
of similarity [38]. Based on the results of the study by Cross and Sudkamp [38] those measures
that are relevant for items recommendation application are adapted .
jI and kI are defined as {( xi, )(Iµ jx i), i= 1… N} and {( xi, )(Iµ kx i
), i= 1… N}; and
represent the possibility distribution of the item Ij and Ik in the space of X with cardinality N. A
similarity measure between Ij and Ik is denoted by ),( jk IIS . The similarity measures that are
considered are: Fuzzy Set Theoretic in (5), Fuzzy Theoretic Cosine in (6), Fuzzy Theoretic
Proximity (Minkowski’s distance based) in (7) and Fuzzy Theoretic Correlation-like in (8).
{{ )()((
)()(),(1
jIxIx
jIxIxIIS
ik
ii
ik
i
ijk
µµ
µµ
U
I=
=
∑∑
∑
2)((2))((
)(*)(),(2
k iik
i
i ik
ijk
jIxIx
jIxIxIIS
µµ
µµ
{
21
1
2)()(),(2
Where
)(),(
),(21),(3
∑=
−=
−=
L
i jIixkI
ixjIkId
jIixkI
ixi
Max
jIkIdjIkIS
µµ
µµ
(5)
(6)
(7)
21)j(I
ixµ*2jIZ
21)k(I
ixµ*2kIZ
Where
22
214
∑
−
=
∑
−
=
+−= )j,Ik(Id
jIZkIZ
)j,Ik(IS
As a baseline, for assessing the effect of fuzzy representation scheme and recommendation
strategy on item feature (e.g. genre) represented in crisp set in addition to user feedback (e.g.
rating), is defined in (9). It is called Crisp Set Based Similarity Measure (CSM). In (9) )( ki
Ixµ is
1 if the feature value ix exists in kI . Otherwise )( ki
Ixµ is assigned to 0. (5) and (9) are similar
in all respect except the difference in the representation scheme. Moreover, in (5) and (9) the
intersection operator is replaced by minimum; maximum operator replaces union; and | | is the
cardinality measure replaced by the sum of the membership degrees.
{{ )()((
)()(),(5
jIxIx
jIxIxIIS
ik
ii
ik
i
ijk
µµ
µµ=
U
I
For example, movies I1=Copycat 1995: Crime/ Mystery/Thriller/Drama, and I2=Gru
Horror/Thriller/Mystery with their crisp (I1 and I2) and fuzzy (G1 and G2) represen
shown in Table 2. Using crisp set theoretic (9), crisp cosine, and crisp distance me
similarity between I1 and I2 are 0.50, 0.58, and 0.73 (|1 – distance|), respectively. Fin
fuzzy theoretic similarity measures (5), (6), (7), and (8) the similarity coefficients are
0.55 and 0.18. The difference in the representation leads to the variation in similarity
)
(8)
(9
19
dge 2004:
tations are
asures the
ally, using
0.24, 0.25,
measures
20
between I1 and I2. The effect of these differences in similarity measures on recommender
systems’ accuracy is evaluated in Section Five.
Table 2: Movies Representation in Space of Genres Crime Horror Mystery Thriller DramaI1 1 0 1 1 1G1 1 0 0.44 0.35 0.29 M2 0 1 1 1 0G 0 1 0.41 0.47 0
3) Algorithm and Computational Complexity
The general algorithm is presented in Figure 1. First, the algorithm uses items for which the
user has expressed a positive degree of interest or preference; and then the algorithm finds
similar items that the user has not known, experienced or possessed. Using the representation
schemes, the various similarity measures and inference strategies various algorithms are
designed and implemented. The evaluation of these algorithms for a simulated movie
recommender system is presented in Section V.
), i= 1… N}; and {(xi, )(Iµ kx i), i= 1… N}
//Input: user feedbacks (RI) =Vector of < item id, attributes’ values xi, feedbacks on the item> //Input: Items available for recommendation (PI) = Vector of < item id, attributes’ values> //Input: training size (R), potential number of items for recommendation (P)//Output: Degree of memberships of an item to values of an attribute //Output: Degree of a user’s interest in an item as recommendation confidence score //Output: NRCS=<user id, item id, recommendation score, normalized score> // E = Recent items with positive feedback by a user, defined as: {Ii : r > (Min+Max)/2} // where A is the set Strongly Liked (SL) or Liked(L). For each user do {
E � Select R items with positive feedback from RI //For R items with positive feedback from RI For k=1 to R do {
Ik � Load the kth item from E For ∀xi, compute )(Iµ tx i
//Fuzzification of attribute’s content of an item using (1) Compute µA(Ik) //Fuzzification of user feedback (rating) on an item using (2)
} next item //For all targeted items available for recommendation compute confidence scores For j= 1 to P do { Ij � Load the jth item from PI For ∀xi, compute )(Iµ jx i
For k=1 to R do { compute ),( jk IIS //similarity between two items using any of (5) to (9) RCS (Ij) � Compute Recommendation Score (µA(Ik), ),( jk IIS ) using (3) or (4) }}next targeted item // Compute Normalized recommendation scores For ∀j, maxNR � Maximum{RCS (Ij)} For j= 1 to P do { NRCS(Ij) � RCS(Ij)/maxNR }
}next user ii) TOP-N Recommendation //Input: Normalized Recommendation Confidence Scores (NRCS) //Output: Top-n items For each user Sort NRCS by normalized recommendation score in descending order Select top n items Recommend the top n items Figure 1: Algorithm for Item-Item Similarity Based Recommender System
21
22
Let M be the total number of items for which feedbacks are available from a user, e.g. 20 to
737 movies rated. Let N be the total unique values of the feature considered, e.g. 20 genres.
Then, to compute recommendation scores for a single item (say movie) for a user, the
computational complexity of the algorithm is in the order of O(MN). This computational
complexity is greatly reduced compared with those of both user-user and item-based
collaborative filtering algorithms, which have an upper bound computation complexity in the
orders of O(n2m) and O(m2n) respectively for m items rated by n different users [6]. Moreover,
since M*N is lower than m*n, the proposed algorithm is scalable and its time complexity is less
than that of traditional collaborative algorithms.
IV. EVALUATIONS
The performance of the proposed method and algorithm are evaluated using movie as an item.
We select movie as the test domain because movies have multi-valued attributes with vague and
subjective features. For example, movies are multi-genres, multi-actors, etc. [35]. Moreover, the
following features of movies support inference based on movie genres and user past movies
watching behavior [36]: (i) As one of the experiential products, mostly movies are selected for
pleasure and expenditures of time. Thus, consumers choose movies based on what they’ve liked
and enjoyed before; and (ii) Consumers use subjective features such as “funny”, “romantic” and
“scary” (reflect movies genres) to select movies more than objective features such as the
director, actresses/actors, theatre location, and admission price.
Simulation recommender systems for movies are designed and developed to assess the
effectiveness of the proposed methods. The simulation system works as follows:
a. For each user, it randomly splits the ratings dataset into a training set and a test set.
23
b. Using the training set, it computes recommendation confidence score for each item in the
test set using the different similarity measures and recommendation strategies, Figure 1.
c. For each user, using the movies in the testing set, it computes Top-N recommendations and
the recommendation accuracy– precision, recall, and F1–measure. Moreover,
computational time for learning user preference and presenting the recommendation are
recorded.
d. Using different random selection of the movies into testing and training sets, 10 different
runs are executed to avoid sensitivity to sampling bias, and the average results are reported.
A. Dataset and Preprocessing
The benchmark dataset from MovieLens at the University of Minnesota
(http://movielens.umn.edu), which has been widely used in recommendation research, is
employed in this study. The dataset includes movie attributes, user ratings, and simple user
demographic information. The dataset consists of 100,000 ratings (1-5) from 943 users on 1682
movies; and each user has rated at least 20 and at most 737 movies. In the dataset, movies are
described with: movie id, movie title, release date, video release date, IMDb URL, and 20 genres
including action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy,
film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, western, family, and unknown.
Genres in the MovieLens dataset are represented with binary values, which do not reflect the
true content of movies in the genre space. Therefore, we use the proposed representation scheme
by incorporating information about movie genres from the Internet Movie Database (imdb.com),
which is a large database consisting of comprehensive information about past, present and
upcoming movies.
24
B. Evaluation Metrics
The accuracy metric and latency are computed for measuring the performance of the proposed
method.
1) Accuracy metrics Accuracy is a commonly used metric for a recommender system based on user tasks or goals
[13]. The accuracy metrics includes predictive and recommendation accuracy measures.
Predictive accuracy measures such as mean absolute error, mean square error and percentage of
correct predictions are found to be less appropriate when the user task is to find ‘good’ items and
when the granularity of true value is small because predicting a 4 as 5 or a 3 as 2 makes no
difference to the user [39], [13]. Instead, recommendation accuracy metrics including recall,
precision and F1-measures are more appropriate. Approximations to the true precision and recall
are computed using movies for which ratings are provided and held for testing. This approach of
measuring performance is widely used in recommender systems research, e.g., [6], [19].
Precision measures the ratio of correct recommendations being made. Recall reflects the
coverage or hit rate of recommendations. F1-Measure = (2*precision*recall)/(precision + recall)
is a single metric that combines precision and recall. They are defined as:
Precision = (no of movies in the TOP-N movies with rating ≥ 4) / N
Recall = (no of movies in the TOP-N with rating ≥ 4) / M
Where M is total number of movies rated as interesting (rating ≥ 4) in the test cases; and N is
total number of movies recommended.
2) Computational Time Computational time refers to the time the system takes to provide list of ‘good’ movies for a
user. It is consisting of model time and recommendation time. Model time is the time to predict
the degree of a user interest in a movie. Recommendation time is the time to find top-N ‘good’
movies for a user. Both are measured in milliseconds.
25
C. Experimental Design
The different evaluation metrics defined such as precision, recall, F1-measure, model time and
recommendation time are the dependent variables. Table 3 presents the independent variables.
On each of the dependent variable, a 2 x 5 x 6 x 12 factorial design is conducted. In this design
the fixed factors are recommendation strategy, similarity measure, training size (no of training
cases) and recommendation size (value of N for Top-N). The effect of testing size is also studied
as covariate variable. Moreover, normalization is applied on the recommendation scores; all
statistical testes are done at 5% level; and movies with ratings 4 and 5 are considered as movies
liked by users.
Table 3: Independent Factors Independent Factor Description Possible values Recommendation strategy
Fuzzy reasoning as recommendation strategies
Weighted-Sum or Maximum-Min
Similarity Measure Analogical reasoning using fuzzy set and possibility theory
Crisp Set-theoretic3, Fuzzy Set-Theoretic, Cosine, Proximity-based, and Correlation-like
Model size is the number of training cases (the K most liked movies by a user)
Movies rated by users
5,10,15,20,25,30
Recommendation size is Top-N
Total number of items to be recommended
3,4, 5, 10,15, 20, 25, 30, 35, 40, 45, 50
No of testing cases (no of items considered for recommendation)
Movies assumed not seen or experienced by a user
Total number of rated movies minus total number of movies considered for training
Simulation implementations and simulation runs are performed on an Intel Pentium(R) 4,
running at 2.66 GHz, 504 Mega Bytes of memory, and Windows operating systems workstation.
3 This is used as a baseline to assess the effect of fuzzy representation and reasoning
26
The Java 2 programming language, Sun ONE Studio 4 CE integrated development environment
(IDE) for Java technology and Microsoft Access 2002 are used as software tools for
implementation and simulation runs of the movie recommender system.
V. RESULTS
After presenting the result of the reliability and validity test, first the mean performance of the
proposed algorithms is presented along with the results of comparative analysis between a Fuzzy
Theoretic Method (FTM) that includes Fuzzy Set-Theoretic, Cosine, Proximity-based, and
Correlation-like, and the baseline Crisp Set similarity-based method (CSM) defined in (9).
Second, the results of analysis of the effect of the different factors __ similarity measure,
recommendation strategy, training size, recommendation size and testing size __ on the
recommendation performance are presented. Finally, the results of a reasonable comparison in
the performance between the proposed method and other methods that have used the MovieLens
dataset are presented.
A. Validity and Reliability Analysis
The FTM uses recently movies liked by the user; and it depends on the degree of similarity
between the targeted movie and those rated movies as well as the aggregation strategy. They are
based on their degree of memberships to genres under the assumption that users prefer movies
with similar genres. Figure 2 visually presents the distribution of the genres by the three
preference categories for user 7 (see sample data in Table 1). It shows that User 7 likes drama,
romance and adventure; dislike thriller; and indifferent to comedy, action and crime.
27
disliked liked indifferent
Preference
0.00000
0.10000
0.20000
0.30000
0.40000
0.50000
Mean
DramaComdeyActionThrillerRomanceAdventureAnimationCrime
Figure 2: Genre Preferences distribution for User 7: <age=57, Male, Administrator>
Based on the representation of movies’ genres using (1) as indicated in Figure 2, we check
the soundness of the representation by verifying the following assertion: users have preference to
movies of specific genres (e.g. Drama, Comedy and Family) with some degree of memberships
over others (e.g. Action, Horror and Adventure). First, movies are grouped into three groups:
group 1 consisting of movies with rating 1 and 2 (labeled as disliked), group 2 consisting of
movies with rating 4 and 5 (labeled as liked), and group three consisting of movies with rating 3
(labeled as indifferent). Second, One-way ANOVA is run and it shows that there are significant
differences in the average degree of membership of movies for the different genres among those
movies which are liked disliked and indifferent by users with the exception of Animation,
Documentary, Musical and Science Fiction. Had it been a random occurrence of genres with
respect of users’ interest, the results would have been non-significant. The reason for non-
significant difference in the four genres is due to their rare occurrence in movies database.
Therefore, the representation scheme supports the assertion that users could prefer one or more
28
genres with varying grades.
Since recommendation are made using recommendation scores, another way of checking the
reliability of the representation and inference procedure is to test correlation between user ratings
and recommendation confidence scores computed using (3) and (4) per a user. Using the
Spearman's rank correlation, significant bi-variants relationships are found. This validates the
soundness of the inference procedure.
B. Overall Performance
Mean of recommendation accuracies by recommendation strategy and similarity measure are
presented in Figure 3 (a) and (b). The maximum mean precision, recall and F1 measure are
around 55%, 38%, and 38% for FTM compared to the 49%, 39% and 38% obtained for the CSM
(Crisp Set Similarity based) approach, respectively. Figure 4 shows the average precision for
different recall levels. Overall an unsteady decline in precision is observed as recall increases,
and the decline is consistent through all similarity measures. The Fuzzy Set and Cosine based
approaches have better precision for fixed recall. Why? How? So what ???
29
0.00
0.10
0.20
0.30
0.40
0.50
0.60
precision45 recall45 f1ratio
Crsip Set
Fuzzy Set
Cosine
Proximity
Corrleation
Figure 3 (a): Average/Mean recommendation accuracies by similarity measures for
Weighted-Sum Recommendation Strategy
0.00
0.10
0.20
0.30
0.40
0.50
0.60
Precision Recall F1-Measure
Crsip Set
Cosine
Fuzzy Set
Proximity
Corrleation
Figure 3 (b): Average/Mean recommendation accuracies by similarity measures for
Max-Min Recommendation Strategy
30
Figure 4: Average Precision by Recall for Weighted-Sum Approach
C. Effect of Recommendation Strategies
Figure 5 shows the effect of recommendation strategy on performance. The precision is not
affected by the recommendation strategy, but for recall and F1 measure the Max-Min
recommendation strategy yield an improved results. Therefore, depending on the tasks,
recommender systems designers can choose one of these recommendation strategies that would
produce optimal performance. For instance, Weighted-Sum recommendation strategy would be a
choice for task of finding top 10 “good’ items because it results in higher precision with lower
latency; and Maximum-Minimum aggregation strategy would be a choice for the task of finding
all “good” items because it results in higher recall with lower latency. Further analysis is
31
performed in order to identify whether these differences are significant or not, and results are
presented in the following subsections.
Figure 5 (i) : Mean recommendation precision
Figure 5 (ii): Mean recall
32
Figure 5 (iii): Mean F1 measure D. Comparison between Fuzzy-Theoretic Approaches and the baseline Crisp Set-Theoretic
Approach
Pair-wise multiple comparisons test using Bonferroni4 test is conducted between mean
performance of Crisp Set theoretic approach and the varieties of Fuzzy theoretic approaches. The
results are summarized as follows.
i. With weighted-sum recommendation strategy, all the Fuzzy Set-theoretic
similarity measures driven movie recommender system average precisions and F1-measure (with
the exception of proximity similarity measure where its average F1-measure is not significantly
different) are greater than the corresponding Crisp Set-theoretic similarity measure driven movie
recommender system. Moreover, there is no statistically significant difference between the two
groups in their average recall except in the case of the proximity-based similarity measure where
its average recall is lower.
4 The Bonferroni test, based on Student's t statistic, adjusts the observed significance level for the fact that multiple comparisons are made. It is commonly used multiple comparison tests, and more powerful for a small number of pairs [SPSS PC User Manual].
33
ii. With maximum-minimum recommendation strategy, all the fuzzy-theoretic
similarity measures driven movie recommender system average precisions are greater than the
corresponding crisp set-theoretic similarity measure driven movie recommender system.
Furthermore, all the fuzzy-theoretic similarity measures driven movie recommender system
average recalls are less than the corresponding crisp set-theoretic similarity measure driven
movie recommender system. Finally, except cosine similarity measure where there is no
significant difference, all the fuzzy-theoretic similarity measures driven movie recommender
system average F1-measures are less than the corresponding crisp set-theoretic similarity
measure driven movie recommender system.
In summary, the Fuzzy set theoretic and fuzzy based cosine, proximity and Correlation
similarity measures result in greater mean value of the precision compared to the crisp set
theoretic similarity measure using the two recommendation strategies. For recall, only
comparable results are obtained by both recommendation strategies. For F1-measure, better
results are obtained by weighted-sum recommender approach except when proximity similarity
is used. In case of the minimum-maximum recommendation strategy for F1-measure,
comparable results are obtained.
E. Comparison among the Fuzzy-Theoretic Approaches
Furthermore, the performance differences among the other fuzzy-set theoretic similarity
based approaches are analyzed using Bonferroni test. The results show significant different
among the different alternative combination of fuzzy theoretic similarity measures and
recommendation strategies in their recommendation accuracy. For precision, fuzzy theoretic set
and cosine similarity measures perform better for both inference strategies. For recall and F1-
34
measure, correlation and cosine similarity measures performer better with maximum minimum
inference strategies. In all cases, distance based similarity measure performs poor. The ordered
performance by the inference strategies and similarity measures on the different metrics are
presented in Table 4. For precision Cosine and Fuzzy Set based are the better choices. For recall,
Fuzzy Set based is a better choice when the recommendation strategy is weighted-sum, but
Cosine is the better choice when the recommendation strategy is Max-min. For F1 measure,
Correlation-like and Cosine based are the better choices when the recommendation strategy is
weighted-sum, but Cosine is the better choice when the recommendation strategy is Max-min.
Table 4: Summary of ordered performance by the inference strategies and similarity measures Metric Inference strategy Order for movie recommendation Precision Weighted-Sum Prox < Cos=FS=Corr
Maximum-Minimum Prox < Corr=Cos=FS Recall Weighted-Sum Prox < Cos < FS
FS=Corr; Cos=Corr Maximum-Minimum FS < Prox < Corr < Cos
F1-Measure Weighted-Sum Prox < Cos < Corr=FS Maximum-Minimum Prox=FS < Corr < Cos
F. Model Size and Recommendation Size Sensitivity Analysis
Since similar patterns are obtained in both recommendation strategies, results from
weighted-sum recommendation strategy are presented. Overall, as shown in Figure 6 (i), (ii) and
(iii), precision does not increase as the training size increases; and recall and F1-measure are not
significantly increasing as the size of training increases. The reason for this is once sufficient
number of items that suggest the preference of a user are gathered additional feedback on items
do not improve the preference of the user as well as the recommendation accuracies. Pair-wise
mean comparison test show no significant difference in performance by training size. Similar
pattern is also reported in [6]. These results are consistent through all similarity measures across
35
the two recommendation strategies. Furthermore, overall, a training size of 5 to 15 appears to
produce satisfactory recommendation accuracy.
Figure 6 (i): Means plot of recommendation accuracy metrics for Precision
Figure 6 (ii): Means plot of recommendation accuracy metrics for Recall
36
(c)F-ratio Figure 6 (iii): Means plot of recommendation accuracy metrics for F1-measure
Overall, as shown in Figure 7 (i), when top-N increases the precision is not significantly
increasing. Hence a Fuzzy theoretic based recommender system does not need to recommend a
large number of movies to achieve a better precision. However, recall increases as the size of
recommended items increases. The results are consistent through all similarity measures across
the two recommendation strategies. Furthermore, overall, 3 to 10 recommendations look to
produce satisfactory recommendation accuracy. Moreover, the change in precision in Crisp–
based approach is greater than that of FTM. This indicates the former approach is more sensitive
to recommendation size.
37
Figure 7 (i): Means plot of Precision for different similarity measures by top-N
Figure 7 (ii): Means plot of Recall for different similarity measures by top-N
38
Figure 7 (iii): Means plot of F1-measure for different similarity measures by top-N
G. Effect of Testing Size
Testing size is the actual number of items that are considered for recommendation, and from
which a few relevant items will be recommended to a user. Given the large number of
prospective items in a database, does a recommender system consider all or subset of these items
to come up with a few top interesting items? Does the size affect the accuracy of the
recommendation?
The Univariate analysis shows that testing size has significant effect on precision, recall and
F1-measures. As presented in Figure 8, for a marginal recommendation size, precision increases
as testing size increases, where as recall decreases as testing size increases. An approximate
optimal performance seems to be gained when the testing size is between 21 and 60. Further
analysis is left for future research.
39
Figure 8: Mean recommendation accuracies by testing size
H. Computation Time
Performance of simulated recommender system is analyzed in terms of its model time and
recommendation time. The means model time using the different similarity measure is in range
0.03 to 0.12 seconds for both the weighted-sum recommender and the maximum-minimum
recommender method. For top-10 the FTM mean recommendation time is 0.21 seconds for 10
movies and approximately 0.021 seconds per item. Moreover, analysis of variance shows
significant variations (sign.=0.00) in mean model time by similarity measures, as shown in
Figure 9. When we compare the mean model time of the Crisp-based with that of the FTM it is
low because the FTM has additional fuzzification and inference processes.
40
Figure 9: Means of Model Time for Weighted-Sum Recommendation Approach
I. Comparison with Related Conventional Approaches
Creating an equivalent base for a fair comparison between the results of the proposed approach
and the results of reported studies for movies recommendation using the same MovieLens
dataset is not straightforward. For example, comparison with collaborative filtering algorithms
sounds is unfair because our approach has additional item feature. Moreover, the experimental
set-ups are different because they are not standardized. Notwithstanding these limitations we
present the results of approximate comparisons. We are not claiming to prove that our approach
in its present state is superior to existing approaches. Rather, we are attempting to show the
potential of this approach for recommender systems and provide a framework for future research
in this area.
Among other related successful studies using MovieLens, most studies reported results
for top-5, top-10, top-15, and top-20 with model size of 20 to 50 as in [21], [15], [6], [19].
41
Results using model size of 20 and recommendation size of 10 are optimal in those studies.
These results are summarized in Table 5 along with equivalent results of the proposed FTM
methods.
Probabilistic memory-based collaborative filtering [20] is another approach to modeling
uncertainties due to stochastic nature of preference. Evaluations of the approach showed that on
EACHMOVIE (ratings on movies) dataset, the mean precision, recall, and F1 measure of top-10
are 66%, 51%, and 57% respectively; and on JESTER (ratings on jokes) dataset, the mean
precision, recall, and F1 measure of top-10 are 40%, 47%, and 43% respectively. Other studies
that used both EACHMOVIE and MOVIELENS reported greater performance on EACHMOVIE
[40],[6]. For example, Deshpande & Karypis [6] reported 40% of top-10 hit rate for
EACHMOVIE compared to 26% for MovieLens dataset. Therefore, we expect that the
performance of the proposed algorithms should be better than probabilistic CF approaches on
these datasets. However, this assertion needs to be empirically verified in future research.
As indicated in Table 5, the results for proposed approach have a greater average precision,
comparable recall and greater F1 measure. This improvement has to be due to the representation
scheme and inference strategy used in the FTM.
42
Table 5: Summary of Performance Comparison Approaches Recommendation
Size Model size Performance
Metrics Results
Precision 52.7%
Recall 28.4%
Propose herein
(FTM best results)
Top-10 20
F1-measure 31.6%
Precision 44.1%
Recall 32.91%
The Baseline (CSM) Top-10 20
F1-measure 34.58%
Item-Based CF [6]
User-Based CF [6]
Top-10
Top-10
20
20
Recall
Recall
27%
28%
User-Based CF [4] Top-10 50 F1-measure 23%
User-based (belief distribution
and nearest-neighbor algorithm) [19]
Top-15 3:1 split Precision 23%
VI. DISCUSSION
The issues we would like to discuss about the results of proposed method are about: the
features used in the model and their representation, recommendation approach, similarity
measure, model size, recommendation size, and testing size along with their effect on the
performance.
Using the baseline Crisp set theoretic similarity-based (9) Crisp set theoretic similarity measure
based recommender systems that use genres and ratings represented in crisp set result in lower
precision compared to Fuzzy theoretic similarity measures based recommender systems that use
genres and ratings represented using Fuzzy set. For FTM, approximately the precision increases
by 19.50% (from 0.44.1 to 0.52.7, Table 5). Precision is improved because the inclusion of the
43
item feature with the fuzzy theoretic-based representation and inference mechanism reduces the
noises in the recommended items.
Recall is not improved and further investigation of the other factors reveals that training size
found to be a moderating factor (see Figure 6 (ii)). For training size 10 to 25, it is found that
FMT recalls are greater than that of CSM. Further research is needed to find why recall for CSM
is larger at 30 and if it continues for training size above 30. Moreover, compared to precision,
recall is highly affected by Top-n, testing size and training (see Figures 5, 6 and 7).
In recommender systems, the focus is on finding ways to improve precision rather than recall
because the system can only recommend a limited number of items, usually 5 to 20. Therefore,
careful modeling of features of item and user behavior using fuzzy theory has improved the
precision with comparable recall.
Discussion about approach and similarity measures: {follow here!!} w.r.t other methods
Overall, the max-min recommendation strategy produces better performance in terms of recall
and F1 measure. For precision, there is no significance difference between the two approaches.
Why?
Overall, the Cosine and Fuzzy Set similarity measures produce better performance. The result
is consistence with previous results. Why FS and cosine???
The movie by genre matrix that is used to compute similarity is spares. Nearly 50%, 34%,
13%, 3%, 0.7%, 0.3% of the movies in the data set have 1, 2, 3, 4, 5, or 6 genres, respectively.
With vector of size 20 genres for a movie most of the values are zeroes. This affect most the
proximity and correlation-like similarity measures. Cosine and fuzzy set based similarity
measures are not affected. Why see below:
44
In case of proximity measure, the similarity coefficient can be nonzero even when there
is no common element between two objects. It is appropriate in applications where we want to
know the “ nearest” fuzzy value even in case when the system does not find any fuzzy value with
non-empty intersection. This means fuzzy sets with non-empty intersection can have the same
similarity as fuzzy sets with empty intersection. The sparsity in the movie by genres matrix has
less desirable to the distance-based similarity measures. The experimental results have also
shown supporting results. Correlation-like produces the next worst performance because: it has
similar characteristics as the proximity measure. However, compared to distance–based measure,
it is a weighted squared Euclidean distance.
The cosine measure is based both on magnitude and direction of the vectors. This measure
captures a scale invariant understanding of similarity, and it ignores the effect of the proportional
size differences. Furthermore, Fuzzy Set based Similarity Measure is based on the set-theoretic
operations on fuzzy sets, fuzzy set cardinality and commonality of attributes as the foundation.
The sparsity in the movie by genres matrix has less effects on the cosine and fuzzy set based
similarity measures. The experimental results have also shown supporting results.
The model size, recommendation size and testing size sensitivity analysis produce lower values
which are consistence results with previous results such as [1, 6, 15, 41]. This in turn also
supports the scalability of the algorithm for online recommendation.
Some of the potential applications of the proposed methodology are in recommender systems,
information filtering, tutoring systems, and web mining. The successful application of the
proposed method for movies recommender system makes their extension to recommendation of
other items such as music, jokes, books, etc. in e-commerce application highly promising. The
application of the proposed method for information filtering such as news filtering is also
45
promising. Related literature such as [18, 39, 42] can be extended to handle uncertainty due to
imprecision and vagueness using the proposed methods and algorithms. For instance,
representation of user news interest, content of news, and their relationships as well as
computing similarity and recommendation scores can be done in similar ways as for the movie
recommender system.
VII. CONCLUSION AND FUTURE RESEARCH
This research develops a fuzzy theoretic methodology for recommender systems. Using actual
data on movies, the results of the simulation study provide empirical evidence that supports the
effectiveness of FTM in terms of higher recommendation accuracy, lower model size and
recommendation size, and lower latency – lower model time and recommendation time. Hence,
integrating user and item features, and using fuzzy and possibility theories as the foundation for
representing and reasoning about uncertainty contribute to the effectiveness and efficiency of the
proposed method for recommender systems.
Furthermore, the study shows that there are significant different among the different alternative
combination of fuzzy theoretic similarity measures and recommendation strategies in their
recommendation accuracy. Hence, depending on the tasks, recommender systems designers can
choose a combination of recommendation strategies and similarity measures that would produce
optimal performance. A guideline is provided. Moreover, the study shows the effect of model
size, recommendation size and test size on the recommendation accuracy and latency on a FTM
based recommender system. Roughly speaking, the performance of FTM is better than
collaborative filtering. However, compare to collaborative filtering approaches that relies only on
user feedbacks such ratings, the proposed approach requires item features and domain
knowledge. A promising development is an automatic identification of these features, e.g., movie
46
genres by exploiting audio-visual cues presented in the movies.
Further studies are planned to extend this approach in several directions. First, inclusion of
additional attributes e.g. actors/actresses, and directors for movies expected to improve the
performance of the system. Second, genetic algorithm can be used for learning and tuning the
fuzzy sets parameters of the membership function defined in (1) as well as to find a better
membership function. Third, test the FTM approach with additional datasets and domain
applications to see the generalization of the results.
47
References
[1] J. B. Schafer, K. J, and J. Riedl, "Electronic Commerce Recommender Applications," Journal of Data Mining and Knowledge Discovery, vol. 5, no.1/2, pp. 115-152, 2001.
[2] R. Burke, "Hybrid Recommender Systems: Survey and Experiments," User modeling and user-adapted interaction, vol. 12, no.4, pp. 331-370, 2002.
[3] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions of Knowledge and Data Engineering, vol. 17, no.6, pp. 734-749, 2005.
[4] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, "Analysis of Recommender Algorithms for E-Commerce," Proc. Proceedings of the ACM E-Commerce 2000 Conference, pp. 158-167, 2000.
[5] B. Mobasher, R. Cooley, and J. Srivastava, "Automatic Personalization Based on Web Usage Mining," Communication of ACM, vol. 43, no.8, pp. 142-151, 2000.
[6] M. Deshpande and G. Karypis, "Item-Based Top-N Recommendation algorithms," ACM Transactions on Information Systems, vol. 22, no.1, pp. 143-177, 2004.
[7] R. Mukherjee, P. S. Dutta, and S. Sen, "MOVIES2GO: A new approach to online movie recommendation," Proc. Joint Conference on Artificial Intelligence's (IJCAI's) and Workshop on Intelligent Techniques for Web Personalization, 2001.
[8] R. Mukherjee, N. Sajja, and S. Sen, "A Movie Recommnedation System - An Application of Voting Theory in User Modeling," User modeling and user-adapted interaction, vol. 13, no.1-2, pp. 5-33, 2003.
[9] L. A. Zadeh, "Fuzzy Logic, Neural Networks, and Soft Computing," Comm. ACM, vol. 37, no.3, pp. 77-84, 1994.
[10] S.-M. Hsu, C. Wu, and T.-W. Tien, "A Fuzzy Mathematical Approach for Measuring Multi-facet Consumer Involvement in the Product Category Evaluation," Marketing Research On-Line, vol. 3, pp. 1-19, 1998.
[11] P. Smets, "Theories of Uncertainty," in Handbook of Fuzzy Computation, E. Ruspini, P. P. Bonissone, and W. Pedrycz, Eds. Philadephia, PA: Institute of Physics Publishing, 1999.
[12] R. R. Yager, "Fuzzy logic methods in recommender systems," Fuzzy Sets and Systems,vol. 136, pp. 133-149, 2003.
[13] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems, vol. 22, no.1, pp. 5-53, 2004.
[14] K.-W. Cheung, J. T. Kwok, M. H. Law, and K.-C. Tsui, "Mining customer product ratings for personalized marketing," Decision Support Systems, vol. 35, pp. 231-243, 2003.
[15] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," Proc. 10th International World Wide Web Conference (WWW10), pp. 285-295, 2001.
[16] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," Internet Computing, IEEE, vol. 7, no.1, pp. 76-80, 2003.
[17] I. Zukerman and D. W. Albrecht, "Predicitive Statistical Models for User Modeling," User Modeling and User_Adapted Interaction 11: 5-18, 2001., vol. 11, pp. 5-18, 2001.
48
[18] J. A. Konstan, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, "GroupLens: applying collaborative filtering to Usenet news," Communications of the ACM, vol. 40, no.3, pp. 77-87, 1997.
[19] M. R. McLaughlin and J. L. Herlocker, "A Collaborative Filtering Algorithm and Evaluation Metric that Accurately Model the User Experience," Proc. 27th Annual ACM Conference on Research and Development in Information Retrieval, pp. 329-336, 2004.
[20] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel, "Probabilistic Memory-Based Collaborative Filtering," IEEE Transactions of Knowledge and Data Engineering,vol. 16, no.1, pp. 56-69, 2004.
[21] G. Karypis, "Evaluation of Item-Based Top-N Recommendation algorithms," Proc. 10th Conference of Information and Knowledge Managment, 2001.
[22] T.-H. Hsu, K.-M. Chu, and H.-C. Chan, "The Fuzzy Clustering on Market Segment," Proc. The Ninth IEEE International Conference on Fuzzy Systems, pp. 621-626, 2000.
[23] U. Kaymak, "Fuzzy target selection using RFM variables," Proc. IFSA World Congress and 20th NAFIPS International Conference, 2001. Joint 9th, pp. 1038-1043, 2001.
[24] O. Nasraoui and C. Petenes, "Combining Web Usage Mining and Fuzzy Inference for Website Personalization," Proc. Fifth WebKDD Workshop: Web mining as a Premise to effective and Intelligent Web Applications(WebKDD'2003), in conjunction with the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 37-45, 2003.
[25] M. Ozer, "User segmentation of online music services using fuzzy clustering," Omega: The International Journal of Management Science, vol. 29, no.2, pp. 193-206, 2001.
[26] S. Mitra, S. K. Pal, and P. Mitra, "Data Mining in Soft Computing Framework: A Survey," IEEE Transactions on Neural Networks, vol. 13, no.1, pp. 3-14, 2002.
[27] J. Basak and M. Kumar, "E-commerce and soft computing: scopes of research," IETE Technical Review, vol. 18, no.4, pp. 283-293, 2001.
[28] G. Meghabghab, "Mining User's Web Searching Skills through Fuzzy Cognitive State Map," 2001.
[29] M. Setnes and U. Kaymak, "Fuzzy modeling of client preference from large data sets: an application to target selection in direct marketing," IEEE Transactions on Fuzzy Systems,vol. 9, no.1, pp. 153-163, 2001.
[30] J. M. Sousa, U. Kaymak, and S. Madeira, "A comparative study of fuzzy target selection methods in direct marketing," Proc. Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, pp. 1251-1256, 2002.
[31] R. R. Yager, "Targeted E-commerce Marketing Using Fuzzy Intelligent Agents," IEEE Intelligent Systems, vol. 15, no.6, pp. 42-45, 2000.
[32] D. Dubois and H. Prade, "General Introduction," in Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets, H. Prade, Ed. Boston: Kluwer Academic Publishers, 2000, pp. 21-124.
[33] L. A. Zadeh, "Fuzzy Sets," Information Control, vol. 8, pp. 338-353, 1965. [34] T. Bilgiç and I. B. Turksen, "Measurement of Membership Functions: Theoretical and
Empirical Work," in Fundamentals of Fuzzy Sets, Handbook of Fuzzy Sets and Systems,D. Dubois and H. Prade, Eds. Boston: Kluwer, 2000, pp. 195-232.
[35] R. Altman, Film/Genre. London: British Film Institute, 1999. [36] E. Cooper-Martin, "Consumers and Movies: Some Findings on Experiential Products,"
Advances in Consumer Research, vol. 18, pp. 372-378, 1991.
49
[37] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets. Cambridge, Massachusetts: The MIT Press, 1998.
[38] V. V. Cross and T. A. Sudkamp, Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. Heidelberg; New York: Physica-Verlag, 2002.
[39] D. Billsus and M. J. Pazzani, "User Modeling for Adaptive News Access," User modeling and user-adapted interaction, vol. 10, no.2-3, pp. 147-180, 2000.
[40] D. O. Sullivan, B. Smyth, and D. Wilson, "Preserving Recommender Accuracy and Diversity in Sparse Datasets," International Journal on Artificial Intelligence Tools, vol. 13, no.1, pp. 219-235, 2004.
[41] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," Proc. 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), pp. 253 - 260, 2002.
[42] L. Ardissono, L. Console, and I. Torre, "On the application of personalization techniques to news servers on the www," in In Proc. AI*IA 99, Lecture Notes in Computer Science:Springer Verlag, 2000.
[43]
50
Appendix
Table 4.3: Notation and Definition of Variables
Variable Name Description Possible Value(s)
Approachid Recommendation method which refers
to the aggregation function used.
Weighted-Sum or Max-Min
Similarityid Similarity Measures Crisp Set, Cosine, Fuzzy Set,
Proximity, Correlation
Userid User identification number 1 to 943
trainingsize or
notrainingcase
Number of movies used as training case 5, 10, 15, 20, 25 and 30
Topn No of movies (N) to be recommended 3, 4, 5, 10, 20, …, 50
testingsize No of movies used as testing cases 3 to 539
recomtime Time required for N recommendations >0
precision45 Precision as defined in Chapter 3 Between 0 and 1
recall45 Recall as defined in Chapter 3 Between 0 and 1
f1ratio F1-measure as defined in Chapter 3 Between 0 and 1
FTM
CSM
Weighted-Sum similarityid Precision Recall F1-MeasureCrsip Set 0.48739784 0.320742 0.33334666Cosine 0.55005312 0.316626 0.33781755Fuzzy Set 0.55176256 0.321070 0.34300966Proximity 0.53925867 0.298921 0.32270392Corrleation 0.54910064 0.318104 0.34346133
Maximum-Sum similarityid Precision Recall F1-MeasureCrsip Set 0.48366128 0.392206 0.38324557Cosine 0.54998004 0.379255 0.38075219Fuzzy Set 0.54798302 0.350619 0.37108524Proximity 0.54347867 0.356230 0.37097099Corrleation 0.55098015 0.371140 0.37473446
51
Statistics for Top 10 and Training Size 20 approachid similarityid precision45 recall45 f1ratio
Mean .44090414 .238442 .27026395Std. Deviation .273553416 .2312662 .163949833
Crsip Set
N 704 704 633Mean .52528494 .227816 .26415138Std. Deviation .260475409 .2241420 .148228242
Cosine
N 720 720 688Mean .52710148 .231435 .27099571Std. Deviation .258639582 .2159096 .145101855
Fuzzy Set
N 719 719 690Mean .50913150 .198791 .23925365Std. Deviation .261871571 .1997978 .139353512
Proximity
N 720 720 680Mean .52607735 .234517 .27794824Std. Deviation .253934571 .2214276 .142743026
Weighted-Sum
Corrleation
N 719 719 686Mean .43085293 .329119 .34583043Std. Deviation .266956136 .2574552 .161814263
Crsip Set
N 704 704 628Mean .52215824 .272611 .29443228Std. Deviation .255016631 .2492869 .153560965
Cosine
N 720 720 692Mean .51974916 .273208 .31606714Std. Deviation .254855389 .2015340 .141920636
Fuzzy Set
N 719 719 689Mean .51043384 .283984 .30545140Std. Deviation .256021738 .2221199 .153628734
Proximity
N 720 720 694Mean .52747162 .257781 .29660432Std. Deviation .250682526 .2264607 .146991210
Corrleation
N 719 719 687Mean .50243956 .283146 .31104701Std. Deviation .259067460 .2332573 .152562771
Maximum-Sum
Total
N 3582 3582 3390
top related