
    1/48

    A Review of Information Filtering

    Part II: Collaborative Filtering

    Chengxiang Zhai

    Language Technologies Institute

    School of Computer Science, Carnegie Mellon University

    2/48

    Outline

    A Conceptual Framework for Collaborative Filtering (CF)

    Rating-based Methods (Breese et al. 98)

    Memory-based methods

    Model-based methods

    Preference-based Methods (Cohen et al. 99 & Freund et al. 98)

    Summary & Research Directions

    3/48

    What is Collaborative Filtering (CF)?

    Making filtering decisions for an individual user based on the judgments of other users

    Inferring an individual's interests/preferences from those of other, similar users

    General idea

    Given a user u, find similar users {u_1, ..., u_m}

    Predict u's preferences based on the preferences of u_1, ..., u_m

    4/48

    CF: Applications

    Recommender systems: books, CDs, videos, movies, potentially anything!

    Can be combined with content-based filtering

    Example (commercial) systems

    GroupLens (Resnick et al. 94): Usenet news rating

    Amazon: book recommendation

    Firefly (purchased by Microsoft?): music recommendation

    Alexa: web page recommendation

    5/48

    CF: Assumptions

    Users with a common interest will have similar preferences

    Users with similar preferences probably share the same interest

    Examples

    interest is IR => read SIGIR papers

    read SIGIR papers => interest is IR

    A sufficiently large number of user preferences is available

    6/48

    CF: Intuitions

    User similarity

    If Jamie liked the paper, I'll like the paper

    ? If Jamie liked the movie, I'll like the movie

    Suppose Jamie and I viewed similar movies in the past six months

    Item similarity

    Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars

    You may also like Independence Day

    7/48

    Collaborative Filtering vs. Content-Based Filtering

    Basic filtering question: Will user U like item X?

    Two different ways of answering it:

    Look at what U likes => characterize X => content-based filtering

    Look at who likes X => characterize U => collaborative filtering

    Can be combined

    8/48

    Rating-based vs. Preference-based

    Rating-based: users' preferences are encoded using numerical ratings on items

    Complete ordering

    Absolute values can be meaningful

    But values must be normalized to combine

    Preference-based: users' preferences are represented by a partial ordering of items

    Partial ordering

    Easier to exploit implicit preferences

    9/48

    A Formal Framework for Rating

    [Figure: a user-object rating matrix. Rows are users u_1, ..., u_m; columns are objects o_1, ..., o_n; a few entries are known ratings (e.g., 3, 1.5, 2, 1) and the rest are unknown: x_ij = f(u_i, o_j) = ?]

    The task

    Unknown function f: U x O -> R

    Assume known f values for some (u,o)'s

    Predict f values for other (u,o)'s

    Essentially function approximation, like other learning problems
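    To make the setup concrete, below is a tiny sketch (not from the slides) of such a partially observed rating matrix in NumPy, with NaN marking the unknown f(u_i, o_j) values; the numbers are made up.

```python
import numpy as np

# Hypothetical partially observed rating matrix: rows = users u1..u3,
# columns = objects o1..o4; np.nan marks unknown f(u_i, o_j) values.
ratings = np.array([
    [3.0, 1.5, np.nan, 2.0],
    [2.0, np.nan, 3.0, np.nan],
    [np.nan, 1.0, np.nan, 3.0],
])

observed = ~np.isnan(ratings)              # (u, o) pairs where f is known
print(list(zip(*np.nonzero(observed))))    # the known cells
# The CF task: approximate f well enough to fill in the np.nan entries.
```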

    10/48

    Where are the intuitions?

    Similar users have similar preferences: if u ≈ u', then for all o's, f(u,o) ≈ f(u',o)

    Similar objects have similar user preferences: if o ≈ o', then for all u's, f(u,o) ≈ f(u,o')

    In general, f is locally constant: if u ≈ u' and o ≈ o', then f(u,o) ≈ f(u',o')

    Local smoothness makes it possible to predict unknown values by interpolation or extrapolation

    What does "local" mean?

    11/48

    Two Groups of Approaches

    Memory-based approaches

    f(u,o) = g(u)(o) ≈ g(u')(o) if u ≈ u'

    Find neighbors of u and combine the g(u')(o)'s

    Model-based approaches

    Assume structures/models: object clusters, user clusters, f defined on clusters

    f(u,o) = f(c_u, c_o)

    Estimation & probabilistic inference

    12/48

    Memory-based Approaches (Breese et al. 98)

    General ideas:

    x_ij: rating of object j by user i

    n_i: average rating of all objects by user i

    Normalized ratings: v_ij = x_ij - n_i

    Memory-based prediction:

    \hat{x}_{aj} = n_a + \frac{1}{\kappa} \sum_{i=1}^{m} w(a,i) \, v_{ij},  where  \kappa = \sum_{i=1}^{m} |w(a,i)|

    Specific approaches differ in w(a,i), the distance/similarity between users a and i
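    Below is a minimal Python sketch of this prediction rule, assuming ratings are stored as a dict of dicts and a user-similarity function w(a, i) is supplied (e.g., one of the measures on the next slide); the function and variable names are illustrative, not from Breese et al.

```python
def predict_rating(a, j, ratings, similarity):
    """Memory-based prediction: x_hat_aj = n_a + (1/kappa) * sum_i w(a,i) * v_ij.

    ratings: dict user -> dict item -> rating (observed entries only)
    similarity: function (a, i, ratings) -> w(a, i)
    """
    n_a = sum(ratings[a].values()) / len(ratings[a])    # user a's average rating
    num, kappa = 0.0, 0.0
    for i in ratings:
        if i == a or j not in ratings[i]:
            continue
        n_i = sum(ratings[i].values()) / len(ratings[i])
        w = similarity(a, i, ratings)
        num += w * (ratings[i][j] - n_i)                # weighted normalized rating v_ij
        kappa += abs(w)                                 # normalizer kappa = sum |w(a,i)|
    return n_a if kappa == 0 else n_a + num / kappa
```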

    13/48

    User Similarity Measures

    Pearson correlation coefficient (sum over commonly rated items):

    w_p(a,i) = \frac{\sum_j (x_{aj} - n_a)(x_{ij} - n_i)}{\sqrt{\sum_j (x_{aj} - n_a)^2 \sum_j (x_{ij} - n_i)^2}}

    Cosine measure:

    w_c(a,i) = \frac{\sum_{j=1}^{n} x_{aj} x_{ij}}{\sqrt{\sum_{j=1}^{n} x_{aj}^2} \sqrt{\sum_{j=1}^{n} x_{ij}^2}}

    Many other possibilities!
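    A small sketch of the two measures above, using the same dict-of-dicts rating layout as the previous sketch; returning 0 when there are no common items or zero variance is a convention chosen here, not prescribed by the paper.

```python
import math

def pearson(a, i, ratings):
    """Pearson correlation, summed over items rated by both a and i."""
    common = set(ratings[a]) & set(ratings[i])
    if not common:
        return 0.0
    n_a = sum(ratings[a].values()) / len(ratings[a])
    n_i = sum(ratings[i].values()) / len(ratings[i])
    num = sum((ratings[a][j] - n_a) * (ratings[i][j] - n_i) for j in common)
    da = sum((ratings[a][j] - n_a) ** 2 for j in common)
    di = sum((ratings[i][j] - n_i) ** 2 for j in common)
    return 0.0 if da == 0 or di == 0 else num / math.sqrt(da * di)

def cosine(a, i, ratings):
    """Cosine similarity between the two users' rating vectors (unrated items count as 0)."""
    common = set(ratings[a]) & set(ratings[i])
    num = sum(ratings[a][j] * ratings[i][j] for j in common)
    norm_a = math.sqrt(sum(x * x for x in ratings[a].values()))
    norm_i = math.sqrt(sum(x * x for x in ratings[i].values()))
    return 0.0 if norm_a == 0 or norm_i == 0 else num / (norm_a * norm_i)
```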

    14/48

    Improving User Similarity Measures (Breese et al. 98)

    Dealing with missing values: default ratings

    Inverse User Frequency (IUF): similar to IDF

    Case amplification: use w(a,i)^p, e.g., p = 2.5
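    A brief sketch of the last two adjustments: an IUF weight following the IDF analogy (log(n / n_j)), and sign-preserving case amplification; this is my reading of the slide, not code from Breese et al.

```python
import math

def inverse_user_frequency(item, ratings):
    """IUF, analogous to IDF: log(n / n_j), n_j = number of users who rated the item."""
    n = len(ratings)
    n_j = sum(1 for u in ratings if item in ratings[u])
    return math.log(n / n_j) if n_j else 0.0

def case_amplify(w, p=2.5):
    """Case amplification: raise |w(a,i)| to the power p, keeping the sign."""
    return math.copysign(abs(w) ** p, w)
```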

    15/48

    Model-based Approaches (Breese et al. 98)

    General ideas

    Assume that data/ratings are explained by a probabilistic model with parameter \Theta

    Estimate/learn the model parameter \Theta based on the data

    Predict unknown ratings using E_\Theta[x_{k+1} | x_1, ..., x_k], which is computed using the estimated model

    Specific methods differ in the model used and how the model is estimated

    E_\Theta[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p_\Theta(x_{k+1} = r | x_1, ..., x_k)

    16/48

    Probabilistic Clustering

    Clustering users based on their ratings

    Assume ratings are observations of a multinomial mixture model with parameters p(C), p(x_i | C)

    Model estimated using standard EM

    Predict ratings using E[x_{k+1} | x_1, ..., x_k]:

    E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k)

    p(x_{k+1} = r | x_1, ..., x_k) = \sum_c p(x_{k+1} = r | C = c) \, p(C = c | x_1, ..., x_k)
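    The following is a compact, illustrative EM implementation of this multinomial mixture (user-clustering) model, assuming integer ratings 1..n_values with 0 marking a missing entry; the interface, smoothing, and array names are assumptions for the sketch, not the original experimental code.

```python
import numpy as np

def em_cluster(X, n_clusters=2, n_values=5, n_iters=50, seed=0):
    """EM for a multinomial mixture over user ratings.

    X: (n_users, n_items) int array, ratings in 1..n_values, 0 = missing.
    Returns p_c[c] = p(C=c) and p_x[c, j, r-1] = p(x_j = r | C=c).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = X.shape
    p_c = np.full(n_clusters, 1.0 / n_clusters)
    p_x = rng.dirichlet(np.ones(n_values), size=(n_clusters, n_items))

    for _ in range(n_iters):
        # E-step: posterior p(C=c | user's observed ratings), in log space for stability
        log_post = np.tile(np.log(p_c), (n_users, 1))
        for u in range(n_users):
            for j in range(n_items):
                if X[u, j] > 0:
                    log_post[u] += np.log(p_x[:, j, X[u, j] - 1])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate p(C) and p(x_j = r | C), with add-one smoothing
        p_c = post.sum(axis=0) / n_users
        p_x = np.ones((n_clusters, n_items, n_values))
        for u in range(n_users):
            for j in range(n_items):
                if X[u, j] > 0:
                    p_x[:, j, X[u, j] - 1] += post[u]
        p_x /= p_x.sum(axis=2, keepdims=True)
    return p_c, p_x

def predict_expected_rating(observed, j, p_c, p_x):
    """E[x_j | observed] = sum_r r * sum_c p(x_j = r | c) p(c | observed)."""
    log_post = np.log(p_c).copy()
    for jj, r in observed.items():            # observed: dict item index -> rating
        log_post += np.log(p_x[:, jj, r - 1])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    r_values = np.arange(1, p_x.shape[2] + 1)
    return float(r_values @ (post @ p_x[:, j, :]))
```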

    17/48

    Bayesian Network

    Use a BN to capture object/item dependencies

    Each item/object is a node

    (Dependency) structure is learned from all the data

    Model parameters: p(x_{k+1} | pa(x_{k+1})), where pa(x_{k+1}) are the parents/predictors of x_{k+1} (represented as a decision tree)

    Predict ratings using E[x_{k+1} | x_1, ..., x_k]:

    E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k)

    p(x_{k+1} = r | x_1, ..., x_k) is given by the decision tree at node x_{k+1}

    18/48

    Three-way Aspect Model (Popescul et al. 2001)

    CF + content-based

    Generative model

    (u, d, w) as observations

    z as the hidden variable

    Standard EM

    Essentially clustering the joint data

    Evaluation on ResearchIndex data

    Found it's better to treat (u, w) as observations

    19/48

    Evaluation Criteria (Breese et al. 98)

    Rating accuracy

    Average absolute deviation:

    S_a = \frac{1}{|P_a|} \sum_{j \in P_a} |x_{aj} - \hat{x}_{aj}|

    P_a = set of items predicted

    Ranking accuracy

    Expected utility:

    R_a = \sum_j \frac{\max(x_{aj} - d, 0)}{2^{(j-1)/(\alpha - 1)}}

    Exponentially decaying viewing probability

    \alpha (half-life) = the rank where the viewing probability = 0.5

    d = neutral rating
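    A small sketch of both criteria; the neutral rating d and the half-life alpha are parameters, and treating unrated items as neutral in the utility metric is one plausible reading of the definition rather than something stated on the slide.

```python
def average_absolute_deviation(actual, predicted):
    """S_a = (1/|P_a|) * sum over predicted items of |x_aj - predicted_aj|."""
    items = set(actual) & set(predicted)
    return sum(abs(actual[j] - predicted[j]) for j in items) / len(items)

def expected_utility(ranked_items, actual, d=3.0, half_life=5.0):
    """R_a = sum_j max(x_aj - d, 0) / 2^((j-1)/(alpha-1)), j = 1-based rank."""
    score = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        utility = max(actual.get(item, d) - d, 0.0)   # unrated items contribute 0
        score += utility / 2 ** ((rank - 1) / (half_life - 1))
    return score
```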

    20/48

    Datasets

    21/48

    Results

    - BN & CR+ are generally better than VSIM & BC

    - BN is best with more training data

    - VSIM is better with little training data

    - Inverse User Freq. is effective

    - Case amplification is mostly effective

    22/48

    Summary of Rating-based Methods

    Effectiveness

    Both memory-based and model-based methods can be effective

    The correlation method appears to be robust

    The Bayesian network works well with plenty of training data, but not very well with little training data

    The cosine similarity method works well with little training data

    23/48

    Summary of Rating-based Methods (cont.)

    Efficiency

    Memory-based methods are slower than model-based methods at prediction time

    Learning can be extremely slow for model-based methods

    24/48

    Preference-based Methods (Cohen et al. 99, Freund et al. 98)

    Motivation

    Explicit ratings are not always available, but implicit orderings/preferences might be available

    Only relative ratings are meaningful, even when ratings are available

    Combining preferences has other applications, e.g.,

    Merging results from different search engines

    25/48

    A Formal Model of Preferences

    Instances: O = {o_1, ..., o_n}

    Ranking function: R: (U x) O x O -> [0,1]

    R(u,v) = 1 means u is strongly preferred to v

    R(u,v) = 0 means v is strongly preferred to u

    R(u,v) = 0.5 means no preference

    Feedback: F = {(u,v)}, meaning u is preferred to v

    Minimize loss:  L(R,F) = \frac{1}{|F|} \sum_{(u,v) \in F} (1 - R(u,v)),  \hat{R} = \arg\min_{R \in H} L(R,F)

    H = the hypothesis space
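    A minimal sketch of the loss L(R, F); the toy preference function used in the example is purely illustrative.

```python
def preference_loss(R, feedback):
    """L(R, F) = (1/|F|) * sum over (u, v) in F of (1 - R(u, v))."""
    return sum(1.0 - R(u, v) for u, v in feedback) / len(feedback)

# Toy preference function: prefer the lexicographically smaller id.
R = lambda u, v: 1.0 if u < v else (0.5 if u == v else 0.0)
print(preference_loss(R, [("a", "b"), ("c", "b")]))   # 0.5: one pair satisfied, one not
```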

    26/48

    The Hypothesis Space H

    Without constraints on H, the loss is minimized by any R that agrees with F

    Appropriate constraints for collaborative filtering:

    R_A(u,v) = \sum_{i \in U, i \neq a} w(a,i) \, R_i(u,v),  with  w(a,i) \geq 0,  \sum_{i \neq a} w(a,i) = 1

    Compare this with the memory-based rating prediction:

    \hat{x}_{aj} = n_a + \frac{1}{\kappa} \sum_{i=1}^{m} w(a,i) \, v_{ij}

    27/48

    The Hedge Algorithm for Combining Preferences

    Iterative updating of w_1, w_2, ..., w_n

    Initialization: w_i is uniform

    Updating (\beta \in [0,1]):

    w_i^{t+1} = \frac{w_i^t \, \beta^{L(R_i, F^t)}}{Z_t}

    L = 0 => weight stays

    L is large => weight is decreased
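    A minimal sketch of one Hedge round and of the weighted combination of expert preference functions; how each expert's per-round loss L(R_i, F^t) is computed is left to the caller.

```python
def hedge_update(weights, losses, beta=0.8):
    """One Hedge round: w_i <- w_i * beta ** L_i, then renormalize (the Z_t factor)."""
    new_w = [w * beta ** loss for w, loss in zip(weights, losses)]
    z = sum(new_w)
    return [w / z for w in new_w]

def combined_preference(weights, experts, u, v):
    """R_A(u, v) = sum_i w_i * R_i(u, v), for expert preference functions R_i."""
    return sum(w * R(u, v) for w, R in zip(weights, experts))
```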

    28/48

    Some Theoretical Res u lts

    The c u m u lative loss of Ra will not be m u chworse than that of the best ranking

    expert/featu

    rePreferences Ra => ordering V=> R V

    L(R V,F)

    29/48

    A Greedy Ordering Algorithm

    Use a weighted graph to represent the preferences R

    For each node, compute the potential value, i.e., outgoing_weights - ingoing_weights:

    \pi(v) = \sum_{u \in O} R(v,u) - \sum_{u \in O} R(u,v)

    Rank the node with the highest potential value above all others

    Remove this node and its edges, and repeat

    At least half of the optimal agreement is guaranteed
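    A sketch of the greedy ordering, representing R as a dict of pairwise weights; treating missing pairs as weight 0 is an assumption made for the sketch.

```python
def greedy_order(nodes, R):
    """Return a list ordering `nodes` from most preferred to least preferred.

    R: dict mapping (u, v) -> preference weight R(u, v); missing pairs count as 0.
    """
    remaining = set(nodes)
    ordering = []
    while remaining:
        def potential(v):
            out_w = sum(R.get((v, u), 0.0) for u in remaining if u != v)
            in_w = sum(R.get((u, v), 0.0) for u in remaining if u != v)
            return out_w - in_w                  # pi(v) = outgoing - ingoing weights
        best = max(remaining, key=potential)     # rank this node above all others
        ordering.append(best)
        remaining.remove(best)                   # remove the node and its edges, repeat
    return ordering
```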

    30/48

    31/48

    Evaluation of Ordering Algorithms

    Measure: weight coverage

    Datasets: randomly generated small graphs

    Observations

    The basic greedy algorithm works better than a random permutation baseline

    The improved version is generally better, but the improvement is insignificant for large graphs

    32/48

    Metasearch Experiments

    Task: known-item search

    Search for an ML researcher's homepage

    Search for a university homepage

    Search expert = a variant of the query

    Learn to merge the results of all search experts

    Feedback

    Complete: known item preferred to all others

    Click data: known item preferred to all items ranked above it

    Leave-one-out testing

    33/48

    Metasearch Results

    Measures: compare the combined preferences with each individual ranking function

    Sign test: to see which system tends to rank the known relevant article higher

    # queries with the known relevant item ranked above rank k

    Average rank of the known relevant item

    The learned system is better than each individual expert by all measures (not surprising; why?)

    34/48

    Metasearch Results (cont.)

    35/48

    Direct Learning of an Ordering Function

    Each expert is treated as a ranking feature f_i: O -> R ∪ {⊥} (allowing partial rankings)

    Given preference feedback \Phi: X x X -> R

    Goal: learn H that minimizes the loss

    D(x_0, x_1): a distribution over X x X (actually a uniform distribution over pairs with feedback order), D(x_0, x_1) = c \cdot \max\{0, \Phi(x_0, x_1)\}

    rloss_D(H) = \sum_{x_0, x_1} D(x_0, x_1) \, [[H(x_1) \leq H(x_0)]] = \Pr_{(x_0, x_1) \sim D}[H(x_1) \leq H(x_0)]

    36/48

    The RankBoost Algorithm

    Iterative updating of D(x_0, x_1)

    Initialization: D_1 = D

    For t = 1, ..., T:

        Train a weak learner using D_t

        Get a weak hypothesis h_t: X -> R

        Choose \alpha_t > 0

        Update:  D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1) \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}

    Final hypothesis:  H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)
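    A compact, runnable sketch of this loop. The "weak learner" here simply picks, from a supplied list of ranking features and a small grid of candidate alpha values, the pair that minimizes Z_t; the analytic choices of alpha discussed on the next slide are not implemented.

```python
import math

def rankboost(pairs, features, T=20, alphas=(0.1, 0.3, 1.0)):
    """RankBoost sketch.

    pairs: list of (x0, x1) feedback pairs, x1 preferred to x0.
    features: list of functions x -> float, used as candidate weak rankers h_t.
    Returns the final hypothesis H(x) = sum_t alpha_t * h_t(x).
    """
    D = {p: 1.0 / len(pairs) for p in pairs}          # D_1: uniform over feedback pairs
    model = []                                        # list of (alpha_t, h_t)
    for _ in range(T):
        best = None
        for h in features:                            # pick (h, alpha) minimizing Z_t
            for a in alphas:
                z = sum(d * math.exp(a * (h(x0) - h(x1))) for (x0, x1), d in D.items())
                if best is None or z < best[0]:
                    best = (z, a, h)
        z, alpha, h = best
        model.append((alpha, h))
        # D_{t+1}(x0, x1) = D_t(x0, x1) * exp(alpha * (h(x0) - h(x1))) / Z_t
        D = {(x0, x1): d * math.exp(alpha * (h(x0) - h(x1))) / z
             for (x0, x1), d in D.items()}
    return lambda x: sum(alpha * h(x) for alpha, h in model)
```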

    37/48

    How to Choose \alpha_t and Design h_t?

    Bound on the ranking loss:

    rloss_D(H) \leq \prod_{t=1}^{T} Z_t

    Thus, we should choose the \alpha_t that minimizes the bound

    Three approaches:

    Numerical search

    Special case: h is either 0 or 1

    Approximate Z, then find an analytic solution

    38/48

    Efficient RankBoost for Bipartite Feedback

    Bipartite feedback: the instances split into two sets X_0 and X_1, with every x_1 \in X_1 preferred to every x_0 \in X_0; essentially binary classification

    The pair update  D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1) \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}  factorizes into per-instance weights:

    v_{t+1}(x_0) = \frac{v_t(x_0) \, e^{\alpha_t h_t(x_0)}}{Z_t^0},   v_{t+1}(x_1) = \frac{v_t(x_1) \, e^{-\alpha_t h_t(x_1)}}{Z_t^1}

    D_t(x_0, x_1) = v_t(x_0) \, v_t(x_1),   Z_t = Z_t^0 Z_t^1

    Complexity at each round: O(|X_0| |X_1|) -> O(|X_0| + |X_1|)
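    A sketch of the factored update: per-instance weights are kept over X_0 and X_1 and updated separately, so a round costs O(|X_0| + |X_1|); the weak ranker h and alpha_t are assumed to be given.

```python
import math

def bipartite_round(v0, v1, h, alpha):
    """One bipartite RankBoost round on instance weights v0 (over X_0) and v1 (over X_1).

    Implicitly D(x0, x1) = v0[x0] * v1[x1] and Z_t = Z_t^0 * Z_t^1.
    """
    new_v0 = {x: w * math.exp(alpha * h(x)) for x, w in v0.items()}     # x0 side: +alpha*h
    new_v1 = {x: w * math.exp(-alpha * h(x)) for x, w in v1.items()}    # x1 side: -alpha*h
    z0, z1 = sum(new_v0.values()), sum(new_v1.values())
    return ({x: w / z0 for x, w in new_v0.items()},
            {x: w / z1 for x, w in new_v1.items()})
```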

    39/48

    Evaluation of RankBoost

    Meta-search: same as in (Cohen et al. 99)

    Perfect feedback

    4-fold cross-validation

    40/48

    EachMovie Evaluation

    [Table: results reported by # users, # movies/user, and # feedback movies]

    41/48

    Performance Comparison: Cohen et al. 99 vs. Freund et al. 99

    42/48

    Summary

    CF is easy

    The user's expectations are low

    Any recommendation is better than none

    Making it practically useful, CF is hard

    Data sparseness

    Scalability

    Domain-dependent

    43/48

    Summary (cont.)

    CF as a Learning Task

    Rating-based formulation

    Learn f: U x O -> R

    Algorithms

    Instance-based/memory-based (k-nearest neighbors)

    Model-based (probabilistic clustering)

    Preference-based formulation

    Learn PREF: U x O x O -> R

    Algorithms

    General preference combination (Hedge), greedy ordering

    Efficient restricted preference combination (RankBoost)

    44/48

    Summary (cont.)

    Evaluation

    Rating-based methods

    Simple methods seem to be reasonably effective

    The advantage of sophisticated methods seems to be limited

    Preference-based methods

    More effective than rating-based methods according to one evaluation

    Evaluation on meta-search is weak

    45/48

    Research Directions

    Exploiting complete information

    CF + content-based filtering + domain knowledge + user model

    More localized kernels for instance-based methods

    Predicting movies needs different neighbor users than predicting books

    This suggests using items similar to the target item as features for finding neighbors

    46/48

    Research Directions (cont.)

    Modeling time

    There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix)

    Probabilistic model of preferences

    Making the preference function a probability function, e.g., P(A > B | U)

    Clustering items and users

    Minimizing preference disagreements

    47/48

    References

    Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things". Journal of AI Research, Volume 10, pages 243-270.

    Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences". Machine Learning Journal, 1999.

    Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering". In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.

    Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments". UAI 2001.

    Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations". Proceedings of AAAI-99, pp. 439-446.

    48/48

    The End

    Thank you!