ITMO RecSys course. Autumn 2014. Lecture 1: Introduction. kNN, SVD, evaluation
![Page 1: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/1.jpg)
Recommender Systems. Lecture 1: Introduction
Andrey Danilchenko, 18 October 2014
![Page 2: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/2.jpg)
Outline
• Introduction
• Collaborative filtering
• Content-based & hybrid methods
• Evaluation
![Page 3: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/3.jpg)
Introduction
“Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user”
F. Ricci
![Page 4: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/4.jpg)
Number of papers in the RS field
according to Google Scholar (as of 2014-10-17)
![Page 5: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/5.jpg)
We live in the era of recommender systems!
![Page 6: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/6.jpg)
Classification of RS
• Available data: User history, Tags & Metadata, Content
• Methods: Collaborative, Content-based, Hybrid
![Page 7: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/7.jpg)
Data
• Ratings (explicit feedback)
 – Unary (like)
 – Binary (like/dislike)
 – Numeric (stars)
• Action history (implicit feedback)
• Tags, metadata
• Reviews
• Friends (community-based RS)
![Page 8: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/8.jpg)
RS problem statements
• Predict
• Recommend
• Similar
![Page 9: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/9.jpg)
Collaborative filtering
• Neighborhood methods
• Matrix factorization methods
![Page 10: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/10.jpg)
Neighborhood methods
Collaborative filtering
![Page 11: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/11.jpg)
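Idea of the method (user-based)
How did similar users rate the item?
$$\hat{r}_{ui} = \frac{1}{|N_i(u)|} \sum_{v \in N_i(u)} r_{vi}$$
Weight each neighbor's contribution:
$$\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} w_{uv} r_{vi}}{\sum_{v \in N_i(u)} w_{uv}}$$
And normalize the ratings:
$$\hat{r}_{ui} = h^{-1}\left( \frac{\sum_{v \in N_i(u)} w_{uv} h(r_{vi})}{\sum_{v \in N_i(u)} w_{uv}} \right)$$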
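The weighted-average prediction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_user_based`, the dictionary layout, and the toy similarity weights are all hypothetical, and the weights $w_{uv}$ are taken as precomputed and positive.

```python
def predict_user_based(ratings, sims, u, i, k=2):
    """Weighted-average prediction of user u's rating for item i.

    ratings: dict user -> dict item -> rating
    sims: dict (u, v) -> weight w_uv (assumed precomputed, hypothetical input)
    """
    # Candidates: users other than u who rated item i.
    candidates = [
        (sims.get((u, v), 0.0), v)
        for v in ratings
        if v != u and i in ratings[v]
    ]
    # N_i(u): the k candidates most similar to u (positive weights only).
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbors:
        return None  # no usable neighbors, so no prediction
    num = sum(w * ratings[v][i] for w, v in neighbors)
    den = sum(w for w, _ in neighbors)
    return num / den


# Toy data, invented for illustration only.
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 4, "b": 2, "c": 4},
    "carol": {"a": 1, "c": 2},
    "dave": {"b": 3, "c": 5},
}
sims = {("alice", "bob"): 0.9, ("alice", "carol"): 0.1, ("alice", "dave"): 0.5}
prediction = predict_user_based(ratings, sims, "alice", "c", k=2)
```

With `k=2` the neighbors are bob and dave, so the prediction is (0.9·4 + 0.5·5) / (0.9 + 0.5).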
![Page 12: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/12.jpg)
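Which similarity measure to use?
• Cosine similarity
$$\cos(u,v) = \frac{\sum_{i \in I_{uv}} r_{ui} r_{vi}}{\sqrt{\sum_{i \in I_u} r_{ui}^2 \sum_{j \in I_v} r_{vj}^2}}$$
• Pearson correlation
$$PC(u,v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u} (r_{ui} - \bar{r}_u)^2 \sum_{j \in I_v} (r_{vj} - \bar{r}_v)^2}}$$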
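A minimal Python sketch of both measures, following the formulas as written on the slide (numerator over the common items $I_{uv}$, denominators over each user's own items). The function names and the toy ratings are hypothetical, and user means are assumed to be taken over all items the user rated.

```python
from math import sqrt


def cosine_sim(ru, rv):
    """Cosine similarity; ru and rv map item -> rating for two users."""
    if not ru or not rv:
        return 0.0
    common = set(ru) & set(rv)  # I_uv
    num = sum(ru[i] * rv[i] for i in common)
    den = sqrt(sum(r * r for r in ru.values()) * sum(r * r for r in rv.values()))
    return num / den if den else 0.0


def pearson_sim(ru, rv):
    """Pearson correlation with each user's mean taken over their own items."""
    if not ru or not rv:
        return 0.0
    common = set(ru) & set(rv)  # I_uv
    mu_u = sum(ru.values()) / len(ru)
    mu_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den = sqrt(
        sum((r - mu_u) ** 2 for r in ru.values())
        * sum((r - mu_v) ** 2 for r in rv.values())
    )
    return num / den if den else 0.0


# Toy ratings: v follows the same trend as u on a compressed scale.
u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2}
cos_uv = cosine_sim(u, v)
pc_uv = pearson_sim(u, v)
```

Here the Pearson correlation is exactly 1 because the two users' deviations from their means are proportional, while the cosine similarity stays slightly below 1.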
![Page 13: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/13.jpg)
How to normalize ratings?
• Mean centering: $h(r_{ui}) = r_{ui} - \bar{r}_u$
• Z-score: $h(r_{ui}) = \frac{r_{ui} - \bar{r}_u}{\sigma_u}$
• Percentile: $h(r_{ui}) = \frac{|\{ j \in I_u : r_{uj} \le r_{ui} \}|}{|I_u|}$
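The three normalizations can be sketched as below. The helper names are hypothetical, and the population standard deviation is an assumption, since the slide does not say which estimator of $\sigma_u$ is meant.

```python
from statistics import mean, pstdev


def mean_center(user_ratings, r):
    """h(r) = r - mean of the user's ratings."""
    return r - mean(user_ratings)


def z_score(user_ratings, r):
    """h(r) = (r - mean) / sigma; population sigma is an assumption here."""
    sigma = pstdev(user_ratings)
    return (r - mean(user_ratings)) / sigma if sigma else 0.0


def percentile(user_ratings, r):
    """h(r) = share of the user's ratings that are <= r."""
    return sum(1 for x in user_ratings if x <= r) / len(user_ratings)


user_ratings = [2, 4, 4, 5]  # toy ratings of one user
centered = mean_center(user_ratings, 5)
z = z_score(user_ratings, 5)
pct = percentile(user_ratings, 4)
```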
![Page 14: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/14.jpg)
Matrix factorization methods
Collaborative filtering
![Page 15: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/15.jpg)
Best rank-k approximation
Theorem: if only the k largest singular values are kept in the matrix λ (the diagonal matrix of singular values in the SVD of A), the result is the best rank-k approximation of A (in the Frobenius norm).
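A small numerical check of the theorem, assuming NumPy is available. The matrix `A` and the helper name `best_rank_k` are made up for illustration. The Frobenius error of the truncated reconstruction should equal the root of the sum of the squared discarded singular values.

```python
import numpy as np


def best_rank_k(A, k):
    """Keep the k largest singular values of A and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Toy matrix, invented for illustration.
A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])
A1 = best_rank_k(A, 1)
A2 = best_rank_k(A, 2)
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
err1 = np.linalg.norm(A - A1)  # Frobenius norm is the default for matrices
err2 = np.linalg.norm(A - A2)
```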
![Page 16: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/16.jpg)
Baseline predictors
Model: $\hat{r}_{ui} = \mu + b_u + b_i$
Loss function:
$$\arg\min_{b_*} \sum_{(u,i) \in R} \left( r_{ui} - \mu - b_u - b_i \right)^2 + \lambda \left( \sum_{u \in U} b_u^2 + \sum_{i \in I} b_i^2 \right)$$
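One way to fit the baseline model is per-rating stochastic gradient descent, sketched below. This is an assumption rather than the lecture's prescribed procedure: the function name, hyperparameters, and toy data are hypothetical, and the per-sample regularization only approximates the global penalty in the loss above.

```python
def fit_baseline(data, lam=0.1, lr=0.05, epochs=200):
    """Fit mu, b_u, b_i on a list of (user, item, rating) triples."""
    mu = sum(r for _, _, r in data) / len(data)  # global mean
    bu = {u: 0.0 for u, _, _ in data}
    bi = {i: 0.0 for _, i, _ in data}
    for _ in range(epochs):
        for u, i, r in data:
            e = r - (mu + bu[u] + bi[i])  # prediction error
            bu[u] += lr * (e - lam * bu[u])
            bi[i] += lr * (e - lam * bi[i])
    return mu, bu, bi


# Toy ratings, invented for illustration.
data = [
    ("u1", "a", 5), ("u1", "b", 4),
    ("u2", "a", 3), ("u2", "b", 2),
    ("u3", "a", 4),
]
mu, bu, bi = fit_baseline(data)
```

In this toy set user `u1` rates consistently higher than `u2`, so the fitted `bu["u1"]` ends up above `bu["u2"]`.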
![Page 17: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/17.jpg)
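SVD
Model: $\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$
Loss function:
$$\arg\min_{p_*, q_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right) \right]$$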
![Page 18: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/18.jpg)
Neighborhood (item-based)
Model:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(u,i)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(u,i)} s_{ij}} = b_{ui} + \sum_{j \in S^k(u,i)} \theta^u_{ij} (r_{uj} - b_{uj})$$
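A minimal sketch of the item-based prediction for a single user. All names and numbers are hypothetical; the baseline estimates $b_{uj}$ and the similarities $s_{ij}$ are assumed to be precomputed, and $S^k(u,i)$ is taken to be the k items rated by the user that are most similar to item i.

```python
def predict_item_based(user_ratings, baseline, sims, i, k=2):
    """Item-based prediction for one user.

    user_ratings: item -> r_uj for this user
    baseline: item -> b_uj for this user (must contain i)
    sims: (i, j) -> s_ij (assumed precomputed, hypothetical input)
    """
    scored = sorted(
        ((sims.get((i, j), 0.0), j) for j in user_ratings if j != i),
        reverse=True,
    )
    neighbors = [(s, j) for s, j in scored[:k] if s > 0]  # S^k(u, i)
    den = sum(s for s, _ in neighbors)
    if den == 0:
        return baseline[i]  # fall back to the baseline estimate
    num = sum(s * (user_ratings[j] - baseline[j]) for s, j in neighbors)
    return baseline[i] + num / den


# Toy inputs, invented for illustration.
user_ratings = {"a": 5, "b": 3, "c": 4}
baseline = {"a": 4.0, "b": 3.5, "c": 3.0, "d": 3.8}
sims = {("d", "a"): 0.8, ("d", "b"): 0.2, ("d", "c"): 0.5}
pred = predict_item_based(user_ratings, baseline, sims, "d", k=2)
```

The two nearest rated items are `a` and `c`, both rated one point above their baselines, so the prediction is the baseline for `d` plus one.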
![Page 19: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/19.jpg)
Neighborhood (optimization)
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj})$$
![Page 20: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/20.jpg)
Neighborhood (optimization + implicit)
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj}) + \sum_{j \in N(u)} c_{ij}$$
![Page 21: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/21.jpg)
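Neighborhood (normalization)
$$\hat{r}_{ui} = b_{ui} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} \omega_{ij} (r_{uj} - b_{uj}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} c_{ij}$$
With only the k nearest neighbors of item i:
$$\hat{r}_{ui} = b_{ui} + |R^k(i,u)|^{-\frac{1}{2}} \sum_{j \in R^k(i,u)} \omega_{ij} (r_{uj} - b_{uj}) + |N^k(i,u)|^{-\frac{1}{2}} \sum_{j \in N^k(i,u)} c_{ij}$$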
![Page 22: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/22.jpg)
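SVD again
Model: $\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$
Loss function:
$$\arg\min_{p_*, q_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right) \right]$$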
![Page 23: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/23.jpg)
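Asymmetric-SVD
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right)$$
Loss function:
$$\arg\min_{q_*, x_*, y_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda \left( \|q_i\|^2 + b_u^2 + b_i^2 + \sum_{j \in R(u)} \|x_j\|^2 + \sum_{j \in N(u)} \|y_j\|^2 \right) \right]$$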
![Page 24: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/24.jpg)
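SVD++
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right)$$
Loss function:
$$\arg\min_{p_*, q_*, y_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 + \sum_{j \in N(u)} \|y_j\|^2 \right) \right]$$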
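To make the SVD++ model concrete, here is a sketch of the prediction step only (no training). The function name, argument layout, and all parameter values are hypothetical; the implicit-feedback set $N(u)$ is passed as a plain list of item ids.

```python
from math import sqrt


def predict_svdpp(mu, b_u, b_i, q_i, p_u, y, implicit_items):
    """SVD++ prediction: mu + b_u + b_i + q_i . (p_u + |N(u)|^-1/2 * sum of y_j)."""
    user_vec = list(p_u)
    if implicit_items:
        norm = 1.0 / sqrt(len(implicit_items))
        for j in implicit_items:
            for d in range(len(user_vec)):
                user_vec[d] += norm * y[j][d]
    return mu + b_u + b_i + sum(q * x for q, x in zip(q_i, user_vec))


# Made-up parameters for a two-factor model.
y = {"a": [0.1, 0.0], "b": [0.1, 0.2], "c": [0.3, 0.1], "d": [0.5, 0.5]}
pred_with_implicit = predict_svdpp(
    3.5, 0.2, -0.1, [1.0, 0.5], [0.2, 0.4], y, ["a", "b", "c", "d"]
)
pred_without_implicit = predict_svdpp(3.5, 0.2, -0.1, [1.0, 0.5], [0.2, 0.4], y, [])
```

With an empty $N(u)$ the function reduces to the plain SVD model, which is why the second call gives a lower value in this example.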
![Page 25: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/25.jpg)
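Integrated model
Model:
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right) + |R^k(i,u)|^{-\frac{1}{2}} \sum_{j \in R^k(i,u)} \omega_{ij} (r_{uj} - b_{uj}) + |N^k(i,u)|^{-\frac{1}{2}} \sum_{j \in N^k(i,u)} c_{ij}$$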
![Page 26: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/26.jpg)
So how do we optimize all this?
![Page 27: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/27.jpg)
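SGD optimization of the SVD model
Model: $\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$
Loss function:
$$\arg\min_{p_*, q_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right) \right]$$
Gradient descent update rules (with $e_{ui} = r_{ui} - \hat{r}_{ui}$):
$$b_u \leftarrow b_u + \gamma_1 (e_{ui} - \lambda_1 b_u)$$
$$b_i \leftarrow b_i + \gamma_1 (e_{ui} - \lambda_1 b_i)$$
$$p_u \leftarrow p_u + \gamma_2 (e_{ui} q_i - \lambda_2 p_u)$$
$$q_i \leftarrow q_i + \gamma_2 (e_{ui} p_u - \lambda_2 q_i)$$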
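The four update rules translate almost directly into code. The following is a minimal pure-Python sketch, not a tuned implementation: the function names, hyperparameter defaults, initialization scale, and toy data are all assumptions made for illustration.

```python
import random


def train_svd_sgd(data, n_factors=2, epochs=300, g1=0.02, g2=0.02,
                  l1=0.02, l2=0.02, seed=0):
    """Train the biased SVD model with SGD; returns a predict(u, i) function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in data) / len(data)
    bu = {u: 0.0 for u, _, _ in data}
    bi = {i: 0.0 for _, i, _ in data}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in bu}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in bi}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    for _ in range(epochs):
        for u, i, r in data:
            e = r - predict(u, i)  # e_ui
            bu[u] += g1 * (e - l1 * bu[u])
            bi[i] += g1 * (e - l1 * bi[i])
            for f in range(n_factors):
                pu_f, qi_f = p[u][f], q[i][f]  # use old values in both updates
                p[u][f] += g2 * (e * qi_f - l2 * pu_f)
                q[i][f] += g2 * (e * pu_f - l2 * qi_f)
    return predict


def rmse(predict, data):
    return (sum((r - predict(u, i)) ** 2 for u, i, r in data) / len(data)) ** 0.5


# Toy ratings, invented for illustration.
data = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "c", 1),
    ("u3", "b", 2), ("u3", "c", 5),
]
model = train_svd_sgd(data)
mean_rating = sum(r for _, _, r in data) / len(data)
baseline_rmse = rmse(lambda u, i: mean_rating, data)
trained_rmse = rmse(model, data)
```

The comparison against a constant mean predictor is only a sanity check on training error; it says nothing about generalization.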
![Page 28: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/28.jpg)
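Ridge regression
Model: $y_i \leftarrow w^T x_i$, $w^T w \to 0$
Loss function:
$$\arg\min_w \left( \lambda w^T w + \sum_{i=1}^{n} \left( w^T x_i - y_i \right)^2 \right)$$
Exact solution:
$$w = \left( \lambda I + X^T X \right)^{-1} X^T y = \left( \lambda I + A \right)^{-1} d, \quad A = X^T X, \quad d = X^T y$$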
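The exact solution can be checked numerically. A sketch assuming NumPy, with a hypothetical helper `ridge_fit` and toy data chosen so that the unregularized answer is exactly $w = (1, 2)$:

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (lam * I + X^T X)^{-1} X^T y."""
    A = X.T @ X
    d = X.T @ y
    return np.linalg.solve(lam * np.eye(X.shape[1]) + A, d)


# Toy data generated by y = 1 * x1 + 2 * x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_plain = ridge_fit(X, y, 0.0)  # no regularization
w_ridge = ridge_fit(X, y, 1.0)  # weights shrink toward zero
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual choice for numerical stability.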
![Page 29: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/29.jpg)
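ALS optimization of the SVD model
Model: $\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$
Loss function:
$$\arg\min_{p_*, q_*, b_*} \sum_{(u,i) \in R} \left[ \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right) \right]$$
P-step:
$$p_u = \left( \lambda n_u I + A_u \right)^{-1} d_u, \quad A_u = Q[u]^T Q[u] = \sum_{i : (u,i) \in R} q_i q_i^T, \quad d_u = Q[u]^T r_u = \sum_{i : (u,i) \in R} r_{ui} q_i$$
Q-step:
$$q_i = \left( \lambda n_i I + A_i \right)^{-1} d_i, \quad A_i = P[i]^T P[i] = \sum_{u : (u,i) \in R} p_u p_u^T, \quad d_i = P[i]^T r_i = \sum_{u : (u,i) \in R} r_{ui} p_u$$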
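Each P-step and Q-step is a small ridge regression, so the alternation can be sketched compactly with NumPy. This is a simplified illustration under stated assumptions: biases and the global mean are omitted, zero entries of the toy matrix `R` mean "not rated", and the names `solve_step` and `als` are hypothetical.

```python
import numpy as np


def solve_step(F, r, lam):
    """One ALS sub-problem: (lam * n * I + F^T F)^{-1} F^T r."""
    n, f = F.shape
    return np.linalg.solve(lam * n * np.eye(f) + F.T @ F, F.T @ r)


def als(R, mask, f=2, lam=0.05, iters=30, seed=0):
    """Alternate P-steps and Q-steps over the observed entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, f))
    Q = rng.normal(scale=0.1, size=(n_items, f))
    for _ in range(iters):
        for u in range(n_users):  # P-step: fix Q, solve for each p_u
            idx = np.flatnonzero(mask[u])
            if idx.size:
                P[u] = solve_step(Q[idx], R[u, idx], lam)
        for i in range(n_items):  # Q-step: fix P, solve for each q_i
            idx = np.flatnonzero(mask[:, i])
            if idx.size:
                Q[i] = solve_step(P[idx], R[idx, i], lam)
    return P, Q


# Toy rating matrix; 0 marks a missing rating (an assumption of this sketch).
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0],
              [5.0, 3.0, 1.0]])
mask = R > 0
P, Q = als(R, mask)
train_rmse = float(np.sqrt(np.mean((R - P @ Q.T)[mask] ** 2)))
zero_rmse = float(np.sqrt(np.mean(R[mask] ** 2)))  # error of predicting 0
```

Unlike SGD, each step here has a closed-form solution, and the per-user and per-item solves are independent of each other, which makes ALS easy to parallelize.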
![Page 30: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/30.jpg)
Model comparison by RMSE
| Model | 50 factors | 100 factors | 200 factors | Best |
|---|---|---|---|---|
| Item-based kNN | - | - | - | 0.9406 |
| Neighborhood | - | - | - | 0.9002 |
| SVD | 0.9046 | 0.9025 | 0.9009 | 0.9009 |
| Asymmetric SVD | 0.9037 | 0.9013 | 0.9000 | 0.9000 |
| SVD++ | 0.8952 | 0.8924 | 0.8911 | 0.8911 |
| Integrated model | 0.8877 | 0.8870 | 0.8868 | 0.8868 |

on Netflix Prize data
![Page 31: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/31.jpg)
BULLSHIT!
![Page 32: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/32.jpg)
Content-based methods
Tag-based methods
True content-based methods
![Page 33: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/33.jpg)
Tag-based methods
Content-based methods
![Page 34: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/34.jpg)
Let's use tags!
![Page 35: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/35.jpg)
Ways to generate tags
• User-generated
• Web-mining
• Expert-generated
• Metadata
![Page 36: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/36.jpg)
Similarity by tags (co-occurrence)
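Data: tag clouds $T_i$ and $T_j$
Similarity measures:
• Jaccard: $\frac{|T_i \cap T_j|}{|T_i \cup T_j|}$
• Dice: $\frac{2 \cdot |T_i \cap T_j|}{|T_i| + |T_j|}$
• Ochiai: $\frac{|T_i \cap T_j|}{\sqrt{|T_i| \, |T_j|}}$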
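The three co-occurrence measures are short enough to sketch directly. Tag clouds are treated as plain Python sets here, which is an assumption (real tag clouds often carry weights), and the example tags are invented.

```python
from math import sqrt


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b) if a | b else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def ochiai(a, b):
    """|A ∩ B| / sqrt(|A| |B|)"""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0


# Invented tag clouds of two items.
t_i = {"rock", "indie", "british", "90s"}
t_j = {"rock", "indie", "pop"}
scores = (jaccard(t_i, t_j), dice(t_i, t_j), ochiai(t_i, t_j))
```

The two sets share 2 tags out of 5 distinct ones, so the Jaccard score is 0.4, while Dice and Ochiai come out slightly higher for the same overlap.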
![Page 37: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/37.jpg)
Similarity by tags (LSA)
• Decompose the Items × Tags matrix with SVD
• Similarity measures: cosine similarity, etc.
[Figure: Items × Tags matrix ≈ (Items × Item features) × λ × (Tag features × Tags)]
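A sketch of the LSA idea with NumPy: factor a small Items × Tags count matrix, keep k latent features, and compare items by cosine similarity in that space. The matrix values, tag meanings, and function names are all hypothetical, and scaling the item features by the singular values is one common convention rather than something stated on the slide.

```python
import numpy as np


def lsa_item_vectors(M, k):
    """Item representations in a k-dimensional latent space (U_k scaled by s_k)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rows are items, columns are tag counts (invented): rock, indie, pop, dance.
M = np.array([
    [3.0, 2.0, 0.0, 0.0],  # item 0
    [2.0, 3.0, 0.0, 0.0],  # item 1, similar tags to item 0
    [0.0, 0.0, 4.0, 1.0],  # item 2, disjoint tags
])
vecs = lsa_item_vectors(M, 2)
sim_01 = cosine(vecs[0], vecs[1])
sim_02 = cosine(vecs[0], vecs[2])
```

Items 0 and 1 collapse onto the same latent direction even though their raw tag counts differ, while item 2 stays orthogonal to both.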
![Page 38: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/38.jpg)
Tag vandalism
Paris Hilton's tags
Last.fm, May 2013
![Page 39: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/39.jpg)
Tag vandalism: how to fight it?
Paris Hilton's corrected tags
• User listening habits
• Filter tags by similarity
![Page 40: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/40.jpg)
True content-based methods
Content-based methods
![Page 41: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/41.jpg)
Example: music
Low-level features:
• Spectral centroid
• Spectral flatness
• Spectral skewness
• Spectral kurtosis
• Zero-Crossing Rate (ZCR)
• Mel Frequency Cepstrum Coefficients (MFCCs)
High-level features:
• Instrumentation
• Rhythm
• Harmony
• Structure
• Intensity
• Genre
• Mood
![Page 42: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/42.jpg)
Hybrid methods
![Page 43: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/43.jpg)
Classification of methods
• Weighted
• Switching
• Mixed
• Cascade
![Page 44: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/44.jpg)
Evaluation
![Page 45: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/45.jpg)
How can the quality of an RS be measured?
• Offline test
• User study
• Online experiment
![Page 46: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/46.jpg)
Offline evaluation
• Prediction accuracy
 – RMSE
 – MAE
• Usage prediction accuracy
 – Precision/recall @N
 – F1
 – AUC
• Ranking accuracy
 – DPM
 – DCG
 – Average Reciprocal Hit Rank (ARHR)
• Coverage
 – Catalog coverage
 – Sales diversity
 – Gini index
 – Shannon entropy
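A few of the offline metrics listed above, sketched in plain Python. The function names and example values are hypothetical; `precision_recall_at_n` assumes a ranked recommendation list and a non-empty set of relevant items.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error of rating predictions."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error of rating predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and recall@N for one user's ranked recommendation list."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    return hits / n, hits / len(relevant)


# Invented example values.
predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
error_rmse = rmse(predicted, actual)
error_mae = mae(predicted, actual)
precision, recall = precision_recall_at_n(
    ["a", "b", "c", "d", "e"], {"a", "d", "e"}, n=4
)
```

RMSE penalizes the single two-point miss more heavily than MAE does, which is why it comes out larger on the same predictions.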
![Page 47: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/47.jpg)
User study
• Confidence
• Trust
• Novelty
• Diversity
• Serendipity
• Robustness
• Adaptivity
• Scalability
![Page 48: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/48.jpg)
Comparing is easier!
![Page 49: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/49.jpg)
Online study methods
• A/B testing
• Team-Draft Interleaving (TDI)
![Page 50: ITMO RecSys course. Autumn 2014. Lecture1: Introduction. kNN, SVD, evaluation](https://reader033.vdocuments.site/reader033/viewer/2022052602/559b890f1a28ab6d158b4659/html5/thumbnails/50.jpg)
Andrey Danilchenko
developer
Good luck!