Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint
Yuan Yao
Joint work with Hanghang Tong, Feng Xu, and Jian Lu
KDD 2014, Aug 24-27
Roadmap
- Background and Motivations
- Modeling Multi-aspect
- Computation Speedup
- Empirical Evaluations
- Conclusions
CQA (Community Question Answering)
Long-Term Impact
What is the long-term impact of a Q/A post?
Q: How many users will find it beneficial?
Challenges
Q: Why not off-the-shelf data mining algorithms?
Challenge 1: Multi-aspect
- C1.1. Coupling between questions and answers
- C1.2. Feature non-linearity
- C1.3. Posts dynamically arrive
Challenge 2: Efficiency
C1.1 Coupling
Strong positive correlation between question votes and answer votes! [Yao+@ASONAM'14]
[Figure: question features F_q feed the predicted question impact Y_q (1); answer features F_a feed the predicted answer impact Y_a (2); the two predictions are tied together by voting consistency (3).]
[Yao+@ASONAM'14] Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.
C1.1 Coupling
LIP-M [Yao+@ASONAM'14] jointly optimizes three terms: (1) question prediction, (2) answer prediction, and (3) voting consistency regularization.
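The coupled objective behind LIP-M can be sketched as follows. The symbols are illustrative (F_q/F_a are question/answer feature matrices, y_q/y_a the observed votes, P the set of question-answer pairs); the exact formulation is in [Yao+@ASONAM'14]:

```latex
% Illustrative sketch; see [Yao+@ASONAM'14] for the exact objective.
\min_{w_q,\, w_a}\;
\underbrace{\|F_q w_q - y_q\|_2^2}_{\text{(1) question prediction}}
+ \underbrace{\|F_a w_a - y_a\|_2^2}_{\text{(2) answer prediction}}
+ \mu \underbrace{\sum_{(i,j)\in\mathcal{P}} \bigl((F_q w_q)_i - (F_a w_a)_j\bigr)^2}_{\text{(3) voting consistency}}
```

The third term pulls the predicted impact of each question toward the predicted impact of its answers, which is what couples the two otherwise independent regressions.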
C1.2 Non-linearity
Solution: the kernel trick (e.g., as in SVMs) with a Mercer kernel, using the kernel matrix as the new feature matrix.
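The idea can be sketched with kernel ridge regression, where the kernel (Gram) matrix plays the role of the feature matrix. The RBF kernel and the toy quadratic target below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """Mercer (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

# Toy non-linear target y = x^2, which a linear model cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = X[:, 0] ** 2

# Kernel ridge regression: the kernel matrix K acts as the new feature
# matrix, and the coefficients come from solving (K + lam*I) alpha = y.
lam = 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predict at new points through their kernel similarities to training points.
X_test = np.array([[0.0], [0.5], [-0.5]])
y_pred = rbf_kernel(X_test, X) @ alpha   # close to [0, 0.25, 0.25]
```

Note the cost: forming and solving with the n x n kernel matrix is exactly where the cubic complexity discussed later comes from.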
C1.3 Dynamics
Solution: recursive least squares regression [Haykin2005]
[Figure: existing examples (x1, y1), (x2, y2), ..., (xt, yt) produce the current model (Model_t); a newly arriving example (x_{t+1}, y_{t+1}) updates it to the new model (Model_{t+1}) without retraining from scratch.]
[Haykin2005] S Haykin. Adaptive Filter Theory. 2005.
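The recursive update can be sketched in a few lines. The class name and the toy stream are illustrative; the update equations are the standard RLS recursion described in [Haykin2005]:

```python
import numpy as np

class RecursiveLeastSquares:
    """Recursive least squares: update a ridge-regression model one example
    at a time instead of retraining on all past data (cf. [Haykin2005])."""

    def __init__(self, dim, lam=1.0):
        self.w = np.zeros(dim)          # current weight vector (Model_t)
        self.P = np.eye(dim) / lam      # running inverse of (X^T X + lam*I)

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the inverse, then the weights.
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)         # gain vector
        self.w += k * (y - x @ self.w)  # correct the model on the new example
        self.P -= np.outer(k, Px)

# Stream examples from y = 2*x1 - x2; the model converges without retraining.
rng = np.random.default_rng(1)
model = RecursiveLeastSquares(dim=2, lam=0.1)
for _ in range(200):
    x = rng.normal(size=2)
    model.update(x, 2.0 * x[0] - 1.0 * x[1])
```

Each update costs O(dim^2) regardless of how many examples have been seen, which is the point of the recursion.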
This Paper
Q1: How to comprehensively capture the multiple aspects (coupling, non-linearity, and dynamics) in one algorithm?
Q2: How to make the long-term impact prediction algorithm efficient?
Modeling Non-linearity
Basic idea: kernelize LIP-M.
Details (LIP-KM): the same three terms as LIP-M, i.e., (1) question prediction, (2) answer prediction, and (3) voting consistency regularization, with kernel matrices in place of the feature matrices.
Closed-form solution; complexity: O(n^3).
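For plain kernel ridge regression (the uncoupled special case), the closed-form solution and the source of the cubic cost look like this; LIP-KM's coupled solution has the same structure, and its exact form is in the paper:

```latex
% Uncoupled special case: kernel ridge regression with kernel matrix K.
\min_{\alpha}\; \|K\alpha - y\|_2^2 + \lambda\,\alpha^{\top} K \alpha
\quad\Longrightarrow\quad
\alpha^{*} = (K + \lambda I)^{-1} y
% Inverting the n x n matrix (K + \lambda I) costs O(n^3).
```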
Modeling Dynamics
Basic idea: recursively update LIP-KM.
Details (LIP-KIM): when a new post arrives, update the existing solution via the matrix inverse lemma instead of recomputing it from scratch.
Complexity: O(n^3) -> O(n^2).
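A minimal sketch of the matrix inverse (block inversion) lemma at work: when the kernel matrix grows by one row and column, the new inverse follows from the old one in O(n^2) instead of O(n^3). The function name and the toy kernel are illustrative assumptions:

```python
import numpy as np

def grow_inverse(A_inv, b, c):
    """Given A_inv = A^{-1}, return the inverse of the bordered matrix
    [[A, b], [b^T, c]] in O(n^2) via the block matrix inversion lemma."""
    u = A_inv @ b
    s = c - b @ u                        # Schur complement (a scalar)
    n = len(b)
    M_inv = np.empty((n + 1, n + 1))
    M_inv[:n, :n] = A_inv + np.outer(u, u) / s
    M_inv[:n, n] = -u / s
    M_inv[n, :n] = -u / s                # symmetric: A is symmetric here
    M_inv[n, n] = 1.0 / s
    return M_inv

# Check against a direct O(n^3) inverse on a small kernel-like matrix.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
K = X @ X.T + np.eye(6)                  # (K + lam*I) for 6 examples
A_inv = np.linalg.inv(K[:5, :5])         # inverse before the new example
M_inv = grow_inverse(A_inv, K[:5, 5], K[5, 5])
```

Applied once per arriving post, this keeps the per-update cost quadratic in the number of examples.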
Approximation Method (1)
Basic idea: compress the kernel matrix (Nyström approximation).
Details (LIP-KIMA):
1) Separate decomposition (eigen-decomposition)
2) Make the decomposition reusable (SVD on X1, eigen-decomposition)
3) Apply the decomposition to LIP-KIM
Complexity: O(n^2) -> O(n).
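The compression step can be sketched with a basic Nyström approximation, assuming an RBF kernel and uniformly chosen landmarks (both illustrative): the full kernel matrix is replaced by a low-rank product built from m landmark columns, so only n*m kernel entries are ever touched.

```python
import numpy as np

def rbf(X1, X2, gamma=0.2):
    """RBF kernel matrix between two sets of points."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_factors(kernel, X, landmarks):
    """Nystrom approximation K ~= C @ pinv(W) @ C.T from m landmark points,
    touching only n*m kernel entries instead of all n^2."""
    C = kernel(X, X[landmarks])      # n x m: similarities to the landmarks
    W = C[landmarks]                 # m x m: kernel among the landmarks
    return C, np.linalg.pinv(W)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
C, W_pinv = nystrom_factors(rbf, X, landmarks=np.arange(40))
K_approx = C @ W_pinv @ C.T          # low-rank stand-in for the full kernel

# Verification only: compare against the exact O(n^2) kernel matrix.
rel_err = np.linalg.norm(rbf(X, X) - K_approx) / np.linalg.norm(rbf(X, X))
```

Downstream computations then work with the thin factors C and W, which is what brings the per-update cost down to linear in n.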
Approximation Method (2)
Basic idea: filter out less informative examples before updating the model.
Details (LIP-KIMAA): existing examples (x1, y1), ..., (xt, yt) give the current model (Model_t); newly arriving examples (x_{t+1}, y_{t+1}) pass through a filtering step first, and only the informative ones are used to update the model to Model_{t+1}.
Complexity: O(n) -> sub-linear.
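The filtering step can be sketched as an error-based filter: only examples the current model predicts poorly are passed on to the update. The threshold and criterion here are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def filter_informative(model_predict, X_new, y_new, threshold=0.5):
    """Keep only new examples the current model predicts poorly; examples
    the model already fits well add little information and are skipped.
    (Schematic sketch; LIP-KIMAA's actual criterion is in the paper.)"""
    errors = np.abs(model_predict(X_new) - y_new)
    keep = errors > threshold
    return X_new[keep], y_new[keep]

# Toy current model y = 3*x: two of the four new examples are surprising.
predict = lambda X: 3.0 * X[:, 0]
X_new = np.array([[1.0], [2.0], [3.0], [4.0]])
y_new = np.array([3.0, 9.0, 10.0, 12.0])
X_keep, y_keep = filter_informative(predict, X_new, y_new)
```

Because only a fraction of the stream reaches the O(n)-per-example update, the overall cost per arriving post drops below linear.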
Summary
(K: non-linearity, I: dynamics, M: coupling, A: approximation)
- Ridge Regression: the base model
- LIP-K (Kernel Ridge Regression), LIP-I (Recursive Ridge Regression), LIP-M (CoPs): one aspect each
- LIP-KI (Recursive Kernel Ridge Regression), LIP-KM O(n^3), LIP-IM: two aspects each
- LIP-KIM O(n^2): all three aspects
- LIP-KIMA O(n), LIP-KIMAA sub-linear: approximations of LIP-KIM
Experiment Setup
Datasets: Stack Overflow and Mathematics Stack Exchange (http://blog.stackoverflow.com/category/cc-wiki-dump/)
Features: content (bag-of-words) and contextual features
[Figure: posts are ordered by time and split into an initial training set, an incremental training set, and a test set.]
Evaluation Objectives
O1: Effectiveness. How accurate are the proposed algorithms for long-term impact prediction?
O2: Efficiency. How scalable are the proposed algorithms?
Effectiveness Results
Comparison with existing models.
[Figure: prediction error of our methods vs. existing models; lower is better, and our methods perform best.]
Efficiency Results
The speed comparisons.
[Figure: running time of our methods vs. alternatives; lower is better. LIP-KIMAA scales sub-linearly.]
Quality-Speed Trade-off
[Figure: prediction quality vs. speed; our methods lie toward the better region of the trade-off.]
Conclusions
A family of algorithms for long-term impact prediction of CQA posts.
Q1: How to capture coupling, non-linearity, and dynamics?
A1: Voting consistency + kernel trick + recursive updating.
Q2: How to make the algorithms scalable?
A2: Approximation methods.
Empirical evaluations:
- Effectiveness: up to 35.8% improvement
- Efficiency: up to 390x speedup and sub-linear scalability