progress reports 2010.7.15


Progress reports

2010/7/15

Students / Rui-Zhe Liu, Meng-Lun Wu

Advisor / Chia-Hui Chang


Outline

Introduction

Methods

Baseline (K-means)

ITCC (Information-theoretic co-clustering)

CCAM (Co-clustering with augmented matrix)

Evaluations

Results-based approach

Feature-based approach


Introduction (1/2)

Dhillon et al. proposed information-theoretic co-clustering (ITCC) to perform two-way clustering of the document-word matrix.

Sometimes we have additional information (called augmented matrices) that ITCC does not take into account.

For example, in addition to the user-ad link matrix, we may have a user description matrix and an advertisement description matrix.
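ITCC treats the normalized link matrix as a joint distribution p(X, Y) and looks for row and column clusters that minimize the information lost by clustering, I(X;Y) - I(X̂;Ŷ). The sketch below only evaluates that objective for given cluster assignments; it does not implement ITCC's alternating row/column reassignment. The function names are hypothetical, and the input is assumed to be a nonnegative count matrix already normalized to sum to 1.

```python
from math import log2

def mutual_information(p):
    """I(X;Y) in bits for a joint distribution given as a 2-D list summing to 1."""
    row = [sum(r) for r in p]            # marginal p(x)
    col = [sum(c) for c in zip(*p)]      # marginal p(y)
    mi = 0.0
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            if pij > 0:                  # 0 * log 0 is taken as 0
                mi += pij * log2(pij / (row[i] * col[j]))
    return mi

def compress(p, row_labels, col_labels, k, l):
    """p(x_hat, y_hat): sum p over each (row cluster, column cluster) block."""
    q = [[0.0] * l for _ in range(k)]
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            q[row_labels[i]][col_labels[j]] += pij
    return q

def itcc_loss(p, row_labels, col_labels, k, l):
    """Hypothetical helper: the ITCC objective I(X;Y) - I(X_hat;Y_hat), always >= 0."""
    return mutual_information(p) - mutual_information(
        compress(p, row_labels, col_labels, k, l))
```

Putting every row and column in its own cluster leaves the distribution unchanged and gives a loss of 0; coarser co-clusterings can only lose information, so a good co-clustering is one whose loss stays small.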


Introduction (2/2)

To fully utilize the augmented matrices, we propose a new method called Co-clustering with augmented matrix (CCAM).

We also use mutual information to model each data matrix.
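The slides do not spell out CCAM's objective. One plausible reading of "use mutual information to model each data matrix" is a weighted sum of ITCC-style information losses, one per matrix, with ads A, users U, ad features S and lohas games L:

[I(A;U) - I(Â;Û)] + λ[I(A;S) - I(Â;S)] + φ[I(U;L) - I(Û;L)]

Only the ad side is clustered in the ad feature matrix and only the user side in the lohas game matrix. The sketch below computes such an objective under that assumption; the weights `lam` and `phi` and all function names are hypothetical, not taken from the slides.

```python
from math import log2

def mutual_information(p):
    """I(X;Y) in bits for a joint distribution given as a 2-D list summing to 1."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(pij * log2(pij / (row[i] * col[j]))
               for i, r in enumerate(p) for j, pij in enumerate(r) if pij > 0)

def group_rows(p, labels, k):
    """p(x_hat, y): sum the rows of p that share a cluster label."""
    q = [[0.0] * len(p[0]) for _ in range(k)]
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            q[labels[i]][j] += pij
    return q

def transpose(p):
    return [list(c) for c in zip(*p)]

def ccam_loss(p_au, p_as, p_ul, ad_labels, user_labels, k, l, lam=1.0, phi=1.0):
    """Hypothetical CCAM-style objective (an assumption, not the authors' definition).

    p_au: ads x users link distribution, p_as: ads x ad features,
    p_ul: users x lohas games; each normalized to sum to 1.
    """
    # Link matrix: cluster ads and users (this term alone is the ITCC objective).
    p_hat = transpose(group_rows(transpose(group_rows(p_au, ad_labels, k)), user_labels, l))
    link_loss = mutual_information(p_au) - mutual_information(p_hat)
    # Augmented ad feature matrix: only the ad side is clustered.
    ad_loss = mutual_information(p_as) - mutual_information(group_rows(p_as, ad_labels, k))
    # Augmented lohas game matrix: only the user side is clustered.
    user_loss = mutual_information(p_ul) - mutual_information(group_rows(p_ul, user_labels, l))
    return link_loss + lam * ad_loss + phi * user_loss
```

Setting `lam = phi = 0` reduces this to the plain ITCC objective on the link matrix; positive weights penalize ad clusters that mix dissimilar ad features and user groups that mix dissimilar lohas game behavior.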


Methods

Baseline (K-means):

Data:

ad feature + ad-user link matrix

lohas game + user-ad link matrix

ITCC:

Data: ad-user link matrix

CCAM:

Data: ad-user link matrix, ad feature matrix, lohas game matrix

Each method produces an ad-cluster result matrix and a user-group result matrix.

[Figure: the ad-user link matrix (Ad_id × User id) and the ad feature matrix (Ad_id × Ad features).]
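A minimal sketch of the K-means baseline, assuming each ad is represented by its ad feature row concatenated with its row of the ad-user link matrix (users would be handled the same way with lohas game and user-ad link rows). It uses plain Lloyd iterations with no feature scaling, since the slides do not mention any; `concat_rows` and `kmeans` are hypothetical helpers, not the authors' code.

```python
import random

def concat_rows(*matrices):
    """Hypothetical helper: join matrices side by side, row by row (e.g. ad features | ad-user links)."""
    return [[v for m in matrices for v in m[i]] for i in range(len(matrices[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Hypothetical helper: Lloyd's K-means, returning one cluster id per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct rows as initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, `kmeans(concat_rows(ad_features, ad_user_links), 5)` would give one of K=5 ad clusters per ad, which is the kind of ad cluster result the baseline contributes to the evaluations.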


Evaluations (1/8)

Each method is evaluated with classifiers, including SVM, decision tree, and simple CART.


Evaluations (2/8) – result-based

Method: baseline (K-means), co-clustering

Evaluation data:

ad feature + ad-user link + method result (ad cluster) matrix

lohas game + user-ad link + method result (user group) matrix
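In the result-based setup, each method's own cluster assignment becomes the class label that the classifiers try to predict from the feature and link data. A hypothetical helper that assembles such a training set might look like this, assuming the rows of all inputs are aligned by ad id (or by user id for the user side):

```python
def result_based_dataset(features, links, cluster_labels):
    """Hypothetical helper: rows [feature values | link values] labelled with a method's cluster id."""
    if not (len(features) == len(links) == len(cluster_labels)):
        raise ValueError("all inputs must describe the same ads (or users), in the same order")
    X = [list(f) + list(r) for f, r in zip(features, links)]
    y = list(cluster_labels)  # the evaluated method's clusters serve as the classes
    return X, y
```

The same features are used for every method; only the label vector `y` changes, so the classifier F-measure measures how predictable each method's clusters are from those features.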

Results are as follows.


Evaluations (3/8)

Evaluation of ad cluster (K=5), F-measure

Method          SVM     CART    Decision tree
co-clustering   0.312   0.277   0.349
baseline        0.965   0.826   0.822
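The charts report F-measure, but the slides do not say how per-class scores were averaged. The sketch below assumes the support-weighted average of per-class F1, the "Weighted Avg." figure that Weka reports; that Weka was used is only suggested by the classifier name "simple CART". `f_measure` is a hypothetical name.

```python
from collections import Counter

def f_measure(y_true, y_pred):
    """Support-weighted average of per-class F1 (an assumed averaging scheme)."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n_c
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += n_c / total * f1  # weight each class by its share of instances
    return score
```

A macro average (plain mean of per-class F1) is the other common choice; with K=5 clusters of uneven size the two can differ noticeably.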


Evaluations (4/8)

Evaluation of user group (K=5), F-measure

Method          SVM     CART    Decision tree
co-clustering   0.861   0.729   0.729
baseline        0.931   0.677   0.677


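Evaluations (5/8)

Unfortunately, K-means scores better. Its clusters are computed from the same ad feature and link data that the classifiers are trained on, so in this setting K-means effectively generates the standard answers for classification.

Therefore, we propose another way to evaluate.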


Evaluations (6/8) – feature-based

Method: baseline

Evaluation data:

ad feature + ad-user link data + baseline (K-means) result matrix

lohas game + user-ad link data + baseline (K-means) result matrix

Method: ITCC, CCAM

Evaluation data:

ad feature + ad-user link data + co-clustering feature (ad-user group matrix) + baseline (K-means) result matrix

lohas game + user-ad link data + co-clustering feature (user-ad cluster matrix) + baseline (K-means) result matrix
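In the feature-based setup, the co-clustering result enters as extra features rather than as labels, while the K-means result supplies the classes. One way to build the ad-user group feature is to sum each ad's links within each user group. A minimal sketch, assuming summed link counts (the slides do not specify the aggregation) and hypothetical function names:

```python
def group_feature(link, col_labels, n_groups):
    """Hypothetical helper: collapse the columns of a link matrix into per-group totals.

    link[i][j]    : link count between row entity i (e.g. an ad) and column entity j (e.g. a user)
    col_labels[j] : group id of column entity j, in range(n_groups)
    returns out[i][g] = total links of row entity i to group g
    """
    out = [[0] * n_groups for _ in link]
    for i, row in enumerate(link):
        for j, count in enumerate(row):
            out[i][col_labels[j]] += count
    return out

def feature_based_dataset(features, link, col_labels, n_groups, kmeans_labels):
    """Hypothetical helper: rows [features | links | co-clustering feature], labelled by K-means."""
    grouped = group_feature(link, col_labels, n_groups)
    X = [list(f) + list(r) + g for f, r, g in zip(features, link, grouped)]
    return X, list(kmeans_labels)
```

For the user side, the same helpers apply to the transposed link matrix (`[list(c) for c in zip(*link)]`, users as rows) with the ad cluster labels as `col_labels`, giving the user-ad cluster matrix.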

Results are as follows.


[Figure: the ad-user link matrix (Ad_id × User id) and the ad-user group matrix (Ad_id × User group) produced by the methods.]


Evaluations (7/8)

[Figure: Co-clustering comparison of ad clustering. Average F-measure (y-axis) for k = 2, 3, 4, 5 (x-axis); legend: our method, ITCC, baseline, CCAM.]


Evaluations (8/8)

[Figure: Co-clustering comparison of user group. Average F-measure (y-axis) for k = 2, 3, 4, 5 (x-axis); legend: our method, ITCC, baseline, CCAM.]


Future work

Discretize the ad feature data.

Try different parameters for CCAM.


Thank you for listening.