progress reports 2010.7.15
Progress reports
2010/7/15
Student / Rui-Zhe Liu, Meng-Lun Wu
Advisor / Chia-Hui Chang
Outline
Introduction
Methods
Baseline(K-means)
ITCC (Information theoretic co-clustering)
CCAM (Co-clustering with augmented matrix )
Evaluations
Results-based approach
Feature-based approach
Introduction(1/2)
Dhillon et al. proposed information-theoretic co-clustering
(ITCC) to perform two-way clustering on the
document-word matrix.
Sometimes we have additional information (called
augmented matrices) that ITCC does not consider.
For example, in addition to the user-ad link matrix, we may have
a user description matrix and an advertisement description
matrix.
Introduction (2/2)
To fully utilize the augmented matrices, we propose a new
method called co-clustering with augmented
matrix (CCAM).
We also use mutual information to model each data matrix.
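As background: in Dhillon et al.'s formulation, ITCC looks for row and column clusterings that minimize the loss in mutual information between the original matrix and its co-clustered version. The sketch below only illustrates that quantity. The toy link matrix, the function names, and the two candidate clusterings are illustrative assumptions, and nothing here reproduces the CCAM objective, which these slides do not spell out.

```python
import math


def mutual_information(counts):
    """I(X;Y) in bits, treating a matrix of non-negative counts as a joint distribution."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(x,y) / (p(x) p(y)) simplifies to c * total / (row_sum * col_sum)
                mi += (c / total) * math.log2(c * total / (row_sums[i] * col_sums[j]))
    return mi


def aggregate(counts, row_clusters, col_clusters, k_rows, k_cols):
    """Sum the counts that fall into each (row cluster, column cluster) block."""
    out = [[0] * k_cols for _ in range(k_rows)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            out[row_clusters[i]][col_clusters[j]] += c
    return out


# Hypothetical 4 ads x 4 users link matrix (made up for illustration).
links = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 6],
    [0, 1, 6, 3],
]

# Two candidate co-clusterings with 2 ad clusters and 2 user groups.
good = aggregate(links, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = aggregate(links, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

loss_good = mutual_information(links) - mutual_information(good)
loss_bad = mutual_information(links) - mutual_information(bad)
print(f"loss (block-aligned clustering): {loss_good:.3f} bits")
print(f"loss (mixed clustering):         {loss_bad:.3f} bits")
```

The clustering that follows the block structure of the matrix loses less mutual information, which is the kind of solution ITCC prefers.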
Methods
Baseline(k-means):
Data:
ad feature + ad-user link matrix
lohas game + user-ad link matrix
ITCC:
Data: ad-user link matrix
CCAM:
Data: ad-user link, ad feature, lohas game matrix
Each method generates its own result matrices: the ad clusters
and the user groups.
[Figure: input matrices, with axes User id / Ad_id and Ad features / Ad_id]
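A minimal sketch of the k-means baseline on the ad side, assuming "ad feature + ad-user link matrix" means column-wise concatenation of the two matrices (the slides do not say how they are combined). The toy data, the helper names, and the deterministic initialization from the first k rows are illustrative assumptions, not the setup used in the experiments.

```python
def concat_columns(a, b):
    """Join two matrices with the same number of rows, side by side."""
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points are the initial centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: 4 ads, 2 ad features, 3 users.
ad_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ad_user_links = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]

ad_rows = concat_columns(ad_features, ad_user_links)
ad_clusters = kmeans(ad_rows, k=2)
print(ad_clusters)
```

The user side would be handled the same way, with the lohas game matrix concatenated to the user-ad link matrix.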
Evaluations(1/8)
Each method is evaluated with classification methods,
including SVM, decision tree, and simple CART.
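The scores in this deck are reported as F-measures. As a rough sketch of how such a score can be computed from a classifier's predictions, the code below derives a per-class F-measure and averages it over classes. The macro-averaging and the toy labels are assumptions; the slides do not state which averaging was used.

```python
def f_measure_per_class(y_true, y_pred):
    """Return {label: F-measure}, where F is the harmonic mean of precision and recall."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def average_f_measure(y_true, y_pred):
    """Unweighted (macro) average of the per-class F-measures."""
    scores = f_measure_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical cluster labels (y_true) and classifier predictions (y_pred).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

per_class = f_measure_per_class(y_true, y_pred)
avg_f = average_f_measure(y_true, y_pred)
print(per_class, round(avg_f, 3))
```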
Evaluations(2/8) – result based
Method: baseline(k-means), co-clustering
Evaluation data:
ad feature + ad-user link + method results(ad cluster) matrix
lohas game + user-ad link + method results(user group) matrix
Results are as follows.
Evaluations(3/8)
Evaluation of ad cluster (K=5), F-measure
                SVM     CART    Decision tree
co-clustering   0.312   0.277   0.349
baseline        0.965   0.826   0.822
Evaluations(4/8)
Evaluation of user group (K=5), F-measure
                SVM     CART    Decision tree
co-clustering   0.861   0.729   0.729
baseline        0.931   0.677   0.677
Evaluations(5/8)
Unfortunately, k-means scores better, because it generates the
standard answers (class labels) used for classification.
Therefore, we propose another way to evaluate.
Evaluations(6/8) – feature based
Method: baseline
Evaluation data:
ad feature + ad-user link data + baseline(k-means) results matrix
lohas game + user-ad link data + baseline(k-means) results matrix
Method: ITCC, CCAM
Evaluation data:
ad feature + ad-user link data + co-clustering feature (ad-user group
matrix) + baseline(k-means) results matrix
lohas game + user-ad link data + co-clustering feature (user-ad cluster
matrix) + baseline(k-means) results matrix
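One way to read the feature-based setup: the co-clustering output is turned into extra feature columns, and the k-means result stays as the class label for every method. The sketch below builds such an evaluation matrix for the ad side, assuming the "ad-user group matrix" is the ad-user link matrix with its user columns summed per user group. That reading, the helper names, and the toy data are assumptions, not details confirmed by the slides.

```python
def group_link_features(links, col_groups, n_groups):
    """Collapse an ads x users link matrix into ads x user groups by summing columns."""
    out = []
    for row in links:
        agg = [0] * n_groups
        for value, group in zip(row, col_groups):
            agg[group] += value
        out.append(agg)
    return out


def concat_columns(*matrices):
    """Join matrices with the same number of rows, side by side."""
    return [[v for row in rows for v in row] for rows in zip(*matrices)]


# Hypothetical data: 3 ads, 2 ad features, 4 users assigned to 2 user groups.
ad_features = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
ad_user_links = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
user_groups = [0, 0, 1, 1]  # e.g. produced by a co-clustering method

ad_user_group = group_link_features(ad_user_links, user_groups, n_groups=2)

# Feature part of the evaluation data; the k-means cluster of each ad would be
# attached separately as the class label.
evaluation_rows = concat_columns(ad_features, ad_user_links, ad_user_group)
print(ad_user_group)
```

Under this reading, a co-clustering method helps if the added group columns make the baseline labels easier for the classifiers to predict.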
Results are as follows.
[Figure: matrices produced by the methods, with axes User id / Ad_id and User group / Ad_id]
Evaluations(7/8)
[Figure: Co-clustering comparison of ad clustering. Average F-measure (axis 0.800 to 1.050) for k=2, k=3, k=4, k=5; legend: our method, itcc, baseline, ccam]
Evaluations(8/8)
[Figure: Co-clustering comparison of user group. Average F-measure (axis 0.700 to 1.000) for k=2, k=3, k=4, k=5; legend: our method, itcc, baseline, ccam]
Future work
Discretize ad feature data.
Try different parameters for CCAM.
Thank you for listening.