My Past Works, Kazunari Sugiyama, 16 Apr., 2009, WING Group Meeting
![Page 1: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/1.jpg)
My Past Works
Kazunari Sugiyama
16 Apr., 2009
WING Group Meeting
![Page 2: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/2.jpg)
Outline
I. Natural Language Processing (Disambiguation)
I-1 Personal Name Disambiguation in Web Search Results
I-2 Word Sense Disambiguation in Japanese Texts
II. Web Information Retrieval
II-1 Characterizing Web Pages Using Hyperlinked Neighboring Pages
II-2 Adaptive Web Search Based on User’s Information Needs
![Page 3: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/3.jpg)
I-1 Personal Name Disambiguation in Web Search Results
[Outline]
1. Introduction
2. Our Proposed Method
3. Experiments
![Page 4: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/4.jpg)
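[Figure: Web search results for the personal name “William Cohen” refer to different entities: a politician, a professor of computer science, a consulting company, and another person who is not “William Cohen” (Robert M. Gates).]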
![Page 5: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/5.jpg)
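2.1 Our Proposed Method
[Semi-supervised Clustering]
Control the fluctuation of the centroid of a cluster.
p: search-result Web page
$w^{p}$: feature vector of p
$G_i$: the centroid vector of a cluster
$p_{s_i}$: seed page
$w^{p}_{(G)}$: feature vector of a Web page contained in a cluster that has centroid G
[Figure: clusters C1 and C2 in the term vector space (axes t1, t2, t3, ..., tn) with centroids G1 and G2 and the feature vectors of seed pages $p_{s_1}$ and $p_{s_2}$; after merging, the centroid moves to G'1.]
[Equation: update rule for the new centroid $G^{new}$, written in terms of the feature vectors $w^{p}$, the distance $D(G, w^{p})$, and the counts $n_1$ and $n_2$.]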
![Page 6: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/6.jpg)
3. Experiments
3.1 Experimental Data
3.2 Evaluation Measure
3.3 Experimental Results
![Page 7: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/7.jpg)
3.1 Experimental Data
WePS Corpus
– Established for the “Web People Search Task” at SemEval-2007, held at the Association for Computational Linguistics (ACL) conference
– Web pages related to 79 personal names, sampled from
• participants in conferences on digital libraries and computational linguistics,
• biographical articles in the English Wikipedia,
• the U.S. Census
– The top 100 Yahoo search results, obtained via its search API, for each personal name query
• Training set: 49 names; test set: 30 names (7,900 Web pages in total)
Pre-processing for the WePS corpus
– Eliminate stopwords and perform stemming
– Determine the optimal parameter for merging similar clusters using the training set of the WePS corpus, and then apply it to the test set
![Page 8: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/8.jpg)
3.2 Evaluation Measure
(1) Purity
(2) Inverse purity
(3) F (harmonic mean of (1) and (2))
(These are the standard evaluation measures employed in the “Web People Search Task.”)
[Hotho et al., GLDV Journal’05]
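The three measures can be sketched in a few lines. This is a minimal illustration that assumes hard (non-overlapping) clusters and uses the usual definitions: purity averages, over system clusters, the size of the best-matching gold class; inverse purity swaps the two roles. The function names and the toy labels are hypothetical and not taken from the slides.

```python
from collections import Counter

def purity(clusters, classes):
    """clusters, classes: one label per item (hard assignments)."""
    overlap = Counter(zip(clusters, classes))  # |C_i ∩ L_j| for every pair
    best = {}
    for (cluster, _gold), count in overlap.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(clusters)

def inverse_purity(clusters, classes):
    # Same computation with the roles of clusters and gold classes swapped.
    return purity(classes, clusters)

def f_measure(clusters, classes):
    p = purity(clusters, classes)
    ip = inverse_purity(clusters, classes)
    return 2 * p * ip / (p + ip)  # harmonic mean of (1) and (2)

# Toy example: six pages, two system clusters, three gold entities.
system = [0, 0, 0, 1, 1, 1]
gold = ["a", "a", "b", "b", "b", "c"]
print(purity(system, gold), inverse_purity(system, gold), f_measure(system, gold))
```

Merging everything into one cluster maximizes inverse purity, and putting every page in its own cluster maximizes purity, which is why the two are combined into F.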
![Page 9: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/9.jpg)
3.3 Experimental Results

| Team ID in the “Web People Search Task” | Purity | Inverse Purity | F |
| --- | --- | --- | --- |
| CU_COMSEM | 0.72 | 0.88 | 0.78 |
| IRST-BP | 0.75 | 0.80 | 0.75 |
| PSNUS | 0.73 | 0.82 | 0.75 |
| UVA | 0.81 | 0.60 | 0.67 |
| SHEF | 0.60 | 0.82 | 0.66 |
| Our proposed method (2 and 3 sentences in the 5 Wikipedia seed pages and the search-result Web page, respectively) | 0.72 | 0.81 | 0.76 |

This result is comparable to the top result (F = 0.78) among the top 5 participants in the “Web People Search Task.”
We could acquire useful information from sentences that characterize a person entity, and thereby disambiguate person entities effectively.
![Page 10: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/10.jpg)
I-2 Word Sense Disambiguation in Japanese Texts
[Outline]
1. Introduction
2. Proposed Method
3. Experiments
![Page 11: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/11.jpg)
1. Introduction (1/2)
Word Sense Disambiguation (WSD)
– Determining the meaning of an ambiguous word in its context
“run”:
(1) Bob goes running every morning. → “to move fast by using one’s feet”
(2) Mary runs a beauty parlor. → “to direct or control”
![Page 12: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/12.jpg)
1. Introduction (2/2)
[Figure: overview of our approach, starting from a raw corpus that is not assigned sense tags]
(1) Feature extraction for clustering and WSD (“baseline features”)
(2) Add sense-tagged instances (“seed instances”)
(3) Semi-supervised clustering
(4) Feature extraction for WSD from clustering results, followed by supervised WSD
Our approach for WSD
– Basically, supervised WSD
– Applying semi-supervised clustering, by introducing sense-tagged instances, to supervised WSD
![Page 13: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/13.jpg)
2. Proposed Method
2.1 Semi-supervised Clustering
2.2 Features for WSD Obtained Using Clustering Results
![Page 14: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/14.jpg)
2.1 Semi-supervised Clustering
2.1.1 Features for Clustering
2.1.2 Semi-supervised Clustering
2.1.3 Seed Instances and Constraints for Clustering
![Page 15: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/15.jpg)
2.1.1 Features for Clustering and WSD (“baseline features”)
Morphological features
– Bag-of-words (BOW), part-of-speech (POS), and detailed POS classification
– Target word itself and the two words to its right and left
Syntactic features
– If the POS of a target word is a noun, extract the verb in a grammatical dependency relation with the noun.
– If the POS of a target word is a verb, extract the noun in a grammatical dependency relation with the verb.
Figures in Bunrui-Goi-Hyou
– 4 and 5 digits regarding the content word to the right and left of the target word.
– e.g., 地域 (“community”), 社会 (“society”): “1.1720,4,1,3” → 1172 (as 4 digits), 11720 (as 5 digits)
5 topics inferred on the basis of LDA
– Compute the log-likelihood of a word instance (“soft-tag” approach [Cai et al., EMNLP’07])
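The Bunrui-Goi-Hyou example on this slide (“1.1720,4,1,3” → 1172 and 11720) can be reproduced with a small helper. This is only a sketch: it assumes the entry always has the form shown on the slide (a dotted category number followed by comma-separated fields), and the function name is hypothetical.

```python
from typing import Tuple

def bgh_codes(entry: str) -> Tuple[str, str]:
    """Return the 4-digit and 5-digit codes of a Bunrui-Goi-Hyou entry."""
    category = entry.split(",")[0]       # "1.1720"
    digits = category.replace(".", "")   # "11720"
    return digits[:4], digits[:5]

four, five = bgh_codes("1.1720,4,1,3")
print(four, five)
```

The 4-digit code is a coarser semantic class than the 5-digit one, so using both gives the classifier two levels of granularity for the neighboring content words.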
![Page 16: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/16.jpg)
2.1.2 Semi-supervised Clustering
[Proposed method]
Refine [Sugiyama and Okumura, ICADL’07] (about the Web People Search Task) for word instances.
Control the fluctuation of the centroid of a cluster.
x: word instance
$G_i$: the centroid of a cluster
$x_{s_i}$: seed instance
$D(G^{C_1}, G^{C_2})$: adaptive Mahalanobis distance

$$
G^{new} = \frac{\displaystyle\sum_{f^{x} \in C_1} f^{x} + \frac{1}{D(G^{C_1}, G^{C_2}) + c} \sum_{f^{x} \in C_2} f^{x}}{\displaystyle n_{C_1} + \frac{1}{D(G^{C_1}, G^{C_2}) + c}\, n_{C_2}}
$$

[Figure: clusters C1, C2, and C3 in the vector space (axes t1, t2, t3, ..., tn) with centroids G1 and G2 and seed instance $x_{s_1}$, before and after merging into a new cluster $C_{new}$ with centroid $G_{new}$.]
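A rough sketch of what "controlling the fluctuation of the centroid" could look like in code. It assumes the merged centroid down-weights the vectors of the second cluster by 1/(D + c), where D is the distance between the two centroids, so that a distant cluster moves the centroid less than a plain average would. Two things here are stand-ins rather than the slide's method: plain Euclidean distance replaces the adaptive Mahalanobis distance, and the value c = 1.0 and all function names are hypothetical.

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def merged_centroid(cluster1, cluster2, c=1.0):
    """Centroid after merging cluster2 into cluster1, with cluster2 down-weighted."""
    g1, g2 = centroid(cluster1), centroid(cluster2)
    d = math.dist(g1, g2)        # Euclidean stand-in for the adaptive Mahalanobis distance
    weight = 1.0 / (d + c)       # far-away clusters get a small weight
    sum1 = [sum(column) for column in zip(*cluster1)]
    sum2 = [sum(column) for column in zip(*cluster2)]
    denominator = len(cluster1) + weight * len(cluster2)
    return [(s1 + weight * s2) / denominator for s1, s2 in zip(sum1, sum2)]

seed_cluster = [[0.0, 0.0], [2.0, 0.0]]
other_cluster = [[1.0, 3.0]]
print(merged_centroid(seed_cluster, other_cluster))
```

With these toy vectors a plain average of the three points would put the centroid at (1, 1); the weighted version keeps it closer to the original cluster.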
![Page 17: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/17.jpg)
2.1.3 Seed Instances and Constraints for Clustering (1/3)
[Method I]
[Figure: data set of word instances (training data set, data set for clustering), with seed instances marked.]
Select initial seed instances:
(I-1) randomly,
(I-2) “KKZ”,
(I-3) centroid of a cluster generated by K-means
initial instances for K-means:
* randomly,
* “KKZ” [Katsavounidis et al., IEEE Signal Processing Letters, ’94]
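For reference, the “KKZ” initialization can be sketched as follows. The sketch follows the commonly described form of the procedure: take the vector with the largest norm as the first seed, then repeatedly add the vector whose distance to its nearest already-chosen seed is largest. Whether the slides use exactly this variant is an assumption, and the function name and toy data are hypothetical.

```python
import math

def kkz_seeds(points, k):
    """Return the indices of k seed points chosen in KKZ order."""
    origin = [0.0] * len(points[0])
    first = max(range(len(points)), key=lambda i: math.dist(points[i], origin))
    chosen = [first]
    # nearest[i]: distance from point i to its closest chosen seed so far
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen

points = [[0, 0], [1, 0], [10, 0], [10, 1], [0, 5]]
print(kkz_seeds(points, 3))
```

Unlike random selection, the result is deterministic and spreads the seeds far apart, at the cost of being sensitive to outlying instances.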
![Page 18: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/18.jpg)
2.1.3 Seed Instances and Constraints for Clustering (2/3)
[Method II]
[Figure: data set of word instances (training data set, data set for clustering), with seed instances marked; s1, s2, s3: word senses of the target word.]
Select initial seed instances:
(II-1) By considering the frequency of word senses
(II-1-1) randomly,
(II-1-2) “KKZ”,
(II-1-3) centroid of a cluster generated by K-means
initial instances for K-means: * randomly, * “KKZ”
(II-2) In proportion to the frequency of word senses (D’Hondt method)
(II-2-1) randomly,
(II-2-2) “KKZ”,
(II-2-3) centroid of a cluster generated by K-means
initial instances for K-means: * randomly, * “KKZ”
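As a reminder of how the D’Hondt method distributes a fixed budget in proportion to frequencies, here is a minimal sketch. It assumes the "parties" are the word senses, the "votes" are their frequencies in the training data, and the "seats" are the seed instances to allocate; the sense counts and the function name are hypothetical.

```python
def dhondt_allocation(sense_counts, n_seeds):
    """Allocate n_seeds seed instances to senses with the D'Hondt highest-quotient rule."""
    seats = {sense: 0 for sense in sense_counts}
    for _ in range(n_seeds):
        # The next seed goes to the sense with the largest quotient count / (seats + 1).
        winner = max(sense_counts, key=lambda s: sense_counts[s] / (seats[s] + 1))
        seats[winner] += 1
    return seats

print(dhondt_allocation({"s1": 50, "s2": 30, "s3": 20}, 5))
```

Compared with simple rounding of proportions, this always hands out exactly the requested number of seeds.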
![Page 19: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/19.jpg)
2.1.3 Seed Instances and Constraints for Clustering (3/3)
Constraints
– “cannot-link” only,
– “must-link” only,
– Both constraints,
– “cannot-link” and “must-link” without outliers
[“must-link” without outliers] ($Th_{dis}$ = 0.388)
– If $D(G^{new}, G^{C_{s_v}}) < Th_{dis}$ and $D(G^{new}, G^{C_{s_w}}) < Th_{dis}$: put “must-link” constraints between $C_{s_v}$ and $C_{s_w}$.
– If $D(G^{new}, G^{C_{s_v}}) > Th_{dis}$ and $D(G^{new}, G^{C_{s_w}}) > Th_{dis}$: $G^{new}$ (×) is an outlier, so a “must-link” constraint is not added.
![Page 20: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/20.jpg)
2.2 Features for WSD Obtained Using Clustering Results
(a) Inter-cluster information
– TF in a cluster (TF)
– Cluster ID (CID)
– Sense frequency (SF)
(b) Context information regarding adjacent words $w_i w_{i+1}$ ($i = -2, \ldots, 1$)
– Mutual information (MI)
– T-score (T)
– $\chi^2$ (CHI2)
(c) Context information regarding two words to the right and left of the target word, $w_{-2} w_{-1} w_0 w_1 w_2$
– Information gain (IG)
*We employ (b) and (c) to reflect the concept of “one sense per collocation” [Yarowsky, ACL’95].
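To make the adjacent-word statistics in (b) concrete, the sketch below computes mutual information and a t-score for every adjacent word pair in a token list. The slides do not give the formulas, so the textbook collocation definitions are assumed here: pointwise mutual information log2(observed / expected) and t = (observed - expected) / sqrt(observed), with the expected count estimated from unigram counts. The function name and the toy tokens are hypothetical.

```python
import math
from collections import Counter

def collocation_scores(tokens):
    """Map each adjacent pair (w1, w2) to (mutual information, t-score)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected co-occurrence count if w1 and w2 were independent
        # (simplification: n is used for both unigram and bigram totals).
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t)
    return scores

for pair, (mi, t) in collocation_scores("a b a b c".split()).items():
    print(pair, round(mi, 3), round(t, 3))
```

Mutual information favors rare but exclusive pairs, while the t-score favors frequent ones, which is one reason to feed both to the classifier.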
![Page 21: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/21.jpg)
4. Experiments
4.1 Experimental Data
4.2 Semi-supervised Clustering
4.3 Word Sense Disambiguation
![Page 22: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/22.jpg)
4.1 Experimental Data
RWC corpus from the “SENSEVAL-2 Japanese Dictionary Task”
– 3,000 Japanese newspaper articles issued in 1994
– Sense tags from the Japanese dictionary “Iwanami Kokugo Jiten” were manually assigned to 148,558 ambiguous words.
– 100 target words: 50 nouns and 50 verbs
![Page 23: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/23.jpg)
4.2 Semi-supervised Clustering
[Table: comparison of distance-based approaches]
[Observations]
Our semi-supervised clustering approach outperforms other distance-based approaches.
– Our method locally adjusts the centroid of a cluster.
![Page 24: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/24.jpg)
4.3 Word Sense Disambiguation
Experimental Results
[Observations]
• The best accuracy is obtained when we add the features from clustering results (CID, MI, and IG) to the baseline features.
• According to the results of OURS, TITECH, and NAIST, WSD accuracy is significantly improved by adding features computed from clustering results.
![Page 25: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/25.jpg)
II-1 Characterizing Web Pages Using Hyperlinked Neighboring Pages
[Outline]
1. Introduction
2. Our Proposed Method
3. Experiments
![Page 26: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/26.jpg)
1. Introduction
Transitions of Web Search Engines
The first-generation search engines:
• Only terms included in Web pages were utilized as indices of Web pages.
• Peculiar features of the Web such as hyperlink structures were not exploited.
The second-generation search engines:
• The hyperlink structures of Web pages are considered.
e.g., (1) “Optimal Document Granularity” based IR systems
(2) HITS (CLEVER project), Teoma (DiscoWeb)
(3) PageRank (Google)
Users are not satisfied with
• the ease of use,
• retrieval accuracy.
![Page 27: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/27.jpg)
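2. Our Proposed Methods
$p_{tgt}$: target Web page
$w^{p_{tgt}}$: feature vector of $p_{tgt}$
$w'^{p_{tgt}}$: refined feature vector of $p_{tgt}$
[Figure: in the term vector space (axes t1, t2, t3, ..., tm), the feature vector $w^{p_{tgt}}$ is refined into $w'^{p_{tgt}}$ using the feature vectors of Web pages hyperlinked from $p_{tgt}$.]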
![Page 28: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/28.jpg)
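Method Ⅰ
[Figure: in the Web space, the target page $p_{tgt}$ with its neighboring pages, labeled L(in) and L(out); in the vector space (axes t1, t2, t3, ..., tm), $w^{p_{tgt}}$ is refined into $w'^{p_{tgt}}$.]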
![Page 29: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/29.jpg)
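Method Ⅱ
[Figure: in the Web space, the target page $p_{tgt}$ with its neighboring pages, labeled L(in) and L(out), and clusters generated from groups of Web pages in each level from the target page; in the vector space (axes t1, t2, t3, ..., tm), $w^{p_{tgt}}$ is refined into $w'^{p_{tgt}}$ using the centroid vectors of the clusters.]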
![Page 30: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/30.jpg)
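Method Ⅲ
[Figure: in the Web space, the target page $p_{tgt}$ with its neighboring pages, labeled L(in) and L(out), and clusters generated from groups of Web pages in each level from the target page; in the vector space (axes t1, t2, t3, ..., tm), $w^{p_{tgt}}$ is refined into $w'^{p_{tgt}}$ using the centroid vectors of the clusters.]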
![Page 31: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/31.jpg)
Experimental Setup
Data Set
– TREC WT10g test collection (10 GB, 1.69 million Web pages)
Specification of Workstation
– Sun Enterprise 450
CPU: UltraSPARC-II 480 MHz; Memory: 2 GB; OS: Solaris 8
![Page 32: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/32.jpg)
Web Pages Used in the Experiment
[Figure: Web pages around the target page $p_{tgt}$ that are used in the experiment.]
![Page 33: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/33.jpg)
Experimental Results
Comparison of the best search accuracy obtained using each of Methods I, II, and III
[Figure: recall-precision curves (recall 0 to 1, precision 0 to 0.7) for TF-IDF; (MI-a), L(in)=3; (MII-a), L(in)=1, K=2; (MIII-a), L(in)=2, K=3.]

| Method | Average precision | % improvement |
| --- | --- | --- |
| tfidf | 0.313 | - |
| (MI-a), L(in)=3 | 0.342 | +2.9 |
| (MII-a), L(in)=1, K=2 | 0.340 | +2.7 |
| (MIII-a), L(in)=2, K=3 | 0.345 | +3.2 |

| Method | Average precision | % improvement |
| --- | --- | --- |
| HITS | 0.032 | - |
| modified HITS | 0.136 | +10.4 |
| modified HITS with weighted links | 0.138 | +10.6 |
| (MI-a), L(in)=3 | 0.342 | +31.0 |
| (MII-a), L(in)=1, K=2 | 0.340 | +30.8 |
| (MIII-a), L(in)=2, K=3 | 0.345 | +31.3 |

The contents of a target Web page can be represented much better by emphasizing the in-linked pages of the target page.
![Page 34: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/34.jpg)
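Method Ⅲ
[Figure: in the Web space, the target page $p_{tgt}$ with its in-linked pages at levels 1 and 2 of L(in), and clusters generated from groups of Web pages in each level from the target page; in the vector space (axes t1, t2, t3, ..., tm), $w^{p_{tgt}}$ is refined into $w'^{p_{tgt}}$ using the centroid vectors of the clusters.]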
![Page 35: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/35.jpg)
II-2 Adaptive Web Search Based on User’s Information Needs
[Outline]
1. Introduction
2. Our Proposed Method
3. Experiments
![Page 36: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/36.jpg)
1. Introduction
The World Wide Web (WWW)
– It has become increasingly difficult for users to find relevant information.
– Web search engines help users find useful information on the WWW.
Web Search Engines
– Return the same results regardless of who submits the query.
In general, each user has different information needs for his/her query.
Web search results should adapt to users with different information needs.
![Page 37: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/37.jpg)
2. Our Proposed Method
User profile construction based on
(1) pure browsing history,
(2) modified collaborative filtering.
![Page 38: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/38.jpg)
(1) User Profile Construction Based on Pure Browsing History (1/3)
We assume that the preferences of each user consist of the following two aspects:
(1) Persistent (or long-term) preferences $P^{per}$,
(2) Ephemeral (or short-term) preferences $P^{today}$,
and construct the user profile $P$.
![Page 39: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/39.jpg)
(1) User Profile Construction Based on Pure Browsing History (2/3)
[Figure: browsing history by day, from today (0 days ago) back to N days ago, with Web pages and the window size marked. [Persistent preferences] $P^{per}$ is built from the browsing history of 1 day ago, 2 days ago, ..., N days ago. [Ephemeral preferences] $P^{today}$ is built from today's browsing history: the 1st, ..., r-th, ..., $n_{bh}$-th browsing histories of today (giving $P^{(r)}$ and $P^{(br)}$) and the current session (giving $P^{(cur)}$).]
![Page 40: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/40.jpg)
(1) User Profile Construction Based on Pure Browsing History (3/3)
User profile $P$ is finally constructed as follows:

$$
P = a P^{per} + b P^{today}, \qquad P^{today} = x P^{(br)} + y P^{(cur)},
$$

where $a$ and $b$ are constants that satisfy $a + b = 1$, and $x$ and $y$ are constants that satisfy $x + y = 1$ ($x = 0.5$, $y = 0.5$).
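The profile construction on this slide amounts to two nested weighted sums, which the following sketch spells out: today's preferences combine the earlier browsing of today with the current session using weights x and y, and the final profile combines persistent preferences with today's preferences using weights a and b. The slide gives x = y = 0.5; the default a = b = 0.5, the function name, and the three-term toy vectors are assumptions for illustration only.

```python
def user_profile(p_per, p_br, p_cur, a=0.5, b=0.5, x=0.5, y=0.5):
    """Combine persistent and ephemeral preference vectors into one profile."""
    if abs(a + b - 1.0) > 1e-9 or abs(x + y - 1.0) > 1e-9:
        raise ValueError("weights must satisfy a + b = 1 and x + y = 1")
    # Ephemeral preferences: earlier browsing today plus the current session.
    p_today = [x * br + y * cur for br, cur in zip(p_br, p_cur)]
    # Final profile: persistent preferences plus today's preferences.
    return [a * per + b * today for per, today in zip(p_per, p_today)]

print(user_profile([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]))
```

Because both pairs of weights sum to 1, the profile stays on the same scale as its inputs no matter how the weights are tuned.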
![Page 41: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/41.jpg)
Overview of the Pure Collaborative Filtering Algorithm
[Figure: user-item ratings matrix for collaborative filtering. Rows: user 1, user 2, ..., user a (active user), ..., user U. Columns: item 1, item 2, ..., item i, ..., item I. Cells hold ratings from 1 to 5 where available; the cell for the active user a and item i is the item for which the prediction is computed.]
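A prediction over such a user-item ratings matrix can be sketched as below. The slide does not state the prediction formula, so this assumes the widely used neighborhood scheme: the active user's mean rating plus the similarity-weighted deviations of the other users' ratings from their own means, with Pearson correlation over co-rated items as the similarity. The function names and the small ratings matrix are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def predict_rating(ratings, a, i):
    """Predict the rating of active user a for item i; None marks a missing rating."""
    mean_a = mean([r for r in ratings[a] if r is not None])
    numerator = denominator = 0.0
    for u, row in enumerate(ratings):
        if u == a or row[i] is None:
            continue
        common = [j for j in range(len(row))
                  if row[j] is not None and ratings[a][j] is not None]
        if len(common) < 2:
            continue
        ma = mean([ratings[a][j] for j in common])
        mu = mean([row[j] for j in common])
        cov = sum((ratings[a][j] - ma) * (row[j] - mu) for j in common)
        var_a = sum((ratings[a][j] - ma) ** 2 for j in common)
        var_u = sum((row[j] - mu) ** 2 for j in common)
        if var_a == 0 or var_u == 0:
            continue
        w = cov / math.sqrt(var_a * var_u)   # Pearson correlation on co-rated items
        mean_u = mean([r for r in row if r is not None])
        numerator += w * (row[i] - mean_u)
        denominator += abs(w)
    return mean_a if denominator == 0 else mean_a + numerator / denominator

ratings = [
    [5, 3, None],   # active user: item 2 is to be predicted
    [4, 2, 5],
    [1, 5, 2],
]
print(predict_rating(ratings, a=0, i=2))
```

Users who disagree with the active user get a negative weight, so their low ratings also push the prediction up.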
![Page 42: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/42.jpg)
(2) User Profile Construction Based on Modified Collaborative Filtering (1/2)
[Figure: user-term weight matrices for modified collaborative filtering. Rows: user 1, user 2, ..., user a (active user), ..., user U. When each user has browsed k Web pages, the columns are term 1, term 2, ..., term i, ..., term T. When each user has browsed k+1 Web pages, the columns extend to term T+1, term T+2, ..., term T+v. Cells hold term weights (e.g., 0.745, 0.362, 0.835); the marked cell for the active user is the term weight for which the prediction is computed.]
![Page 43: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/43.jpg)
User Profile Construction Based on Modified Collaborative Filtering Algorithm
User profile construction based on
(1) static number of users in the neighborhood,
(2) dynamic number of users in the neighborhood.
![Page 44: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/44.jpg)
Experiments (1/2)
Construction of User Profile
Explicit Method
(1) Relevance feedback
Implicit Method
(2) Method based on pure browsing history
(3) Method based on modified collaborative filtering
Observed browsing history
– 20 subjects
– 30 days
![Page 45: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/45.jpg)
Experiments (2/2)
Query
– 50 query topics employed as test topics in the TREC WT10g test collection
Evaluation
– Compute the similarity between the user profile and the feature vector of each Web page in the search results
– R-precision (R = 30)
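The evaluation step can be sketched as follows: rank the search results by their similarity to the user profile, then measure precision over the top R results. The slide does not name the similarity measure, so cosine similarity is assumed here, and the metric is implemented as precision over the top R ranked pages with R fixed (R = 30 on the slide). Function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_pages(profile, page_vectors):
    """Return page indices ordered by decreasing similarity to the profile."""
    sims = [cosine(profile, w) for w in page_vectors]
    return sorted(range(len(page_vectors)), key=lambda i: -sims[i])

def precision_at_r(ranking, relevant, r=30):
    """Fraction of the top r ranked pages that are relevant."""
    return sum(1 for page in ranking[:r] if page in relevant) / r

profile = [1.0, 0.0]
pages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking = rank_pages(profile, pages)
print(ranking, precision_at_r(ranking, relevant={0, 1}, r=2))
```

Re-ranking by the profile leaves the set of retrieved pages unchanged; only their order, and therefore the precision near the top, is affected.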
![Page 46: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/46.jpg)
User Profile Based on Pure Browsing History (Implicit Method)
Using $P^{per}$, $P^{(br)}$, and $P^{(cur)}$, the user profile $P$ is defined as follows:

$$
P = a P^{per} + b x P^{(br)} + b y P^{(cur)}, \quad \text{where } a + b = 1
$$
![Page 47: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/47.jpg)
Experimental Results
• A user profile that provides search results adapted to a user can be constructed when a window size of about 15 days is used.
• This approach achieves about 3% higher precision than the relevance feedback-based user profile.
The user’s browsing history strongly reflects the user’s preferences.
![Page 48: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/48.jpg)
User Profile Based on Modified Collaborative Filtering (Implicit Method)
Using $P^{per}$, $P^{(br)}$, and $V^{(pre)}$, the user profile $P$ is defined as follows:

$$
P = a P^{per} + b x P^{(br)} + b y V^{(pre)}, \quad \text{where } a + b = 1
$$

User profile construction based on
(1) static number of users in the neighborhood,
(2) dynamic number of users in the neighborhood
![Page 49: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/49.jpg)
Experimental Results (Dynamic Method)
• In all of our experimental results, the best precision is obtained with x = 0.129, y = 0.871.
In this method, the neighborhood of each user is determined by the centroid vectors of clusters of users, and the number of clusters differs from user to user.
This allows each user to perform a more fine-grained search than with the static method.
![Page 50: My past works Kazunari Sugiyama 16 Apr., 2009 WING Group Meeting](https://reader035.vdocuments.site/reader035/viewer/2022062311/5a4d1b107f8b9ab05998f321/html5/thumbnails/50.jpg)
Thank you very much!