2008/06/06 Y.H.ChangTowards Effective Browsing of Large
Scale Social Annotations1
Towards Effective Browsing of Large Scale Social Annotations
WWW 2007
Advisor: Hsin-Hsi Chen
Reporter: Y.H. Chang
2008-06-06
Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei
Shanghai JiaoTong University; IBM China Research Lab
Outline
• Introduction
• ELSABer overview
• Components of ELSABer
• Enhanced models
• Experimental results
• Conclusion
Introduction
• Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos based on social annotations.
• Effectively finding desired resources in large-scale annotation data is a new problem.
• In this paper, we propose a novel algorithm, namely Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data.
Introduction
• ELSABer helps users browse a huge number of annotations in a semantic, hierarchical, and efficient way.
• By incorporating the personal and time information, ELSABer can be further extended for personalized and time-related browsing.
Figure: The prototype system based on ELSABer, showing the sub-tags (subcategories) of “programming” and a set of pages related to the current annotation “programming”.
ELSABer overview
• Input: an empty concept set SC
• Step 1: Output the initial view of annotations
– Generate the top 100 tags from the 2000 most frequently annotated URLs and the 2000 most frequently used tags.
– These tags are the roots for hierarchical browsing.
• Loop: the user selects a tag Ti
• Step 2: Concept matching
– Add tag Ti to set SC
– Calculate the related tag set and URL set
• Step 3 (optional): Sample the URL set and the tag set
• Step 4: Hierarchical browsing
– 4-1 Calculate candidate sub-tags
– 4-2 Rank the sub-tags by Infor-score
• If the termination condition is satisfied, return; else, loop.
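The loop above can be simulated on a toy data set. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `browse` is hypothetical, each concept is simplified to {Ti} (no similarity-based expansion), the optional sampling step is skipped, and candidate sub-tags are ranked by frequency as a stand-in for the Infor-score.

```python
from collections import Counter


def browse(annotations, clicks, top_k=100):
    """Simulate the ELSABer browsing loop (hypothetical helper).

    annotations: dict mapping each URL u to its tag set T(u).
    clicks: the tags the user selects, in order.
    Assumed simplifications: each concept Ci is just {Ti}, Step 3
    (sampling) is skipped, and sub-tags are ranked by frequency
    instead of the Infor-score.
    """
    # Step 1: the initial view is the top_k most frequent tags (the roots).
    freq = Counter(t for tags in annotations.values() for t in tags)
    view = [t for t, _ in freq.most_common(top_k)]
    concepts = []  # the concept set SC starts empty
    for tag in clicks:  # Loop: the user selects a tag Ti from the view
        if tag not in view:
            break  # the user can only click tags that are shown
        # Step 2: concept matching (simplified) and related URLs.
        concepts.append({tag})
        related = [u for u, tags in annotations.items()
                   if all(tags & c for c in concepts)]
        # Step 4-1: candidate sub-tags are related tags not already chosen.
        chosen = set().union(*concepts)
        candidates = Counter(t for u in related for t in annotations[u]
                             if t not in chosen)
        # Step 4-2: rank candidates (frequency stands in for Infor-score).
        view = [t for t, _ in candidates.most_common(top_k)]
        if not view:  # termination: nothing left to refine
            break
    return view
```

Each click narrows the related URL set, so the offered sub-tags become more specific, which mirrors the drill-down behavior the slides describe.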
Components of ELSABer
• Data setup and representation
• Semantic Browsing
– a. Annotation Similarity Estimation
– b. Generating the Semantic Concept
• Hierarchical Browsing
– c. Sub-Tag Generation
– d. Sub-Tag Clustering
• Efficient Browsing
Data setup and representation
• Data: Del.icio.us (May 2006)
• We define an annotation as a quadruple:
– (User, URL, Tag, Time)
• Associated matrix M (m × n):
– m and n are the total numbers of tags and URLs, respectively.
– |URL(ti)| represents the number of URLs annotated with tag ti.
– Cij denotes the number of users who annotate the jth URL with the ith tag.
– The entries of M are computed in a way similar to TF-IDF in IR.
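The slide's weighting formula did not survive extraction, so the sketch below assumes a TF-IDF-like form, M[ti][uj] = Cij × log(n / |URL(ti)|): Cij plays the role of term frequency, and tags that annotate many URLs are down-weighted. The helper name `build_matrix` and the sparse dict layout are hypothetical.

```python
import math
from collections import defaultdict


def build_matrix(quadruples):
    """Build the sparse tag x URL matrix M from (user, url, tag, time) tuples.

    Assumed TF-IDF-like weight (the slide's exact formula is not recoverable):
        M[t][u] = C[t][u] * log(n / |URL(t)|)
    where C[t][u] counts distinct users who annotated URL u with tag t and
    n is the total number of URLs.
    """
    users = defaultdict(set)  # (tag, url) -> users who made that annotation
    for user, url, tag, _time in quadruples:
        users[(tag, url)].add(user)
    urls_of_tag = defaultdict(set)  # tag -> URL(t)
    for tag, url in users:
        urls_of_tag[tag].add(url)
    n = len({url for _, url in users})
    matrix = defaultdict(dict)  # matrix[tag] is the row vector of that tag
    for (tag, url), who in users.items():
        matrix[tag][url] = len(who) * math.log(n / len(urls_of_tag[tag]))
    return dict(matrix)
```

Because M is stored sparsely, each row `matrix[tag]` serves directly as the tag vector Ti, and a URL's column vector can be gathered as `{t: row[u] for t, row in matrix.items() if u in row}`.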
Data setup and representation
• Given the associated matrix M (m × n), whose rows correspond to tags T1, T2, …, Tm and whose columns correspond to URLs U1, U2, …, Un:
– each tag ti is represented as a row vector Ti = (U1, U2, …, Un) of M;
– each URL uj is represented as a column vector Uj = (t1, t2, …, tm) of M.
Semantic Browsing
a. Annotation Similarity Estimation
• Similarity:
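• Special case 1 (stemming): tags that share a stem, e.g., “Programs” and “Programming”, get an additional 0.1 weight.
• Special case 2 (punctuation): tags that differ only in punctuation, e.g., “Web-dev” and “WebDev”, get an additional 0.08 weight.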
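The base similarity formula was lost from the slide; the sketch below assumes cosine similarity between tag row vectors and adds the two special-case bonuses. The toy suffix stripper `crude_stem` is a hypothetical stand-in for a real stemmer (such as Porter's), and the normalization also ignores letter case so that “Web-dev” and “WebDev” compare equal.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(tag):
    """Lowercase and drop punctuation, so 'Web-dev' and 'WebDev' agree."""
    return re.sub(r"[^0-9a-z]", "", tag.lower())


def crude_stem(word):
    """Hypothetical toy stemmer: strip one common suffix, then a doubled consonant."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            # e.g. "programming" -> "programm" -> "program"
            if word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def tag_similarity(tag_i, vec_i, tag_j, vec_j):
    """Base similarity (assumed cosine) plus the two special-case bonuses."""
    sim = cosine(vec_i, vec_j)
    ni, nj = normalize(tag_i), normalize(tag_j)
    if tag_i != tag_j and ni == nj:
        sim += 0.08  # special case 2: differ only in punctuation/case
    elif ni != nj and crude_stem(ni) == crude_stem(nj):
        sim += 0.1  # special case 1: same stem
    return sim
```

The bonuses let morphological or spelling variants merge into one concept even when their URL sets barely overlap.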
Semantic Browsing
b. Generating the Semantic Concept
• Given the selected tag ti, we choose the tag set STi of tags tj most related to ti by the following rules:
– 1. tj is among the N tags most similar to ti.
– 2. The similarity between ti and tj is larger than a threshold θ.
– Parameters: N = 4, θ = 0.7
• The semantic concept is Ci = STi ∪ {ti}.
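The two selection rules can be sketched as a top-N cut followed by a threshold filter, using the slide's N = 4 and θ = 0.7. The function name `semantic_concept` and the `sims` dictionary (precomputed similarities of candidate tags to ti) are hypothetical.

```python
def semantic_concept(ti, sims, n=4, theta=0.7):
    """Return Ci = STi ∪ {ti}.

    sims maps each candidate tag tj to Sim(ti, tj). STi keeps tj only if it
    is among the n most similar tags AND its similarity exceeds theta.
    """
    ranked = sorted((t for t in sims if t != ti), key=sims.get, reverse=True)
    st_i = {t for t in ranked[:n] if sims[t] > theta}
    return st_i | {ti}
```

Both rules must hold: a fifth tag with similarity 0.71 is excluded by N, and a top-ranked tag with similarity 0.6 is excluded by θ.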
Semantic Browsing
b. Generating the Semantic Concept
• The user’s click path t1, t2, …, tL yields a sequence of concepts C1, C2, …, CL.
• Let concept set SC = {C1, C2,…, CL}.
• The related URLs:
– ReURL(SC) = {u | ∀C ∈ SC, T(u) ∩ C ≠ ∅}
– T(u) denotes the set of annotations given to URL u.
• The related tags are all the tags given to ReURL(SC):
– ReTag(SC) = {t | u ∈ ReURL(SC), t ∈ T(u)}
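The two set definitions translate directly into set comprehensions. This is a minimal sketch; the function names and the `tags_of` dictionary (mapping each URL u to T(u)) are hypothetical.

```python
def related_urls(concepts, tags_of):
    """ReURL(SC) = {u | for every C in SC, T(u) ∩ C is non-empty}."""
    return {u for u, tags in tags_of.items() if all(tags & c for c in concepts)}


def related_tags(concepts, tags_of):
    """ReTag(SC) = {t | u in ReURL(SC), t in T(u)}."""
    return {t for u in related_urls(concepts, tags_of) for t in tags_of[u]}
```

A URL must match every concept on the click path, so each added concept can only shrink ReURL(SC). With an empty SC, the condition holds vacuously and every URL is related.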
Hierarchical Browsing
c. Sub-Tag Generation
• If the URLs shared by ti and tj make up most of ti's URLs but only a small part of tj's URLs, we can infer that ti is a sub-tag of tj.
Figure: 40 related tags of “google”.
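The intuition can be written as two intersection rates compared against thresholds. This is only a sketch of the heuristic: the thresholds `high` and `low` are hypothetical, and ELSABer itself learns the decision from labeled features rather than using fixed cut-offs.

```python
def intersection_rates(urls_i, urls_j):
    """Share of each tag's URL set covered by their intersection."""
    shared = len(urls_i & urls_j)
    return shared / len(urls_i), shared / len(urls_j)


def looks_like_subtag(urls_i, urls_j, high=0.5, low=0.2):
    """ti looks like a sub-tag of tj when the shared URLs are most of
    ti's URLs but only a small part of tj's URLs.
    The thresholds high and low are hypothetical."""
    rate_i, rate_j = intersection_rates(urls_i, urls_j)
    return rate_i > high and rate_j < low
```

The test is asymmetric by design: a narrow tag nested inside a broad one passes, while the reverse direction fails.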
Hierarchical Browsing
c. Sub-Tag Generation
• Features:
– Coverage of tags: ICR
– Intersection rate: IR’
– IRR (rank by IR): top 1–30 = 1, top 30–60 = 2, top 60+ = 3
• U(ti) denotes the number of URLs tagged with ti.
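Of these features, only the IRR bucketing is fully specified on the slide, so the sketch covers just that part. It assumes the ranges are read as ranks 1–30, 31–60, and 61 onward, with rank 1 being the highest IR. The function name `irr_buckets` is hypothetical.

```python
def irr_buckets(ir_scores):
    """Map each related tag to its IRR value from its rank by IR.

    ir_scores: dict mapping tag -> intersection rate (IR).
    Assumed boundaries: ranks 1-30 -> 1, 31-60 -> 2, 61 and beyond -> 3.
    """
    ranked = sorted(ir_scores, key=ir_scores.get, reverse=True)
    buckets = {}
    for rank, tag in enumerate(ranked, start=1):  # rank 1 = highest IR
        buckets[tag] = 1 if rank <= 30 else 2 if rank <= 60 else 3
    return buckets
```

Bucketing the rank turns a raw position into a coarse categorical feature, which a decision tree can split on directly.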
Hierarchical Browsing
c. Sub-Tag Generation
• Given the features above, each related tag is represented as a feature vector. A decision tree, learned with C4.5 from a manually labeled data set, predicts the sub-tag relations.
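At each node, C4.5 chooses a split by gain ratio: information gain normalized by the entropy of the split itself. The sketch below shows only that building block for one numeric feature, such as IR. Full tree growth, C4.5's average-gain pre-filter, and pruning are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def best_split(values, labels):
    """Pick the threshold on one numeric feature with the highest gain ratio.

    Samples with value <= threshold go left. Returns (threshold, gain_ratio),
    or (None, 0.0) if no split helps.
    """
    n = len(labels)
    base = entropy(labels)
    best = (None, 0.0)
    for thr in sorted(set(values))[:-1]:  # every split leaves both sides non-empty
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info  # split_info > 0 because both sides are non-empty
        if ratio > best[1]:
            best = (thr, ratio)
    return best
```

On a feature that cleanly separates sub-tags (label 1) from non-sub-tags (label 0), the chosen threshold sits at the gap between the two groups, and the gain ratio equals 1.0.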
Hierarchical Browsing
d. Sub-Tag Clustering
Hierarchical Browsing
d. Sub-Tag Clustering
• Infor(t) = w1·TFIDF(t) + w2·ICS(t) + w3·TE(t)
• Intra-Cluster Similarity (ICS):
– ot denotes the centroid of all the URLs associated with the tag t.
• Tag Entropy (TE):
• In our experiment, the weights w1, w2, and w3 are 0.58, 0.27, and 0.13, respectively.
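The ICS and TE formulas were lost from the slide. The sketch below combines the three terms with the stated weights, but it assumes that ICS is the mean cosine similarity between each URL vector and the centroid ot, and that TE is the Shannon entropy of the tag's count distribution. All function names are hypothetical.

```python
import math

WEIGHTS = (0.58, 0.27, 0.13)  # w1, w2, w3 as reported on the slide


def cosine(a, b):
    """Cosine similarity of two dense vectors (lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def intra_cluster_similarity(url_vectors):
    """Assumed ICS: mean cosine between each URL vector and the centroid ot."""
    dim = len(url_vectors[0])
    centroid = [sum(v[k] for v in url_vectors) / len(url_vectors) for k in range(dim)]
    return sum(cosine(v, centroid) for v in url_vectors) / len(url_vectors)


def tag_entropy(counts):
    """Assumed TE: Shannon entropy (bits) of the tag's count distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def infor(tfidf, url_vectors, counts, weights=WEIGHTS):
    """Infor(t) = w1*TFIDF(t) + w2*ICS(t) + w3*TE(t)."""
    w1, w2, w3 = weights
    return (w1 * tfidf + w2 * intra_cluster_similarity(url_vectors)
            + w3 * tag_entropy(counts))
```

Because the weights are fixed constants, Infor(t) is a plain linear score, and sorting candidate sub-tags by it gives the ranking used in step 4-2.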
Efficient Browsing
• Observation: people tend to use popular tags to annotate URLs, and popular URLs are annotated with the majority of tags.
Efficient Browsing
• Therefore, we can obtain good results efficiently by running the algorithm on a small sub-space of the tagging data.
• In our experiment, we sample the 2000 most frequently annotated URLs and the 2000 most frequently used tags, so the size of M is 2000 × 2000.
• Note: we do not cut off the “long tail”.
• After a sequence of clicks, the user's intention becomes more specific, which decreases the number of related URLs and related tags.
• When this number falls below 2000, all the related tags and URLs are used in the computation.
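The sampling rule can be sketched as follows: a related set that is already smaller than the sample size is kept whole, and otherwise only its most frequent members remain. The function name `sample_space` and the `(url, tag)` pair input format are hypothetical.

```python
from collections import Counter

SAMPLE_SIZE = 2000  # value used in the slide's experiment


def sample_space(pairs, related_urls, related_tags, size=SAMPLE_SIZE):
    """Return the (URLs, tags) sub-space the algorithm runs on.

    pairs: iterable of (url, tag) annotation pairs.
    A related set smaller than `size` is kept whole; otherwise only its
    `size` most frequently annotated URLs / most frequently used tags remain.
    """
    pairs = list(pairs)
    url_freq = Counter(u for u, _ in pairs if u in related_urls)
    tag_freq = Counter(t for _, t in pairs if t in related_tags)
    if len(related_urls) < size:
        urls = set(related_urls)
    else:
        urls = {u for u, _ in url_freq.most_common(size)}
    if len(related_tags) < size:
        tags = set(related_tags)
    else:
        tags = {t for t, _ in tag_freq.most_common(size)}
    return urls, tags
```

Deep in a browsing path, the related sets usually fall below the cap, so the computation then covers every related tag and URL.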
Enhanced Models
• User’s profile:
• The annotations and resources the user is interested in can be found as follows:
• Ri denotes the vector representation of a resource, and Ti denotes the vector representation of annotation Ai.
• Adjust the sampling and ranking algorithms according to the user’s preference:
– Infor(t, U) = α × Infor(t) + β × UI(t | P(U))
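The personalized score mixes the global Infor-score with a user-interest term. The profile and UI formulas were lost from the slide, so this sketch assumes that P(U) is the sum of the vectors of resources the user annotated and that UI is the cosine similarity between the tag vector and that profile. The α and β defaults and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_profile(resource_vectors):
    """Hypothetical profile P(U): sum of vectors of resources the user annotated."""
    profile = {}
    for vec in resource_vectors:
        for k, w in vec.items():
            profile[k] = profile.get(k, 0.0) + w
    return profile


def personalized_infor(infor_t, tag_vector, profile, alpha=0.7, beta=0.3):
    """Infor(t, U) = alpha * Infor(t) + beta * UI(t | P(U)).

    UI is assumed to be cosine(tag_vector, profile); alpha and beta are
    hypothetical defaults.
    """
    return alpha * infor_t + beta * cosine(tag_vector, profile)
```

Setting β = 0 recovers the non-personalized ranking, so β controls how strongly the user's history reorders the sub-tags.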
Enhanced Models
Given the user-specified time interval TI = [ts, te], we define the match between a URL’s time sequence TS and TI as follows:
θ = 0.5
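The match formula itself was lost in extraction. As a labeled assumption, the sketch below takes the match to be the fraction of the URL's annotation times that fall inside TI, and treats a URL as matching when that fraction is at least θ = 0.5. The function names are hypothetical.

```python
from datetime import date

THETA = 0.5  # threshold from the slide


def time_match(ts_sequence, ts, te):
    """Assumed match: fraction of annotation times in TS inside TI = [ts, te]."""
    if not ts_sequence:
        return 0.0
    inside = sum(1 for t in ts_sequence if ts <= t <= te)
    return inside / len(ts_sequence)


def matches_interval(ts_sequence, ts, te, theta=THETA):
    """A URL is kept for time-related browsing if its match reaches theta."""
    return time_match(ts_sequence, ts, te) >= theta
```

For a URL annotated four times, two annotations inside the requested interval give a match of 0.5, which passes; one annotation inside gives 0.25, which does not.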
Experimental results
• The scale of the dataset (Del.icio.us, May 2006):
– 1,736,268 web pages
– 269,566 different annotations
• Machine: Intel Pentium IV 3.0 GHz, 1 GB memory, 2 processors
• Implementation: Java; the Lucene API is also used to build the URL and tag indexes.
Experimental results
Figure legend: red tags are owned by the user; orange tags are recommended.
Experimental results
Conclusion
• Our main contributions:
– An effective algorithm, ELSABer, based on an analysis of the characteristics of social annotations.
– Enhanced models for personalized and time-related browsing.
Future work
• More user studies.
• Focus on finding more qualified URL resources.
• Utilize existing hierarchical structures such as ODP and WordNet to help construct more meaningful hierarchical structures for social annotations.
• Thank you!!