Post on 18-Dec-2015
„IP“ is not always „Internet Protocol“
A long and a very short example for IP problems in Web 2.0 research
Ralf Schenkel
Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum
September 25, 2008 Dagstuhl Perspectives Workshop Web 2.0
Social Tagging Networks
Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain network of friends
• interact with friends
Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• LibraryThing (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)
Part 1: Search in Social Tagging Networks
(long)
Some Statistics
Flickr (as of Nov 2007):
• 2+ billion photos
Facebook (as of Apr 2007):
• 1.8 billion photos
• 31 million active users
• 100,000 new users per day
Myspace (as of Apr 2007):
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB of videos
Huge volume of highly dynamic data
Showcase: librarything.com
[Screenshot of a LibraryThing page, annotated: Books, Tags, Ratings, Others]
librarything.com: Social Interaction
[Screenshot, annotated: Explicit Friends, Similar Users, Comments]
librarything.com: Tag Clouds
librarything.com: Search
Search results independent of the querying user (and the social context)
Outline
• Introduction
• Modelling Social Tagging Networks
– Graph Model
– Different Information Needs
• Effective Query Scoring
• Efficient Query Evaluation
• Summary & Further Challenges
Social Network Model
[Figure: graph model with three layers of nodes, USERS, ITEMS (labelled travel Norway, travel China, queueing theory) and TAGS]
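The graph model can be read as a set of (user, item, tag) triples plus friendship edges between users. The following is a minimal Python sketch under that assumption; the class `TaggingNetwork`, its methods, and the user and item names are hypothetical and not taken from the talk's prototype.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggingNetwork:
    """Hypothetical sketch: users, items and tags linked by taggings and friendships."""

    # A tagging (user, item, tag) means "user assigned tag to item".
    taggings: set = field(default_factory=set)
    # Undirected friendship edges, stored as adjacency sets.
    friends: dict = field(default_factory=lambda: defaultdict(set))

    def tag(self, user: str, item: str, tag: str) -> None:
        self.taggings.add((user, item, tag))

    def befriend(self, u: str, v: str) -> None:
        self.friends[u].add(v)
        self.friends[v].add(u)

    def tags_of_item(self, item: str) -> set:
        return {t for (_, i, t) in self.taggings if i == item}

    def items_of_user(self, user: str) -> set:
        return {i for (u, i, _) in self.taggings if u == user}


# Hypothetical example loosely following the slide's labels.
net = TaggingNetwork()
net.tag("alice", "travel_norway", "travel")
net.tag("bob", "travel_norway", "trip")
net.tag("bob", "queueing_theory", "probability")
net.befriend("alice", "bob")
print(sorted(net.tags_of_item("travel_norway")))  # ['travel', 'trip']
```

Storing taggings as triples keeps all three node types in one structure; the per-item and per-user views are derived from it on demand.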
Social Network Model
[Figure: the same graph with tag assignments on the items: travel, trip, vldb, probability, queues, harry potter]
Information Need 1: Global
[Figure: the user–item–tag graph, with the term "harry potter" highlighted]
Tags by all users equally important
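For the global information need, one simple reading is that every matching tagging counts once, no matter who made it. The sketch assumes taggings are (user, item, tag) triples; `global_scores` is a hypothetical name, and plain counting is only one possible global scoring.

```python
from collections import Counter


def global_scores(taggings, query_tags):
    """Rank items by the number of taggings that match a query tag.

    Every user's tagging has weight 1, i.e. tags by all users are equally important.
    """
    scores = Counter()
    for _user, item, tag in taggings:
        if tag in query_tags:
            scores[item] += 1
    return scores.most_common()


taggings = {
    ("alice", "travel_norway", "travel"),
    ("carol", "travel_norway", "travel"),
    ("bob", "travel_china", "travel"),
    ("bob", "queueing_theory", "probability"),
}
print(global_scores(taggings, {"travel"}))  # [('travel_norway', 2), ('travel_china', 1)]
```

The result is the same for every querying user, which is exactly the limitation noted for LibraryThing's search.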
Information Need 2: Similar Users
[Figure: the user–item–tag graph, with the term "travel" highlighted]
Tags by users with similar tags/items more important
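One way to make "tags by users with similar tags/items more important" concrete is to weight each matching tagging by a similarity between the querying user and the tagger. This sketch assumes Jaccard similarity over the sets of tags each user has used; the talk does not name this measure, and the function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_user_scores(taggings, query_user, query_tags):
    """Rank items, weighting each matching tagging by tagger similarity."""
    # Assumption: a user's profile is the set of tags they have used.
    profile = defaultdict(set)
    for user, _item, tag in taggings:
        profile[user].add(tag)
    scores = defaultdict(float)
    for user, item, tag in taggings:
        if tag in query_tags:
            scores[item] += jaccard(profile[query_user], profile[user])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


taggings = {
    ("alice", "travel_norway", "travel"),
    ("carol", "travel_norway", "travel"),
    ("bob", "travel_china", "travel"),
    ("bob", "vldb_proceedings", "vldb"),
    ("bob", "queueing_theory", "probability"),
    ("dave", "vldb_proceedings", "vldb"),
    ("dave", "queueing_theory", "probability"),
}
ranking = similar_user_scores(taggings, "dave", {"travel"})
# dave shares {vldb, probability} with bob and nothing with alice or carol,
# so travel_china (tagged by bob) outranks travel_norway for dave.
print(ranking[0][0])  # travel_china
```

Under global counting travel_norway would win with two taggings; the similarity weighting reverses the order for this particular querying user.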
Information Need 3: Trusted Friends
[Figure: the user–item–tag graph, with the term "probability" highlighted]
Tags by closely related users more important
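For trusted friends, a tagger's weight can come from the friendship graph instead. This sketch assumes a breadth-first search from the querying user, giving a user at distance d the weight `decay ** (d - 1)` up to `max_hops`; the decay scheme, its defaults and the function names are illustrative assumptions, not the quantified friendship measures from the talk.

```python
from collections import defaultdict, deque


def friend_weights(friends, query_user, decay=0.5, max_hops=2):
    """BFS over friendship edges; a user at distance d gets weight decay ** (d - 1)."""
    dist = {query_user: 0}
    queue = deque([query_user])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond max_hops
        for v in friends.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v: decay ** (d - 1) for v, d in dist.items() if d > 0}


def friend_scores(taggings, friends, query_user, query_tags, decay=0.5, max_hops=2):
    """Rank items using only taggings by users reachable in the friendship graph."""
    weights = friend_weights(friends, query_user, decay, max_hops)
    scores = defaultdict(float)
    for user, item, tag in taggings:
        if tag in query_tags and user in weights:
            scores[item] += weights[user]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


friends = {"dave": {"erin"}, "erin": {"dave", "frank"}, "frank": {"erin"}}
taggings = {
    ("erin", "queueing_theory", "probability"),
    ("frank", "stats_primer", "probability"),
    ("gina", "stats_primer", "probability"),
}
print(friend_scores(taggings, friends, "dave", {"probability"}))
# [('queueing_theory', 1.0), ('stats_primer', 0.5)]
```

gina is not connected to dave, so her tagging is ignored, and the direct friend erin outweighs the friend-of-a-friend frank.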
Wishlist for Social-Aware Social Search
• Search results depend on
– Global popularity of items
– Collection context of the querying user (books, tags)
– Social context of the querying user (trusted friends)
• Automatic tag expansion (beyond synonyms)
• Scalable query processing
• Explanation of results
(similar wishlist for social recommendations)
Fast Forward…
Imagine a 20-minute talk about quantified friendship measures, personalized scoring models, dynamic tag expansion, scalable query processing, …
Essence:
• Context-aware personalized search
• Tags from closely related users are more important
• Different kinds of „relatedness“ possible [SIGIR 2008]
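Since different kinds of relatedness are possible, one hedged way to combine them is a weighted mix of a graph-based (friendship) component, a content-based (similarity) component and a uniform share that every user receives. The weights `w_graph` and `w_content`, the dictionaries of precomputed per-user values and the function names are assumptions for illustration; with both weights at 0 the score reduces to the global case.

```python
from collections import defaultdict


def relatedness(user, friend_w, content_sim, w_graph, w_content):
    """Mix graph- and content-based relatedness with a uniform 'global' share."""
    if w_graph < 0 or w_content < 0 or w_graph + w_content > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    w_global = 1.0 - w_graph - w_content
    return (w_graph * friend_w.get(user, 0.0)
            + w_content * content_sim.get(user, 0.0)
            + w_global)


def mixed_scores(taggings, query_tags, friend_w, content_sim, w_graph, w_content):
    """Sum the mixed relatedness of every tagger whose tag matches the query."""
    scores = defaultdict(float)
    for user, item, tag in taggings:
        if tag in query_tags:
            scores[item] += relatedness(user, friend_w, content_sim, w_graph, w_content)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


taggings = {
    ("alice", "travel_norway", "travel"),
    ("carol", "travel_norway", "travel"),
    ("bob", "travel_china", "travel"),
}
friend_w = {"bob": 1.0}       # hypothetical: bob is a direct friend
content_sim = {"bob": 2 / 3}  # hypothetical precomputed similarity to bob
print(mixed_scores(taggings, {"travel"}, friend_w, content_sim, 0.0, 0.0))
# [('travel_norway', 2.0), ('travel_china', 1.0)]
print(mixed_scores(taggings, {"travel"}, friend_w, content_sim, 0.5, 0.5)[0][0])
# travel_china
```

Shifting weight from the uniform share to the social components moves travel_china, tagged by a friend with similar interests, to the top.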
Experimental Evaluation: Effectiveness
Systematic evaluation of result quality is difficult
Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (e.g. DMOZ categories)
• Automated assessments from the context of the user
– Items tagged by friends
– Items tagged in the future
?
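The "items tagged in the future" setup can be sketched as a time split: a user's taggings before a cutoff form the context, and items the user tags afterwards become relevance judgments for the tags used on them. The event format `(user, item, tag, timestamp)`, the function name and the rule of dropping already-known items are assumptions of this sketch.

```python
from collections import defaultdict


def future_assessments(events, user, cutoff):
    """Split one user's tagging events at `cutoff` into context and judgments.

    Returns (history, qrels): history holds (item, tag) pairs before the cutoff;
    qrels maps each later tag (used as a query) to the newly tagged items.
    """
    history, later = set(), defaultdict(set)
    for u, item, tag, ts in events:
        if u != user:
            continue
        if ts < cutoff:
            history.add((item, tag))
        else:
            later[tag].add(item)
    known = {item for item, _tag in history}
    # Items the user already had before the cutoff are not new finds, so drop them.
    qrels = {tag: items - known for tag, items in later.items() if items - known}
    return history, qrels


events = [
    ("dave", "queueing_theory", "probability", 1),
    ("dave", "stats_primer", "probability", 5),
    ("dave", "queueing_theory", "queues", 6),
    ("erin", "travel_china", "travel", 7),
]
history, qrels = future_assessments(events, "dave", cutoff=4)
print(qrels)  # {'probability': {'stats_primer'}}
```

The "items tagged by friends" setup could be built the same way by collecting judgments from the friends' taggings instead of the user's later ones.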
Prototype Implementation
[SIGIR Demo 2008], [VLDB Demo 2008]
Preliminary User Study
LibraryThing user study [Data Engineering Bulletin, June 2008]:
• 6 LibraryThing users with reasonably large libraries and friend sets
• 49 queries overall
• Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]
NDCG[10] for different weightings (axis labels on the slide: "(1-α) (graph)" and "(1-α) (content)")
        0.0    0.2    0.5    0.8    1.0
0.0   0.546  0.572  0.568  0.565  0.565
0.2   0.564  0.572  0.579  0.581    -
0.5   0.539  0.552  0.559    -      -
0.8   0.515  0.546    -      -      -
1.0   0.465    -      -      -      -
Authors of the paper
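NDCG[10] compares the discounted gain of the top 10 results with that of an ideal ordering of the judged items. The sketch uses linear gains and the common log2(rank + 1) discount; the slides do not say which NDCG variant the study used (another common one uses gains 2^rel - 1), so treat this as an illustration with hypothetical function names.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_gains, judged_gains, k=10):
    """NDCG@k: DCG of the returned top k over DCG of the ideal top k.

    ranked_gains: relevance gains of the returned results, in rank order.
    judged_gains: gains of all judged items, used to build the ideal ranking.
    """
    ideal = dcg(sorted(judged_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


# Two relevant items (gain 1); the system ranks them at positions 1 and 3.
print(round(ndcg_at_k([1, 0, 1], [1, 1, 0]), 3))  # 0.92
```

Here DCG = 1 + 1/log2(4) = 1.5 and the ideal DCG = 1 + 1/log2(3) ≈ 1.631, giving about 0.92; a value of 1.0 means the top 10 is ordered ideally.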
We need a benchmark collection, but…
• Everybody „has“ data from Flickr, LibraryThing
• Data contains private information by definition
• Data cannot be successfully anonymized (AOL)
• Data must not be anonymized (we need the users to assess results)
• Data must be large scale (a few volunteers are not enough)
• Collection must be available completely offline for stability of results (including images, …)
Part 2: Web Archiving
(very short)
Online Information is Volatile
• Huge amount of information available online only today
• Easily lost (hardware failure, software failure, human failure, deletion, attack, …)
• Easily inaccessible (does anybody know Interleaf?)
• Easily manipulated
• How will historians learn about the 21st century?
Strong need for long-term preservation of the evolving Web