RawSugar Web 2.0, Tagging, Search engines, RawSugar Frank Smadja RawSugar May 2006

Post on 21-Dec-2015


RawSugar

Web 2.0, Tagging, Search engines, RawSugar

Frank Smadja, RawSugar

May 2006

RawSugar

What is Web 2.0?

Tim O’Reilly: Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.

http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

RawSugar

What is Web 2.0?

Social Web – “Wisdom of Crowds”
– Users are publishers
– Network effect – SHARE
– E.g.: blogger.com, flickr, youtube, del.icio.us, tadalist.com, i4giveu.com

Technology:
– Software delivery: hours; users are testers
– AJAX (more later)
– E.g.: 30Boxes, Writely, Google Calendar

Business model:
– Free for users, Paid Advertisements
– Share revenues with users
– E.g.: Google adsense, simpy, RawSugar
– Pageviews => $$$$

RawSugar

Social Web – Wisdom of Crowds

(1) diversity of opinion

(2) independence of members from one another

(3) decentralization and

(4) a good method for aggregating opinions

Show: Digg, amazon.com, Yahoo! Movies

RawSugar

What is Tagging?

From Gary Larson

RawSugar

Tagging Example

RawSugar

Before Tagging: Classification

• Too hard to classify
• Too expensive
• Not scalable

• Yahoo! directory
• Dmoz
• Semantic Web

RawSugar

Categorization is hard!!

Multiple concepts activated

Choose ONE of the activated concepts.

Categorize it!

Object worth remembering (article, image…)

Analysis-Paralysis!

From Rashmi Sinha

RawSugar

Tagging is simpler

Multiple concepts are activated

Tag it!

Note all concepts

Object worth remembering (article, image…)

From Rashmi Sinha

RawSugar

The Personal to the Social

From Rashmi Sinha

RawSugar

Tagging is a reality

• Bookmarkers tag:
– Delicious, RawSugar, Shadows, Simpy, Blinklist, …

• Bloggers tag:
– 27 million blogs, doubling every 6 months
– One third of blog posts now use tags (or categories)

• Many more:
– BBC – news site
– Digg – news
– YouTube – video
– Flickr – photo publishing and tagging
– Enterprise? Museums? Cell phones?

Most user-generated content is tagged!

RawSugar

What Tagging is NOT

– NOT: Generous and altruistic people classifying the Web for the sake of the community

– NOT: Smart software automatically classifying Web pages and tagging them

– NOT: A collaborative way to classify the web into a growing giant ontology (folksonomy)

RawSugar

So why do People Tag?

– Recovery/sharing of personal information:
• Bookmarks
• Photos
• Videos, etc.

– Increased traffic and findability
• Bloggers

– Social reward
– Advertisement $

Tagging brings value to the tagger

RawSugar

Why is Tagging successful?

                        Semantic Web               Tagging
Who classifies          Publishers or Librarians   Everybody, consumers
Controlled vocabulary   Yes                        No
Imposed structure       Yes                        No
Classification cost     High                       Free
Recovery                NA                         Yes
Searchability           Low                        Medium
Navigation              High                       Medium

• Tagging is free
• Tagging is easy
• Tagging brings value

[Marlow, Naaman, Boyd & Davis 2006]

RawSugar

RawSugar

• Covers the last mile of search
• Provides Guided Search on tagged pages
• Publish guided search
– Provide guided search to your site or blog
– Get more traffic
– Receive advertising revenues!

Search and Explore
– Navigate by topics, people, directories
– Find Experts

RawSugar

Nothing to eat here!

RawSugar

Still no food here !

RawSugar

Bingo !

RawSugar

What’s Great, What’s Not Great?

• Great:
– You know what you’re looking for: “Zibibbo restaurant”

• Not so great:
– You’re hungry!
– You want to browse: discover information, explore.
– You want to know what is popular (“restaurants, digital camera, Java Tutorial, Free Games, etc.”)

RawSugar

State of the art: The Last Mile of Search

• 83% unhappy with search results (WSJ survey)
– Most searches point to a list of content websites and directories
– Navigation of these sites is cumbersome and tedious

• Google's two-step approach:
– Search “restaurants”
– While (true) { explore guide; }
– Change the query and repeat

“The last mile of search” examples:
• Digital Camera
• Palo Alto bike
• Daily Kos
• Sprol dot Com

RawSugar

Where is the last mile?

Google stops here:

Human Knowledge:
• Small and mid-size websites and blogs
• Content is organized manually by humans:
– Categorization
– Recommendations
• Poor search and navigation
• Each directory is an island of information and does not connect to related directories

RawSugar

What’s Missing? Browsing with Facets

“Easy to discover information without prior knowledge of collection contents”

Faceted Search Paradigm

Not new:
• Library systems: “American history”, “Shakespeare”, etc.
• Search Engines: Endeca, Shopping.com, Yahoo! Directories, Dmoz, etc.
• Google/MSN/Yahoo! Local Search – Browse by Location
• Current uses: E-Commerce

Problems:
• Maintained by humans – Expensive
• Rely on a world order – Brittle
• Facets use a controlled vocabulary – Not easy to define

=> Not Scalable

RawSugar

Amazon – Faceted Search: Search for Tel Aviv

RawSugar

Shopping.com Faceted Search: Search for Tel Aviv

RawSugar

RawSugar Faceted Search

Refine your search

RawSugar

RawSugar Faceted Search

Juniorbonner on del.icio.us vs. Juniorbonner on RawSugar

RawSugar

RawSugar Into the Last Mile

RawSugar inside

RawSugar

RawSugar Faceted Search in the last mile

Daily Kos Blog

Search for Iran on RawSugar

RawSugar

RawSugar Technology

RawSugar

Problem 1: Searching the TagSpace

Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo, Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.

How would you tag this?

How would you search for it?

RawSugar

Problem 2: Exploring the TagSpace

morphology

Locations

Restaurant Type

Not a restaurant!

RawSugar

Problem 3: Exploring the TagSpace

Not usable !

RawSugar

RawSugar – Tag Hierarchy: Guided Navigation

Food groups

Locations groups

Origins groups

RawSugar

RawSugar Tag Hierarchy

• Key idea: Some users (4%) define tag hierarchies – (food>sushi, european>spanish, …)

• We mine this tag space to learn simple tag-relations (ISA relations and RELATED) using statistics.

• At search time: We apply this learned knowledge to group tags from results.
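
The two steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not RawSugar's actual method: the slides say the relations are learned "using statistics" without giving details, so the sketch swaps in a simple vote count. The names learn_isa, group_tags, min_users, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def learn_isa(user_hierarchies, min_users=2):
    """Keep (parent, child) pairs declared by at least min_users distinct users.

    Assumption: a plain vote count stands in for the unspecified statistics.
    """
    support = Counter()
    for pairs in user_hierarchies.values():
        for pair in set(pairs):  # count each user at most once per pair
            support[pair] += 1
    return {pair for pair, votes in support.items() if votes >= min_users}

def group_tags(result_tags, isa):
    """At search time, group the tags found in results under a learned parent."""
    # If a child has several learned parents, this sketch keeps an arbitrary one.
    parent_of = {child: parent for parent, child in isa}
    groups = defaultdict(list)
    for tag in result_tags:
        groups[parent_of.get(tag, "other")].append(tag)
    return dict(groups)

# Hypothetical hierarchies written by the minority of users who define them.
hierarchies = {
    "user1": [("food", "sushi"), ("european", "spanish")],
    "user2": [("food", "sushi"), ("food", "pizza")],
    "user3": [("food", "pizza"), ("european", "spanish")],
    "user4": [("sushi", "food")],  # a lone contradictory vote, filtered out
}
isa = learn_isa(hierarchies)
groups = group_tags(["sushi", "pizza", "spanish", "lunch"], isa)
print(groups)
```

Requiring agreement between several users is one way to keep a single user's idiosyncratic hierarchy from shaping everyone's navigation; the threshold is a tuning choice.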

RawSugar

RawSugar – Guided Search: Combining Hierarchy Fragments

[Figure: tag-hierarchy fragments contributed by User 1 to User 5. Node labels: europe, UK, Scotland, Edinburgh, Spain, Italy; food, vegetarian, Sushi; food, cooking, recipes; Asian, Chinese, Thai; Southwest, California, Bay Area, San Francisco, Texas.]
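
Combining fragments like those on this slide can be pictured as merging root-to-leaf paths into one nested tree. The sketch below is a simplification under assumptions: which path belongs to which user is not recoverable from the slide, so the sample paths are illustrative, and merge_fragments is a hypothetical helper that only merges fragments sharing the same root.

```python
def merge_fragments(fragments):
    """Merge tag paths (root first) into a single nested dict of tags."""
    tree = {}
    for path in fragments:
        node = tree
        for tag in path:
            # Lowercase so that "Sushi" and "sushi" land on the same node.
            node = node.setdefault(tag.lower(), {})
    return tree

# Illustrative paths built from the node labels on the slide.
fragments = [
    ["europe", "UK", "Scotland", "Edinburgh"],
    ["europe", "Spain"],
    ["food", "Sushi"],
    ["food", "cooking", "recipes"],
]
tree = merge_fragments(fragments)
print(tree)
```

A fuller version would also need to attach a fragment whose root appears in the middle of another user's fragment, and to resolve conflicting parents.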

RawSugar

RawSugar: Mining and Clustering

• Related tags: Tags that are related – (collocations, synonymy, antonymy, ISA, HASA, …)

• Related pages: Pages tagged similarly

• Related people: People with similar interests
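
One common way to find related tags is to compare the sets of pages each tag is attached to. The sketch below uses Jaccard similarity as an assumed measure; the slides do not say which statistic RawSugar uses, and pages_by_tag, related_tags, the threshold, and the bookmark data are hypothetical.

```python
from collections import defaultdict

def pages_by_tag(bookmarks):
    """Index (user, page, tags) bookmarks as tag -> set of pages."""
    index = defaultdict(set)
    for _user, page, tags in bookmarks:
        for tag in tags:
            index[tag].add(page)
    return index

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_tags(tag, index, threshold=0.5):
    """Tags whose page sets overlap enough with the given tag's page set."""
    base = index.get(tag, set())
    scored = [(other, jaccard(base, pages))
              for other, pages in index.items() if other != tag]
    keep = [(other, score) for other, score in scored if score >= threshold]
    return sorted(keep, key=lambda item: (-item[1], item[0]))

bookmarks = [
    ("ann", "p1", ["cooking", "recipes"]),
    ("bob", "p2", ["cooking", "recipes", "sushi"]),
    ("cat", "p3", ["cooking"]),
    ("dan", "p4", ["sailing"]),
]
index = pages_by_tag(bookmarks)
related = related_tags("cooking", index)
print(related)
```

The same comparison carries over to the other two bullets: index pages by their tag sets to find related pages, or users by their tag sets to find related people.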

[Figure: the RawSugar TagSpace, linking Tags, Pages, and People; example labels: sailing, Cycling group.]

RawSugar

Related work

Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html

Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91

Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html

Brooks, Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf

Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp

Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf

And more …

RawSugar

Conclusion

Questions?

RawSugar

Backup Technology Slides

RawSugar

What should we do? Smart Backend – Easy Tagging

“Tag Relations improve searchability and exploration.”

Similar tags:
• Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
• Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
• Related: cooking <-> recipes; software development <-> programming

Tag groups or subtags:
• Location -> san francisco, london, new york, etc.
• Food -> sushi, sashimi, pizza, etc.
• Programming -> html, java, css, etc.

Goal : Discover them by Mining the tag space
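
The spelling and morphology case is the easiest of these relations to illustrate. The sketch below is a deliberately crude normalizer, assuming that dropping separators and a few English suffixes is enough to collide the variants listed on the slide; normalize and group_variants are hypothetical names, and a real system would likely use a proper stemmer.

```python
import re
from collections import defaultdict

def normalize(tag):
    """Map spelling and morphology variants of a tag to one crude key."""
    key = re.sub(r"[\s_\-]+", "", tag.lower())  # "mac os", "mac_os" -> "macos"
    for suffix in ("ing", "ed"):
        if key.endswith(suffix) and len(key) - len(suffix) >= 3:
            key = key[: -len(suffix)]
            # Undo consonant doubling: "tagg" -> "tag".
            if key[-1] == key[-2] and key[-1] not in "aeiou":
                key = key[:-1]
            return key
    # Naive plural stripping; it also trims "macos" to "maco", which is
    # harmless here because every variant of that tag gets the same key.
    if key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        key = key[:-1]
    return key

def group_variants(tags):
    """Group raw tags that share a normalized key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize(tag)].append(tag)
    return dict(groups)

variants = group_variants(
    ["macos", "mac_os", "mac os", "tagging", "tags", "tagged", "css"]
)
print(variants)
```

String normalization cannot discover synonyms such as films <-> movies or new york <-> nyc; those need usage statistics mined from the tag space, which is the goal stated on the slide.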

RawSugar

What should we do? Smart Backend – Friendly Frontend

• Backend should not dictate Frontend (Patrick Schmitz, Berkeley/Yahoo!)

• Smart processing is done by the backend under the hood.

• Tagging should be as effortless as possible, assisted but not automatic. Fight Analysis-Paralysis (Rashmi Sinha)

• Systems should be built to incite people to tag. Bring value to the tagger.

RawSugar


Flickr – Clusters

RawSugar

Clustering – Step 1: Similarity among tags

RawSugar

Some good Clusters found

RawSugar

Tags that belong to the same clusters -

RawSugar

Dmoz – World Order

RawSugar

Recommendations: dpreview

RawSugar

Faceted Search on TagSpace: Challenges

• Faceted search paradigm on the TagSpace:
– Not a controlled environment
– Large scale (1 facet for every 5 documents)
– Lots of noise: search, search engine, google, search_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, news, information, portal, engines, searching, tech, buscadores, tool …

RawSugar

Faceted Search on TagSpace: Challenges

How to rank facets? What facets should be displayed? How to show them?

• Performance: Reduce the search space
• Refining facets: Tags that allow the user to refine (reduce) the search (depth)
• Related facets: Tags that allow the user to explore (breadth)
• Group facets: Cluster tags that are related
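
As a rough illustration of refining facets, the sketch below counts the tags on the current result set and keeps those that would shrink it when selected. The ranking by raw count is an assumption, since the slides pose facet ranking as an open question; refining_facets, refine, and the sample results are hypothetical.

```python
from collections import Counter

def refining_facets(results, query_tags, top_n=5):
    """Tags on some but not all results: picking one narrows the search."""
    total = len(results)
    counts = Counter(tag for tags in results.values() for tag in set(tags))
    facets = [(tag, n) for tag, n in counts.items()
              if tag not in query_tags and 0 < n < total]
    # Most frequent first, alphabetical on ties, truncated for display.
    return sorted(facets, key=lambda facet: (-facet[1], facet[0]))[:top_n]

def refine(results, tag):
    """Apply a facet: keep only the results that carry the chosen tag."""
    return {page: tags for page, tags in results.items() if tag in tags}

# Hypothetical result set for the query "restaurants".
results = {
    "p1": {"restaurants", "food", "sushi", "palo alto"},
    "p2": {"restaurants", "food", "sushi", "san francisco"},
    "p3": {"restaurants", "food", "pizza", "palo alto"},
    "p4": {"restaurants", "food"},
}
facets = refining_facets(results, {"restaurants"})
narrowed = refine(results, "sushi")
print(facets, sorted(narrowed))
```

A tag such as "food" that sits on every result is dropped because selecting it would not reduce anything. Related (breadth) facets and grouped facets would need the tag relations discussed elsewhere in the deck.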

RawSugar

Before RawSugar

RawSugar

With RawSugar

navigation

Other users

RawSugar

Searching the TagSpace with RawSugar: Suggestion Engine

Goals:
- Ease of tagging
- Cohesiveness of our tagspace: encourage users to re-use the same tags instead of creating infinite variations (search engines, searchengine, search, search tools, search sites, etc.)

Key Ideas:
- Always suggest the most popular tags first
- Use tag hierarchy and tag context to find the most relevant tags
- Use information on the user and the other users to refine the suggestions
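
A minimal sketch of the first and third key ideas, assuming a prefix-completion interface: candidates are ordered by overall popularity, and the user's own past usage only breaks ties. The function suggest, its parameters, and the counts are hypothetical; the tag hierarchy and tag context signals from the second idea are left out.

```python
from collections import Counter

def suggest(prefix, global_counts, user_counts, limit=3):
    """Suggest existing tags for a typed prefix, most popular first."""
    prefix = prefix.lower()
    candidates = [tag for tag in global_counts if tag.startswith(prefix)]
    # Popularity dominates; the user's own usage breaks ties, then the
    # alphabet keeps the order deterministic.
    candidates.sort(
        key=lambda tag: (-global_counts[tag], -user_counts.get(tag, 0), tag)
    )
    return candidates[:limit]

# Hypothetical usage counts across all users and for one user.
global_counts = Counter({
    "search": 120,
    "search engines": 80,
    "searchengine": 80,
    "search tools": 5,
    "sailing": 40,
})
user_counts = Counter({"searchengine": 3})
suggestions = suggest("sea", global_counts, user_counts)
print(suggestions)
```

Steering users toward tags that already exist is what keeps the tagspace cohesive: the rare variant "search tools" never makes the short list here.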

RawSugar

What’s Missing? Human Meta Knowledge

Is it good or not? What is it about? Is it popular?

Not new:
• Guides: paloaltoonline.com, expedia.com, etc.
• Review Sites – Zagat.com, dpreview.com, etc.
• Shopping sites – shopping.com, Amazon

Problems:
• Limited to small environments or verticals (digital camera, restaurants, etc.)
• Not real search across sites
• Manpower – hiring, training, etc.

=> Not Scalable