Semantic Search on the Rise

Peter Mika | Yahoo Labs

Tran Duc Thanh | LyfeLine Corporation

Upload: peter-mika

Post on 20-Jan-2015

DESCRIPTION

Presentation with Tran Duc Thanh at SemTechBiz 2014

TRANSCRIPT

Page 1: Semantic Search on the Rise

Semantic Search on the Rise

Peter Mika | Yahoo Labs

Tran Duc Thanh | LyfeLine Corporation

Page 2: Semantic Search on the Rise

About the speakers

Peter Mika

› Senior Research Scientist

› Head of Semantic Search group at Yahoo! Labs

› Expertise: Semantic Web, Information Retrieval, Natural Language Processing

Tran Duc Thanh

› CTO of LyfeLine Corporation, Tech Startup, Santa Clara

› Assistant Professor, San Jose State University (on leave)

› Served as Assistant Professor for Stanford University and Karlsruhe Institute of Technology

› Expertise: Semantic Search, Semantic / Linked Data Management

Page 3: Semantic Search on the Rise

Agenda

What is Semantic Search?

Semantic Search technology

Applications

Beyond Web Search

Q&A

Page 4: Semantic Search on the Rise

What is Semantic Search?

Page 5: Semantic Search on the Rise

Why Semantic Search? Part I.

Improvements in IR are harder and harder to come by

› Basic relevance models are well established

› Machine learning using hundreds of features

› Heavy investment in computational power, e.g. real-time indexing and instant search

Remaining challenges are not computational, but in modeling user cognition

› Modeling the relationships between:

• the query

• the content

• the world at large

Page 6: Semantic Search on the Rise

Semantic gap

› Ambiguity

• jaguar

• paris hilton

› Secondary meaning

• george bush (and I mean the beer brewer in Arizona)

› Subjectivity

• reliable digital camera

• paris hilton sexy

› Imprecise or overly precise searches

• jim hendler

Complex needs

› Missing information

• brad pitt zombie

• florida man with 115 guns

• 35 year old computer scientist living in barcelona

› Category queries

• countries in africa

• barcelona nightlife

› Relational, transactional or computational queries

• Friends of peter who knows VCs in the Bay Area

• 120 dollars in euros

• digital camera under 300 dollars

• world temperature in 2020

Poorly solved information needs remain

Are there even true keyword queries?

Users may have stopped asking them

Page 7: Semantic Search on the Rise

Real problem

Page 8: Semantic Search on the Rise

What is it like to be a machine?

Roi Blanco

Page 9: Semantic Search on the Rise

What is it like to be a machine?

↵⏏☐ģ

✜Θ♬♬ţğ √∞ §®ÇĤĪ✜★♬☐✓✓ţğ★✜

✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫Γ≠=⅚ ©§ ★✓♪ΒΓΕ℠

✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ⏎⌥°¶§ ΥΦΦΦ ✗✕☐

Page 10: Semantic Search on the Rise

Why Semantic Search? Part II.

The Semantic Web is now a reality

› Emerging agreements around schemas

• Facebook’s Open Graph Protocol (OGP)

• Schema.org

› Large amounts of data published in RDF

• As Linked Data

• Inside HTML pages

• Inside email text messages

› Private Knowledge Graphs inside corporations

Semantic data exploited by search engines

› Better document presentation and ranking

› Advanced search functionality

Page 11: Semantic Search on the Rise

Metadata in HTML: schema.org

Agreement on a shared set of schemas for common types of web content

› Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later

› Similar in intent to sitemaps.org

• Use a single format to communicate the same information to all three search engines

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  Director:
  <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
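To make the markup concrete, here is a minimal sketch of how a consumer might read it, using only the Python standard library. It is not how any search engine processes schema.org data: the class `RdfaLiteParser` and the function `extract_properties` are hypothetical names, only `typeof` and literal `property` values are handled, and void elements such as an unclosed `<br>` are assumed not to occur.

```python
from html.parser import HTMLParser

# Shortened copy of the slide's markup, used as sample input.
MARKUP = """<div vocab="http://schema.org/" typeof="Movie">
<h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Director: <div property="director" typeof="Person">
<span property="name">Rob Marshall</span>
</div>
</div>"""


class RdfaLiteParser(HTMLParser):
    """Collect (item type, property, literal text) triples from RDFa Lite."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._types = []       # stack of (depth, typeof) for open items
        self._literal = None   # (depth, property, chunks) while inside a literal
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        attrs = dict(attrs)
        if "typeof" in attrs:
            # A typed element opens a new item (e.g. the Person for the director).
            self._types.append((self._depth, attrs["typeof"]))
        elif "property" in attrs and self._types:
            self._literal = (self._depth, attrs["property"], [])

    def handle_data(self, data):
        if self._literal:
            self._literal[2].append(data)

    def handle_endtag(self, tag):
        if self._literal and self._literal[0] == self._depth:
            _, prop, chunks = self._literal
            text = " ".join("".join(chunks).split())  # normalize whitespace
            self.triples.append((self._types[-1][1], prop, text))
            self._literal = None
        elif self._types and self._types[-1][0] == self._depth:
            self._types.pop()
        self._depth -= 1


def extract_properties(markup):
    parser = RdfaLiteParser()
    parser.feed(markup)
    parser.close()
    return parser.triples


for triple in extract_properties(MARKUP):
    print(triple)
```

The sketch deliberately drops the link between the movie and its director; a fuller RDFa processor would also emit that relationship.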

Page 12: Semantic Search on the Rise

Substantial adoption of schema.org markup

Over 15% of all pages now have schema.org markup

Over 5 million sites, over 25 billion entity references

In other words: same order of magnitude as the web

› Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote

See also

› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012

• Based on Bing US corpus

• 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP)

› WebDataCommons

• Based on CommonCrawl Nov 2013

• 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)

Page 13: Semantic Search on the Rise

Semantic Search technology

Page 14: Semantic Search on the Rise

Def. Semantic Search is any retrieval method where

› User intent and resources are represented in a semantic model

• A set of concepts or topics that generalize over tokens/phrases

• Additional structure such as a hierarchy among concepts, relationships among concepts etc.

› Semantic representations of the query and the user intent are exploited in some part of the retrieval process

As a research field

› Workshops

• ESAIR (2008-2014) at CIKM, Semantic Search (SemSearch) workshop series (2008-2011) at ESWC/WWW, EOS workshop (2010-2011) at SIGIR, JIWES workshop (2012) at SIGIR, Semantic Search Workshop (2011-2014) at VLDB

› Special Issues of journals

› Surveys

• Christos L. Koumenides, Nigel R. Shadbolt: Ranking methods for entity-oriented semantic web search. JASIST 65(6): 1091-1106 (2014)

Semantic Search

Page 15: Semantic Search on the Rise

Semantic models: implicit vs. explicit

Implicit/internal semantics

› Models of text extracted from a corpus of queries, documents or interaction logs

• Query reformulation, term dependency models, translation models, topic models, latent space models, learning to match (PLS)

› See

• Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information Retrieval Vol 7 Issue 5, 2013, pp 343-469

Explicit/external semantics

› Explicit linguistic or ontological structures extracted from text and linked to external knowledge

› Obtained using IE techniques or acquired from Semantic Web markup

Page 16: Semantic Search on the Rise

Entity Linking vs. Entity Retrieval

Entity Linking

› Recognizing entities that are explicitly mentioned in queries and linking them to a KB

Entity Retrieval

› Ranking entities in a KB, given a query

› Result may not be explicitly mentioned in the query
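The difference between the two tasks can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-entry `KB`, the exact-name matching used for linking, and the term-overlap score used for retrieval. Real systems use far richer models.

```python
# Hypothetical miniature knowledge base: id -> name and short description.
KB = {
    "brad_pitt": {"name": "Brad Pitt",
                  "description": "american actor and film producer"},
    "world_war_z": {"name": "World War Z",
                    "description": "zombie film starring brad pitt"},
    "fight_club": {"name": "Fight Club",
                   "description": "film starring brad pitt and edward norton"},
}


def link_entities(query):
    """Entity linking: ids of entities whose name is explicitly mentioned."""
    padded = f" {query.lower()} "
    return [eid for eid, e in KB.items() if f" {e['name'].lower()} " in padded]


def retrieve_entities(query):
    """Entity retrieval: rank all KB entities by term overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for eid, e in KB.items():
        words = set(f"{e['name']} {e['description']}".lower().split())
        score = len(terms & words)
        if score:
            scored.append((-score, eid))  # negative score sorts best first
    return [eid for _, eid in sorted(scored)]


print(link_entities("brad pitt zombie"))      # only the mentioned entity
print(retrieve_entities("brad pitt zombie"))  # ranked list, top hit is not mentioned by name
```

For the query `brad pitt zombie`, linking finds only the entity named in the query, while retrieval ranks the film first even though its name never appears in the query.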

Page 17: Semantic Search on the Rise

What is it like to be a machine?

↵⏏☐ģ

✜Θ♬♬ţğ √∞ §®ÇĤĪ✜★♬☐✓✓ţğ★✜

✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫Γ≠=⅚ ©§ ★✓♪ΒΓΕ℠

✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ⏎⌥°¶§ ΥΦΦΦ ✗✕☐

Page 18: Semantic Search on the Rise

Entity Linking

<roi>↵⏏☐ģ</roi>

✜Θ♬♬ţğ √∞ §®ÇĤĪ✜★♬☐✓✓ţğ★✜

✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫Γ≠=⅚ ©§ ★✓♪ΒΓΕ℠

✖Γ♫⅜±<roi>⏎↵⏏☐ģ</roi>ğğğμλκσςτ⏎⌥°¶§ ΥΦΦΦ ✗✕☐

<roi>

Page 19: Semantic Search on the Rise

Entity Retrieval

↵⏏☐ģ

<roi>

<kia>

<rio>

Page 20: Semantic Search on the Rise

The role of entities in queries

Entities play an important role

› ~70% of queries contain a named entity (entity mention queries) and ~50% of queries have an entity focus (entity seeking queries)

• brad pitt attacked by fans

› ~10% of queries are looking for a class of entities

• brad pitt movies

› See

• Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780

• Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598

Page 21: Semantic Search on the Rise

Entity linking in queries

Common structure to entity mention queries: query = <entity> + <intent>

› Intent is typically an additional word or phrase to

• Disambiguate, e.g. brad pitt actor

• Specify action or aspect e.g. brad pitt net worth, brad pitt download
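The query = <entity> + <intent> structure can be sketched as a dictionary lookup followed by a split. The entity dictionary `ENTITY_NAMES` and the function `split_query` are hypothetical; a real system would link against a full knowledge base and handle ambiguity.

```python
# Hypothetical dictionary of known entity names (lowercased surface forms).
ENTITY_NAMES = {"brad pitt", "paris hilton", "jim hendler"}


def split_query(query):
    """Return (entity, intent); entity is None if no known name is found."""
    tokens = query.lower().split()
    # Try the longest spans first so multi-word names win over shorter ones.
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in ENTITY_NAMES:
                rest = tokens[:start] + tokens[start + length:]
                return span, " ".join(rest)
    return None, " ".join(tokens)


for q in ["brad pitt actor", "brad pitt net worth", "barcelona nightlife"]:
    print(q, "->", split_query(q))
```

Whatever remains after removing the entity mention is treated as the intent, whether it disambiguates (`actor`) or names an aspect (`net worth`).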

Entity linking in queries

› Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk

› Microsoft Entity Linking challenge

› Yahoo WebScope dataset L24 - Yahoo Search Query Log To Entities, version 1.0

Session-level analysis

› Recognize entities and intents at the session level

› Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570

Page 22: Semantic Search on the Rise

Entity Retrieval

Keyword search over entity graphs

› See Pound et al., WWW 2010 for a definition

› No common benchmark until 2010

SemSearch Challenge 2010/2011

• 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo! Webscope)

• Billion Triples Challenge 2009 data set

• Evaluation using Mechanical Turk

› See report:

• Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)

Page 23: Semantic Search on the Rise

Question Answering

Question Answering over Linked Data competition

› 2011-2014

› Data

• DBpedia and MusicBrainz in RDF

› Queries

• Full natural language questions of different forms, written by the organizers

• Multi-lingual

• Give me all actors starring in Batman Begins

› Results are defined by an equivalent SPARQL query

• Systems are free to return list of results or a SPARQL query
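As an illustration of a question with an equivalent SPARQL query, the sketch below pairs the example question with one plausible query and evaluates its single triple pattern over a few toy triples. The query text, the choice of `dbo:starring` as the predicate, and the sample triples are assumptions for illustration, not the competition's gold standard.

```python
# One plausible SPARQL rendering of "Give me all actors starring in Batman Begins".
SPARQL = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?actor WHERE { dbr:Batman_Begins dbo:starring ?actor }
"""

# Toy triples standing in for the RDF dataset.
TRIPLES = [
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Christian_Bale"),
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Michael_Caine"),
    ("dbr:Batman_Begins", "dbo:director", "dbr:Christopher_Nolan"),
    ("dbr:The_Prestige", "dbo:starring", "dbr:Hugh_Jackman"),
]

# The WHERE clause above as a (subject, predicate, object) pattern.
PATTERN = ("dbr:Batman_Begins", "dbo:starring", "?actor")


def match(pattern, triples):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break      # constant does not match
        else:
            results.append(binding)
    return results


print([b["?actor"] for b in match(PATTERN, TRIPLES)])
```

A system that returns the list of bindings and a system that returns the query itself can then be judged against the same result set.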

Page 24: Semantic Search on the Rise

Applications

Page 25: Semantic Search on the Rise

Semantic Search for…

Improving ad-hoc document retrieval

› Query composition

› Result presentation

› Matching

› Ranking

Providing new search functionality

› Entity retrieval

› Personalization

› Related entity recommendation

› Complex question-answering, relational search, computational search…

› Task completion

Page 26: Semantic Search on the Rise

Exploiting Semantic Web markup (Yahoo internal prototype, 2007)

Personal and private homepage of the same person (clear from the snippet but it could be also automatically de-duplicated)

Conferences he plans to attend and his vacations from homepage plus bio events from LinkedIn

Geolocation

Page 27: Semantic Search on the Rise

Search snippets using Semantic Web markup

Summarization of HTML is a hard task

• Template detection

• Selecting relevant snippets

• Composing readable text

› Efficiency constraints

Yahoo SearchMonkey (2008)

› Enhanced results using structured data from the page

• Key/value pairs

• Deep links

• Image or Video

Page 28: Semantic Search on the Rise

Effectiveness of enhanced results (Yahoo)

Explicit user feedback

› Side-by-side editorial evaluation (A/B testing)

• Editors are shown a traditional search result and enhanced result for the same page

• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)

Implicit user feedback

› Click-through rate analysis

• Long dwell time limit of 100s (Ciemiewicz et al. 2010)

• 15% increase in ‘good’ clicks

› User interaction model

• Enhanced results lead users to relevant documents, even though they are less likely to be clicked than textual results

• Enhanced results effectively reduce bad clicks!

See

› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734

Page 29: Semantic Search on the Rise

Enhanced results at other search providers

Google announces Rich Snippets - June, 2009

› Faceted search for recipes - Feb, 2011

Bing tiles - Feb, 2011

Facebook’s Like button and the Open Graph Protocol (2010)

› Shows up in profiles and news feed

› Site owners can later reach users who have liked an object

Page 30: Semantic Search on the Rise

Moving beyond entity markup

We would like to help our users in task completion

› But we have trained our users to talk in nouns

• Retrieval performance decreases when verbs are added to queries

› Markup for actions/intents could potentially help

Modeling actions

› Understand what actions can be taken on a page

› Help users in mapping their query to potential actions

› Applications in web search, email etc.

Schema.org v1.2 including Actions vocabulary published April 16, 2014

Page 31: Semantic Search on the Rise

Applications of Actions markup

Email (Gmail)

SERP (Yandex)

Page 32: Semantic Search on the Rise

Personalized content and native ads (Yahoo)

User profiling based on entities recognized in the content consumed

News and ads personalized to the user

Page 33: Semantic Search on the Rise

Entity displays in web search (Bing/Google/Yahoo)

Entity retrieval

› Which entity does a keyword query refer to, if any?

Related entities

› Which entity would the user visit next?

• Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013

Page 34: Semantic Search on the Rise

Relational Search (Facebook Graph Search)

Page 35: Semantic Search on the Rise

Relational Search (Facebook Graph Search)

Query: “my friends, who is member of queen”

Recognized tokens: friends, member, queen

Parse tree (node, matched text, semantics):

• [start] my friends, who is member of [id:Queen1]: friends(x,me), member(x,Queen1)

• [user-head] my friends: friends(x,me)

• [user-filter] who is member of [id:1]: member(x,Queen1)

• [who] who: -

• [member-vp] is member of [id:1]: member(x,Queen1)

• [member-of-v] is member of: member()

• {band} [id:Queen1] (from the token queen): Queen1

Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees

• [start] → [users]

• [users] → my friends: friends(x, me)

• […] → is member of [bands]: member(x, $1)

• [bands] → {band}: $1

• …

Grammar-based Query Translation: which combination of production rules results in a parse tree that connects the recognized entities and relationships?
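A minimal sketch of grammar-based query translation for this one example follows. It hard-codes the few rules shown on the slide as a hand-written parser instead of searching the space of all parse trees, and it is not Facebook's implementation; the entity index `BANDS` and the function `translate` are hypothetical.

```python
import re

# Hypothetical entity index: surface form -> entity id.
BANDS = {"queen": "Queen1"}


def translate(query):
    """Map 'my friends[, who is member of <band>]' to a list of predicates.

    Returns None when the query does not fit the hard-coded rules.
    """
    q = query.lower().strip()
    # [user-head] -> "my friends" : friends(x,me)
    head = re.match(r"my friends\b,?\s*", q)
    if not head:
        return None
    predicates = ["friends(x,me)"]
    rest = q[head.end():]
    if not rest:
        return predicates
    # [user-filter] -> "who" [member-vp]
    # [member-vp]   -> "is member of" [bands] : member(x,$1)
    filt = re.fullmatch(r"who is member of (.+)", rest)
    if not filt or filt.group(1) not in BANDS:
        return None
    # [bands] -> {band} : $1, resolved through the entity index
    predicates.append(f"member(x,{BANDS[filt.group(1)]})")
    return predicates


print(translate("my friends, who is member of queen"))
```

Each branch corresponds to one production rule, and the semantics of a node is the conjunction of the semantics of its children.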

Page 36: Semantic Search on the Rise

Semantic Search (Graphinder)

Sem. Auto-completion

- Entity + relationships

- Multi-source

- Domain-independent

- Low manual effort

Query Translation

[Figure: entity graph with nodes Freddie Mercury, Brian May, Queen, Queen Elizabeth 1 and Liar (1971 single); types Person, Artist and Single; edge labels member, producer, formed in, marital status and writer]

Page 37: Semantic Search on the Rise

Relational Query Rewriting (Graphinder)

Query: single from freddy mercury que

Keyword Interpretation

- Imprecise / fuzzy matching

- Match every keyword

Relational Query Rewriting

- Token rewriting via syntactic distance: 1) single from freddie mercury queen …

- Token rewriting via semantic distance: 1) single writer freddie mercury queen …

- Query segmentation: 1) single writer “freddie mercury” queen …

Keyword / Key Phrase Interpretation

- Precise matching

- Match keyword and key phrases

Result Retrieval & Ranking

Benefits:

- Higher selectivity of query terms (quality)

- Reduced number of query terms (efficiency)

- Better search experience

…

Challenges: many rewrite candidates, some are semantically not “valid” in the relational setting, e.g. single (marital status) writer “freddie mercury” queen (the queen of UK)

[Figure: keyword matches against the Data Index (Freddie Mercury, Queen, Queen Elizabeth 1, single) and the Schema Index (Single, writer)]
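The first rewriting step, token rewriting via syntactic distance, can be sketched with the standard library's `difflib`. This is an illustration under assumptions and not Graphinder's method: the vocabulary `VOCAB` stands in for the terms of the data and schema indexes, the similarity ratio of `difflib` stands in for whatever distance the system uses, and the 0.6 cutoff is arbitrary. The semantic step (from → writer) and query segmentation are not covered.

```python
from difflib import get_close_matches

# Hypothetical vocabulary drawn from the data and schema indexes.
VOCAB = ["freddie", "mercury", "queen", "single", "writer"]


def rewrite_tokens(query, vocab, cutoff=0.6):
    """Replace each token by its closest vocabulary term, if one is close enough."""
    rewritten = []
    for token in query.lower().split():
        candidates = get_close_matches(token, vocab, n=1, cutoff=cutoff)
        # Keep the original token when nothing in the vocabulary is similar.
        rewritten.append(candidates[0] if candidates else token)
    return rewritten


print(" ".join(rewrite_tokens("single from freddy mercury que", VOCAB)))
```

With this vocabulary the misspelled `freddy` and the truncated `que` are mapped to index terms, while `from` is left for the semantic rewriting step.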

Page 38: Semantic Search on the Rise

Results Aggregation (Wolfram Alpha)

Page 39: Semantic Search on the Rise

Factual Search/Question Answering (Google)

Page 40: Semantic Search on the Rise

Beyond Web Search

Page 41: Semantic Search on the Rise

Beyond Web search: mobile interaction

Interaction

› Question-answering

› Support for interactive retrieval

› Spoken-language access

› Task completion

Contextualization

› Personalization

› Geo

› Context (work/home/travel)

• Try getaviate.com

Page 42: Semantic Search on the Rise

Interactive, conversational voice search

Parlance EU project

› Complex dialogs within a domain

• Requires complete semantic understanding

Complete system (mixed license)

› Automated Speech Recognition (ASR)

› Spoken Language Understanding (SLU)

› Interaction Management

› Knowledge Base

› Natural Language Generation (NLG)

› Text-to-Speech (TTS)

Video

Page 43: Semantic Search on the Rise

Conclusions

Semantic Search

› Explicit understanding for queries and documents through links to external knowledge

• Using methods of Information Extraction or explicit annotations (markup) in webpages

• Semantic Web as a source of external knowledge

Increasing level of understanding

› Early focus on entities and their attributes

• Applications in web search: rich results, entity displays, entity recommendation

› Moving toward modeling intents/actions

› Adding human-like interaction