Learning Joint Query Interpretation and Response Ranking Uma Sawant Soumen Chakrabarti IIT Bombay



Searching the “Web of things”

At least 14% of Web search queries mention target type or category (Lin et al., WWW 2012)

Telegraphic entity search queries

Telegraphic queries with target type

woodrow wilson president university

dolly clone institute

hermitage museum bank river

lead singer led zeppelin band

losing team baseball world series 1998

No reliable syntax clues for the search engine
• Free word order
• No or rare capitalization
• Rare to find quoted phrases
• Few function or relational words

How to answer entity queries? (simplified view of related work)

[Diagram: Telegraphic, NLQ, or Template query → Query Interpretation → Execution Ready Query → Ranking over a Knowledge base → e1, e2, e3. Labelled “2-stage process”.]

Our Proposal: Joint Query Interpretation and Ranking

[Diagram: Multiple Interpretations of the query, each paired with a response from the Annotated Corpus; Generative and Discriminative models; output e1, e2, e3.]

The annotated Web

Annotated document: “… By comparison, the Padres have been to two World Series, losing in 1984 and 1998. …”
• mentionOf: “Padres” → Entity: San_Diego_Padres
• instanceOf: San_Diego_Padres → Type: Major_league_baseball_teams
• subTypeOf: Major_league_baseball_teams → … → Type: All (type hierarchy)
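The three relations on this slide (mentionOf, instanceOf, subTypeOf) can be sketched with plain dictionaries. This is only an illustration, not the authors' index: the intermediate type `Baseball_teams` and all function names are assumptions made for the example.

```python
# Toy catalog. Only San_Diego_Padres, Major_league_baseball_teams and All
# come from the slide; Baseball_teams is a hypothetical middle level.
SUBTYPE_OF = {
    "Major_league_baseball_teams": "Baseball_teams",
    "Baseball_teams": "All",
}
INSTANCE_OF = {"San_Diego_Padres": ["Major_league_baseball_teams"]}
# mentionOf: (snippet text, entity that the mention refers to)
SNIPPETS = [
    ("the Padres have been to two World Series, losing in 1984 and 1998",
     "San_Diego_Padres"),
]

def ancestors(type_name):
    """Return type_name followed by its supertypes up to the root."""
    chain = [type_name]
    while chain[-1] in SUBTYPE_OF:
        chain.append(SUBTYPE_OF[chain[-1]])
    return chain

def types_of(entity):
    """All types of an entity, directly or through subTypeOf."""
    seen = []
    for direct in INSTANCE_OF.get(entity, []):
        for t in ancestors(direct):
            if t not in seen:
                seen.append(t)
    return seen

def snippets_for_type(type_name):
    """Snippets whose mentioned entity is an instance of type_name."""
    return [text for text, ent in SNIPPETS if type_name in types_of(ent)]
```

A type-restricted lookup such as `snippets_for_type("All")` returns the Padres snippet, while a type the entity does not belong to returns nothing.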

Query = type hints + word matchers

Query: losing team baseball world series 1998

Large type catalog
• Most query words match some type

Incorrect type: World_Series_Hockey_teams

Padres rarely co-occurs with hockey
• Can know this only from corpus stats

Need joint type inference and snippet scoring

Query: losing team baseball world series 1998
• Correct type: Major_league_baseball_teams
• Entity: San Diego Padres (instanceOf the correct type)
• Evidence snippet (mentionOf the entity, word matches with the query): “By comparison, the Padres have been to two World Series, losing in 1984 and 1998.”

Generative model: generate query from entity

• Entity E: San Diego Padres
• Type T: Major league baseball team
• Context: Padres have been to two World Series, losing in 1984 and 1998
• Switch Z: selects, for each query word, the type model or the context model
• Type hint: baseball, team
• Context matchers: lost, 1998, world series
• Query q: losing team baseball world series 1998

Generative approach: plate diagram

[Plate diagram over variables E, T, Z, W.]
• For each query…
  • Choose entity (E)
  • Choose type to describe entity (T)
• For each query word…
  • “Switch” variables (Z): word hints at type or is a matcher?
  • Generate query word (W): hints from the type description language model, matchers from the entity context language model
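The generative story can be sketched as a score for one candidate entity. This is a simplified reading, not the formula from the talk: it assumes add-alpha smoothed unigram language models, a fixed switch probability `p_hint`, and switches that are independent per word, so each switch can be summed out inside the product. All names and default values are hypothetical.

```python
import math

def lm_prob(word, text, vocab_size=1000, alpha=1.0):
    """Add-alpha smoothed unigram probability of `word` given a text."""
    tokens = text.lower().split()
    return (tokens.count(word) + alpha) / (len(tokens) + alpha * vocab_size)

def generative_score(query, entity_prior, type_given_entity,
                     type_descriptions, context, p_hint=0.5):
    """Pr(e) * sum_t Pr(t|e) * prod_w [p_hint*Pr(w|t) + (1-p_hint)*Pr(w|ctx)].

    type_given_entity: {type: Pr(t|e)}; type_descriptions: {type: text};
    context: text of snippets mentioning the entity.
    """
    total = 0.0
    for t, p_t in type_given_entity.items():
        log_p = 0.0
        for w in query.lower().split():
            hint = lm_prob(w, type_descriptions[t])   # word used as type hint
            matcher = lm_prob(w, context)             # word used as matcher
            log_p += math.log(p_hint * hint + (1 - p_hint) * matcher)
        total += p_t * math.exp(log_p)
    return entity_prior * total
```

With the type held fixed, an entity whose snippets contain the matcher words (world, series, 1998) scores higher than one whose snippets do not, which is the corpus feedback the slides rely on.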

Discriminative model: separate correct and incorrect entities

q: losing team baseball world series 1998

• Correct entity San_Diego_Padres: three interpretations of q, each with t = baseball team
• Incorrect entity 1998_World_Series: three interpretations of q, each with t = series

Feature vector design inspired by generative model

Feature vector given query, entity, type, switches:
• Models entity prior
• Models type prior Pr(t|e)
• Compatibility between hint words and type (hints)
• Compatibility between matchers and snippets that mention e (matchers)

[Equations: generative and discriminative scoring expressions.]

Discriminative framework
• Non-convex formulation
• Annealing algorithms
• Constraints are formulated using the best scoring interpretation
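One way to read “constraints are formulated using the best scoring interpretation” is a pairwise margin constraint between the best interpretation of a correct entity and the best interpretation of an incorrect one. The sketch below assumes a linear scorer and a hinge loss; the four-component feature layout, the weights, and all numbers are hypothetical, and the training procedure itself (annealing) is not shown.

```python
def dot(w, phi):
    """Linear score of one interpretation."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def best_interpretation(w, interpretations):
    """interpretations: list of (label, feature_vector) for one entity.
    Returns (score, label) of the best-scoring interpretation."""
    return max((dot(w, phi), label) for label, phi in interpretations)

def pairwise_hinge_loss(w, good, bad, margin=1.0):
    """Zero when the correct entity's best interpretation beats the
    incorrect entity's best interpretation by at least `margin`."""
    s_good, _ = best_interpretation(w, good)
    s_bad, _ = best_interpretation(w, bad)
    return max(0.0, margin - (s_good - s_bad))

# Hypothetical features: [entity prior, type prior, hint-type, matcher-snippet]
w = [1.0, 1.0, 2.0, 2.0]
padres = [("t=baseball team", [0.2, 0.5, 0.9, 0.8]),
          ("t=series", [0.2, 0.0, 0.4, 0.8])]
world_series_1998 = [("t=series", [0.3, 0.6, 0.4, 0.3]),
                     ("t=baseball team", [0.3, 0.0, 0.9, 0.3])]
```

Because the score of the correct entity is itself a maximum over interpretations, the loss is not convex in `w`, which is consistent with the slide's mention of a non-convex formulation.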

Testbed

YAGO entity and type catalog
• ~0.2 million types and 1.9 million entities

Annotated corpus
• Web corpus having 500 million pages
• ~16 annotations per page

~700 entity search queries
• TREC + INEX
• Converted to telegraphic form, with most probable type and answer entities

Experiment 1: Entity ranking using joint inference
• To reach: human-recommended type
• To surpass: most generic type in catalog (no type inference)
• Entity-level NDCG measure (MAP and MRR follow the same trend, details in paper)
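For reference, entity-level NDCG at a rank cutoff can be computed as below. This is a minimal sketch assuming binary relevance and the common log2 discount; the slides do not say which NDCG variant was used, and the function names are hypothetical.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_entities, relevant, k):
    """NDCG@k with gain 1 for a relevant entity and 0 otherwise."""
    gains = [1.0 if e in relevant else 0.0 for e in ranked_entities]
    ideal = [1.0] * min(len(relevant), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Ranking the correct entity first gives 1.0; pushing it to rank 2 behind an incorrect entity lowers the value to 1 / log2(3).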

Human > Discriminative > Generative > Generic

Generative significantly better than generic (lower bound)
• Generative fills 28% of the gap to human (upper bound)

Discriminative significantly better than generic (lower bound)
• Discriminative fills 43% of the gap to human (upper bound)

Discriminative significantly better than generative
• Easier to balance diverse scales of probabilities

[Plot: NDCG (y-axis, 0.1 to 0.8) against rank 1 to 10 (x-axis) for human, discriminative, generative, and generic.]

Generic vs. discriminative
• Correct hint match & type choice: cathedral claude monet painting
• Incorrect hint match & type choice: amazing grace hymn writer

Discriminative better than human
• Correct entity unreachable from human-recommended type
• Discriminative recovers using corpus feedback

[Diagram: query patsy cline producer, shown with candidate types producer and manufacturer; the discriminative model returns Owen Bradley.]

Experiment 2: Target type inference
• Aggregate ranks of top-k interpretations to rank types
• Compare type-level NDCG with B&N 2012

[Diagram: the top k interpretations of hermitage museum bank river, with types museum, river, building, …, are aggregated into a ranked list of possible target types: river, museum, building.]
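The slide does not say how the ranks are aggregated. The sketch below assumes a reciprocal-rank vote: every one of the top-k interpretations gives 1/rank to its type, and types are sorted by total vote. The function name and the entity placeholders are hypothetical.

```python
from collections import defaultdict

def rank_types(interpretations, k):
    """interpretations: list of (entity, type) pairs, best first.
    Each type collects 1/rank from every top-k interpretation using it."""
    votes = defaultdict(float)
    for rank, (_entity, type_name) in enumerate(interpretations[:k], start=1):
        votes[type_name] += 1.0 / rank
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda t: (-votes[t], t))

# Hypothetical top-4 interpretations for: hermitage museum bank river
top = [("e1", "river"), ("e2", "museum"), ("e3", "river"), ("e4", "building")]
```

With this toy input, `rank_types(top, 4)` orders the types as river, museum, building, matching the order drawn on the slide.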

Joint prediction improves type inference
• Data: [B&N 2012], DBpedia catalog

Experiment 3: Joint vs. two-stage

Two-stage
1. Best type prediction from Experiment 2
2. Launch type-restricted query on annotated corpus

• Top m types to improve recall
• Measure entity-level NDCG

[Diagram: Stage 1, type inference, ranks candidate types (river, museum, building) and forms a query such as (river) + matchers or (river OR museum) + matchers; Stage 2, ranking.]
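The two-stage baseline can be sketched as a hard type filter followed by ranking. The code below is an illustration under stated assumptions: stage 1 is given as an already ranked type list, stage 2 scores an entity by counting matcher words in its snippet, and the candidate entities and snippets are toy data, not taken from the corpus.

```python
def two_stage(ranked_types, matchers, candidates, m):
    """candidates: list of (entity, type, snippet).
    Keeps only entities of the top m types (the type-restricted query),
    then ranks them by the number of matcher words in their snippet."""
    allowed = set(ranked_types[:m])
    scored = []
    for entity, type_name, snippet in candidates:
        if type_name in allowed:
            words = set(snippet.lower().split())
            scored.append((sum(w in words for w in matchers), entity))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity for _score, entity in scored]

# Toy data for: hermitage museum bank river
ranked_types = ["river", "museum", "building"]
matchers = ["hermitage", "bank"]
candidates = [
    ("Neva", "river", "the hermitage museum stands on the bank of the neva"),
    ("Hermitage_Museum", "museum", "the hermitage museum in saint petersburg"),
]
```

With `m=1` only rivers are reachable; with `m=2` museums are admitted as well. Any entity whose type falls outside the top m can never be returned, which is the hard commitment that joint inference avoids.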

Joint entity ranking better than two-stage
• Joint type prediction and ranking significantly better than 2-stage
• Using more types in 2-stage makes little difference

[Plot: NDCG (y-axis, 0.2 to 0.6) against rank 1 to 10 (x-axis) for Joint, 2stage (m=1), 2stage (m=5), and 2stage (m=10).]

Conclusion
• Large percentage of Web search queries contain a mention of the target type
• Identification of target type hint words and type itself is rewarding, but non-trivial
• Joint query interpretation and ranking approach significantly better than two-stage
• Joint prediction improves type inference
• Datasets available at bit.ly/WSpxvr

Questions?

References
1) P. Pantel, T. Lin, M. Gamon: Mining Entity Types from Query Logs via User Intent Modeling. ACL (1) 2012: 563-571
2) K. Balog, R. Neumayer: Hierarchical Target Type Identification for Entity-oriented Queries. CIKM 2012
3) T. Lin, P. Pantel, M. Gamon, A. Kannan, A. Fuxman: Active Objects: Actions for Entity-Centric Search. WWW 2012

Extra slides


Components of the model

Entity prior
• (Weighted) fraction of snippets attached to an entity in the corpus

Type
• Generality or specificity of types

Hint-type compatibility
• Probability of generating hint words from a language model built using type description
• Hint sub-sequence matches some type name exactly

Matcher-entity compatibility
• Weighted fraction of snippets attached to an entity, retrieved using matchers
• Rarity of matchers + number of supporting snippets

Implementation details

Additive features
• One generic query executed on index, rest in memory

Pruned large search space using easy heuristics
• Continuous (contiguous) hint words
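The pruning heuristic can be illustrated by enumerating only those hint/matcher splits in which the hint words form one unbroken span. This sketch assumes that “continuous” means contiguous and that at least one word is a hint; the function name is hypothetical.

```python
def contiguous_interpretations(query):
    """Yield (hint_words, matcher_words) where the hints form a single
    contiguous, non-empty span of the query and the rest are matchers."""
    words = query.split()
    n = len(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield words[start:end], words[:start] + words[end:]
```

For a query of n words this gives n(n+1)/2 candidate splits instead of the 2^n possible switch assignments, so the six-word example query has 21 splits rather than 64.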

Not entity disambiguation in query
• ymca in query refers to song or organization?
• Similar to entity disambiguation in documents
• Uses accompanying words
• Misinterpreting target type: usually disastrous
• Avoid early or hard commitment

[Diagram: Query: ymca lyrics → Entity: YMCA_(song), instanceOf Type: Music. Query: ymca address → Entity: YMCA_(org), instanceOf Type: Organization.]

[Diagram label, shown twice: Learn topic model]

Future work
• Better type description model
• More generic query than “hint+matchers”
• Entities as literals
• Different models
  • Explore non-linear models (boosting)
  • List-wise loss
• Use click data

Generative framework

[Plate diagram over variables E, T, Z, W.]
• For each query…
  • Choose entity to describe (E)
  • Choose type to describe entity (T)
• For each query word…
  • “Switch” variables (Z): decide if word hints at type or is a matcher
  • Generate query word (W)
• Language models: type description language model, entity context language model

Discriminative framework

Feature vector given query, entity, type, switches:
• Models entity prior
• Models type prior Pr(t|e)
• Compatibility between hint words and type (hints)
• Compatibility between matchers and snippets that mention e (matchers)

Given q, score of response e is: [equation]

Ranking model trained by distant supervision

Joint entity ranking better than two-stage
• State-of-the-art target type predictor
  • Does not use corpus information
• Pick top k types to improve type recall
• Launch type-restricted query on annotated corpus
• Significantly worse than joint type prediction and ranking

[Plot: NDCG (y-axis, 0.2 to 0.6) against rank 1 to 10 (x-axis) for Joint, 2stage (k=1), 2stage (k=5), and 2stage (k=10).]

How to answer entity queries? (simplified view of related work)

[Diagram: Telegraphic, NLQ, or Template query → Query Interpretation → Execution Ready Query → Ranking → e1, e2, e3. Labelled “2-stage process”. Sources shown: RDF tuples, Annotated Corpus, Tables.]

Knowledge