Page 1: Exploiting Semantics with Structured Queries

Hugo Zaragoza (Yahoo! Research). CLEF 2008 1

Exploiting Semantics with Structured Queries

Jose Ramón Pérez-Agüera & Hugo Zaragoza

U. Complutense de Madrid Yahoo! Research (Barcelona)

Page 2: Exploiting Semantics with Structured Queries

Query expansion makes term independence a big issue…

we are double counting “meanings” !!!

Page 3: Exploiting Semantics with Structured Queries

Term independence assumption gets worse with query expansion… (example 1)

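Verde que te quiero verde.
Verde viento. Verdes ramas.
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda
verde carne, pelo verde,
con ojos de fría plata.
Bajo la luna gitana,
las cosas la están mirando
y ella no puede mirarlas.
[…]

[English: Green, how I want you green. Green wind. Green branches. The ship on the sea and the horse on the mountain. With the shadow at her waist she dreams on her balcony, green flesh, green hair, with eyes of cold silver. Under the gypsy moon, things are looking at her and she cannot look at them.]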

verde3 que te quiero verde2.
verde3 viento. verde1 ramas.
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda
verde5 carne, pelo verde1,
con ojos de fría plata.
Bajo la luna gitana,
las cosas la están mirando
y ella no puede mirarlas.
[…]

q1: verde1 pelo

q2: verde1 verde2 pelo

q3: verde1 verde2 verde3 verde4 verde5 pelo

q: verde pelo (“green hair”) [CLEF EFE94, 2001 Spanish topics]

Page 4: Exploiting Semantics with Structured Queries

Term independence assumption gets worse with query expansion… (example 2)

[CLEF EFE94, 2001 Spanish topics] [Pérez-Agüera, Zaragoza and Araujo, NLDB 2008]

-46% !!!

Page 5: Exploiting Semantics with Structured Queries

Term independence assumption gets worse with query expansion… (example 3)

• BM25 dependence model:

$w = \frac{tf}{k + tf}$

[Figure: horizontal axis tf = 1, 2, 3, 4, …, 10]

[Worked example for a fixed $k$ (“ex:”): fractions not recoverable from the transcript]
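The double counting can be made concrete with a small sketch. It assumes the saturating BM25-style weight tf / (k + tf) and ignores idf and document-length normalisation; the function names and the value k = 1 are illustrative choices, not taken from the slides.

```python
def bm25_saturation(tf: float, k: float = 1.2) -> float:
    """BM25-style term-frequency saturation: tf / (k + tf)."""
    return tf / (k + tf)


def independent_score(tfs, k: float = 1.2) -> float:
    """Term independence: each term saturates alone, then contributions add up."""
    return sum(bm25_saturation(tf, k) for tf in tfs)


def pooled_score(tfs, k: float = 1.2) -> float:
    """Same-meaning terms: frequencies are added first, then saturated once."""
    return bm25_saturation(sum(tfs), k)


if __name__ == "__main__":
    # Four expansion terms carrying the same meaning, one occurrence each.
    tfs = [1, 1, 1, 1]
    print(independent_score(tfs, k=1.0))  # 2.0
    print(pooled_score(tfs, k=1.0))       # 0.8
```

Under independence the four synonyms contribute 2.0, more than twice what a single term occurring four times would earn (0.8), and the sum is no longer bounded by 1.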

Page 6: Exploiting Semantics with Structured Queries

Query Expansion (example of state of the art)

• Term Selection:
– Divergence From Randomness (DFR) expansion model, Bo1 [8,6]

• Term Weighting:
– Rocchio [9]

[Formulas for Bo1 and Rocchio not in transcript; surviving annotations: tf in top x=1 document; top 40 terms; P(term); 0.3]

• Perf. Prediction:
– AvICTF [5] (cheap)

> 9.0

$\mathrm{AvICTF}(q, C) = \frac{1}{ql} \sum_{t \in q} \log_2 \frac{n}{n_t}$
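A minimal sketch of the AvICTF predictor, assuming the usual reading of its symbols: ql is the query length, n the total number of tokens in the collection and n_t the number of occurrences of term t in the collection. The function name and the toy counts are hypothetical, and every query term is assumed to occur in the collection. How the 9.0 threshold switches expansion on or off is not recoverable from the transcript, so the sketch only computes the value.

```python
import math


def av_ictf(query_terms, collection_tf, n_tokens):
    """Average inverse collection term frequency of a query.

    query_terms   : list of query terms (ql = len(query_terms))
    collection_tf : mapping term -> n_t (occurrences of the term in the collection)
    n_tokens      : n, total number of tokens in the collection
    """
    ql = len(query_terms)
    if ql == 0:
        raise ValueError("empty query")
    # Mean over the query terms of log2(n / n_t).
    return sum(math.log2(n_tokens / collection_tf[t]) for t in query_terms) / ql


if __name__ == "__main__":
    toy_counts = {"verde": 4, "pelo": 16}  # made-up collection frequencies
    print(av_ictf(["verde", "pelo"], toy_counts, n_tokens=1024))  # 7.0
```

Only collection-level counts are needed, which is why the predictor is cheap: no retrieval run is required before deciding.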

Page 7: Exploiting Semantics with Structured Queries

Results in CLEF 2008 Robust-WSD Task:

• Standard Query Expansion: [results table not in transcript]

• 3rd team in CLEF Robust out of 8. 1st team well ahead of everyone.
– It seems no one improved GMAP so they reported MAP

Page 8: Exploiting Semantics with Structured Queries

Query expansion makes term independence a big issue…

we are double counting “meanings” !!!

Page 9: Exploiting Semantics with Structured Queries

Query Clauses Idea:

“Cheap Barcelona Italian Restaurants”
{cheap, barcelona, italian, restaurant}

Expansion:
{cheap, barcelona, italian, restaurant, inexpensive, affordable, Sagrada Familia, Ramblas, Gràcia, Barceloneta, pizzeria, trattoria, café}

Structure: collect related meanings in clauses
{
c1: {cheap, inexpensive, affordable},
c2: {Barcelona, Sagrada Familia, Ramblas, Gràcia, Barceloneta, …},
c3: {Italian_restaurant, pizzeria, trattoria, café}
}

Clause independence, not term independence

Page 10: Exploiting Semantics with Structured Queries

Query Clauses Idea

[Diagram: term 1, term 2, term e1]

Page 11: Exploiting Semantics with Structured Queries

Query Clauses Idea

[Diagram: term1, term e1, term e4, term2, term e2 and term e3 grouped into clauses c1, c2, c3]

(same idea as BM25-F on fields [10])

Page 12: Exploiting Semantics with Structured Queries

Query Clauses Model

Bag of words:

$q = \{t_0, t_1, \ldots, t_{|q|}\}$

$score(d) = \sum_{t \in q} W_1\big(tf_d(t),\, l_d\big) \cdot W_2(t, C)$

Query clauses (bag of bags of weighted words):

$c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\}$

$q = \{c_0, c_1, \ldots, c_{|q|}\}$

$score(d) = \sum_{c \in q} W_1\Big(\sum_{t \in c} tf_d(t)\, w_t,\; l_d\Big) \cdot W_2(c, C)$

Matrix notation: let […], then redefine each document as […]

Example: […]
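The two scoring schemes can be compared side by side in a short sketch. It assumes a BM25-style saturating form for W1 (with the conventional k1 = 1.2, b = 0.75) and takes W2 as a precomputed number per term or per clause; all function names, weights and the toy document are illustrative, not the authors' implementation.

```python
def w1(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Assumed W1: BM25-style saturating tf weight with length normalisation."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf / (norm + tf)


def score_bag_of_words(query, doc_tf, doc_len, avg_doc_len, term_w2):
    """Bag of words: every query term is weighted and summed independently."""
    return sum(
        w1(doc_tf.get(t, 0), doc_len, avg_doc_len) * term_w2[t] for t in query
    )


def score_clauses(clauses, clause_w2, doc_tf, doc_len, avg_doc_len):
    """Query clauses: weighted tfs are pooled inside a clause before W1 applies."""
    total = 0.0
    for clause, w2 in zip(clauses, clause_w2):
        clause_tf = sum(doc_tf.get(t, 0) * w for t, w in clause)
        total += w1(clause_tf, doc_len, avg_doc_len) * w2
    return total


if __name__ == "__main__":
    doc_tf = {"cheap": 1, "inexpensive": 1, "affordable": 1, "pizzeria": 1}
    flat_query = ["cheap", "inexpensive", "affordable", "pizzeria"]
    clauses = [
        [("cheap", 1.0), ("inexpensive", 1.0), ("affordable", 1.0)],
        [("pizzeria", 1.0)],
    ]
    flat = score_bag_of_words(flat_query, doc_tf, 100, 100, dict.fromkeys(flat_query, 1.0))
    structured = score_clauses(clauses, [1.0, 1.0], doc_tf, 100, 100)
    print(flat, structured)
```

In this toy document the three "cheap" synonyms are saturated once as a clause, so the structured score is lower than the flat one, which counts the same meaning three times.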

Page 13: Exploiting Semantics with Structured Queries

Query Clauses Implementation of W1 and W2

clause term frequency: […]

clause collection frequency: […]

clause document likelihood: […]

clause collection likelihood: […]

In general projection is query-dependent and needs to be done online: […]

Page 14: Exploiting Semantics with Structured Queries

Query Clauses Implementation of W1 and W2

IDF is not straightforward, there are several possibilities:

$c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\} \;\rightarrow\; idf(t_0),\, idf(t_1),\, \ldots,\, idf(t_{|c|})$

Some possibilities:
– min, max, avg (leads to inconsistent situations for small weights)
– expected clause idf:

$icf(c, d) = \frac{\sum_{t \in c} idf(t)\, tf_d(t)\, w_t}{\sum_{t \in c} tf_d(t)\, w_t}$
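As a rough illustration, the expected clause idf can be read as an average of the term idfs in a clause, weighted by how much each term actually contributes in the document (tf times clause weight). The sketch below assumes that reading; the function name, the toy values and the choice to return 0.0 when the document contains none of the clause terms are assumptions.

```python
def expected_clause_idf(clause, doc_tf, idf):
    """icf(c, d): idf of the clause terms, averaged with weights tf_d(t) * w_t."""
    numerator = 0.0
    denominator = 0.0
    for term, weight in clause:
        mass = doc_tf.get(term, 0) * weight  # contribution of this term in d
        numerator += idf[term] * mass
        denominator += mass
    # Assumption: a document matching no clause term gets icf = 0.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    clause = [("cheap", 1.0), ("inexpensive", 0.5)]
    idf = {"cheap": 1.0, "inexpensive": 4.0}          # made-up idf values
    doc_tf = {"cheap": 2, "inexpensive": 2}
    print(expected_clause_idf(clause, doc_tf, idf))   # 2.0
```

Here min, max and avg of the term idfs would give 1.0, 4.0 and 2.5 regardless of the document or the clause weights, while the expected value (2.0) leans towards the terms that carry most of the weighted frequency.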

Page 15: Exploiting Semantics with Structured Queries

How can we construct the clauses?

• Idea: use WordNet to expand each term in the query as a clause.

• Idea: use statistical methods to expand each term in the query.

• Idea: use query expansion to find terms, use statistical methods to group them into clauses.

• Idea: use query expansion to find terms, use WordNet to group them into clauses.
– There exist several semantic similarity measures based on WordNet [11]: WN(s1,s2)
– We construct a clause for every original query term, and we add to it expanded terms with: WN(s1,s2) < k
– To be conservative, all terms not in an original clause are added together to a new “Other” clause.
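The grouping step above can be sketched as follows. The WordNet measure is passed in as a plain function `wn(s1, s2)`, here backed by a made-up lookup table, and it is treated as distance-like (smaller means more related) so that the test `wn < k` matches the slide; with a similarity measure the comparison would flip. The function names, the table and the threshold are all hypothetical, and clause weights are left out for brevity.

```python
def build_clauses(query_terms, expansion_terms, wn, k):
    """One clause per original query term, plus an "Other" clause for leftovers."""
    clauses = {t: [t] for t in query_terms}  # dict keeps query order
    other = []
    for e in expansion_terms:
        if e in clauses:
            continue  # already an original query term
        placed = False
        for t in query_terms:
            if wn(t, e) < k:  # close enough to this original term
                clauses[t].append(e)
                placed = True
        if not placed:
            other.append(e)  # conservative fallback
    result = list(clauses.values())
    if other:
        result.append(other)
    return result


if __name__ == "__main__":
    toy_distance = {  # invented values, for illustration only
        ("cheap", "inexpensive"): 0.1,
        ("cheap", "affordable"): 0.2,
        ("restaurant", "pizzeria"): 0.2,
        ("restaurant", "trattoria"): 0.3,
    }

    def wn(s1, s2):
        return toy_distance.get((s1, s2), 1.0)

    print(build_clauses(
        ["cheap", "barcelona", "restaurant"],
        ["inexpensive", "affordable", "pizzeria", "trattoria", "ramblas"],
        wn,
        k=0.5,
    ))
```

With these toy values "ramblas" is not close to any original term, so it ends up alone in the final "Other" clause.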

Page 16: Exploiting Semantics with Structured Queries

• Implementation:

DFR Expansion: 40 new terms extracted for each query.

Query Clauses: DFR → WordNet Similarity → $c = \{(t_0, w_0), (t_1, w_1), \ldots, (t_{|c|}, w_{|c|})\}$, $q = \{c_0, c_1, \ldots, c_{|q|}\}$

Ranking: BM25 with standard params, on clauses, with $ctf_d = \sum_{t \in c} tf_d(t)\, w_t$ and $icf$

Results in CLEF 2008 Robust-WSD Task:

[Results table not in transcript; row label: Query Clauses]

Page 17: Exploiting Semantics with Structured Queries

Results in CLEF 2008 Robust-WSD Task:

[Results chart not in transcript; series label: clauses]

4% rel. impr. (overall results)

• 2nd team in CLEF Robust, 1st team well ahead without use of WSD.

Page 18: Exploiting Semantics with Structured Queries

Biblio

[10] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC 13: Web and HARD tracks. In Text REtrieval Conference (TREC-13), 2004.

[11] Z. Wu and M. Palmer. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL), 1994.