ACM CIKM 2008, Oct. 26-30, Napa Valley 1
Mining Term Association Patterns from
Search Logs for Effective Query
Reformulation
Xuanhui Wang and ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
Ineffective Queries
reduce space command latex
Effective Queries
squeeze space command latex
More Examples
• If you want to wash your vehicle
– “vehicle wash”, “auto wash”
– “car wash”, “truck wash”
• If you want to buy a car
– “auto quotes”
– “auto sale quotes”?
– “auto insurance quotes”?
What Makes a Query Ineffective?
• Vocabulary mismatch
– “reduce space command latex” vs “squeeze space command latex”
– “auto wash” vs “car wash”
• Lack of discrimination
– “auto quotes” vs “auto sale quotes”
• …
How can we help improve ineffective queries?
Term substitution
Term addition
Our Contribution
• We cast query reformulation as term-level pattern mining from search logs
• We define two basic types of patterns at the term level and propose probabilistic methods
– Context-sensitive term substitution
• “auto → car | _wash”, “car → auto | _trade”
– Context-sensitive term addition
• “+sale | auto_quotes”
• We evaluate our methods on commercial search engine logs and show their effectiveness
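The two pattern types can be illustrated by applying them to a query. The sketch below is only an illustration of the notation, not the authors' implementation: the function names, and the choice to write a context as space-separated terms with "_" marking the slot (e.g. "_ wash", "auto _ quotes"), are assumptions.

```python
def _find(terms, window):
    """Return the start index of `window` as a contiguous run in `terms`, or -1."""
    for start in range(len(terms) - len(window) + 1):
        if terms[start:start + len(window)] == window:
            return start
    return -1


def substitute(query, old, new, context):
    """Apply a substitution pattern 'old -> new | context' (hypothetical helper).

    The query is returned unchanged when `old` does not occur in that context.
    """
    terms, ctx = query.split(), context.split()
    slot = ctx.index("_")
    start = _find(terms, ctx[:slot] + [old] + ctx[slot + 1:])
    if start < 0:
        return query
    terms[start + slot] = new
    return " ".join(terms)


def add_term(query, new, context):
    """Apply an addition pattern '+new | context' (hypothetical helper)."""
    terms, ctx = query.split(), context.split()
    slot = ctx.index("_")
    start = _find(terms, ctx[:slot] + ctx[slot + 1:])
    if start < 0:
        return query
    terms.insert(start + slot, new)
    return " ".join(terms)


washed = substitute("auto wash", "auto", "car", "_ wash")
untouched = substitute("auto trade", "auto", "car", "_ wash")
refined = add_term("auto quotes", "sale", "auto _ quotes")
```

The same substitution leaves "auto trade" unchanged, which is what makes the pattern context-sensitive.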
Problem Formulation
Offline part
• Search logs → Query collection
• Task 1: Contextual models
• Task 2: Translation models
Online part
• Input query: q = auto wash
• Task 3: Pattern mining
• Patterns: “auto → car | _wash”, “auto → truck | _wash”, “+southland | _auto wash”, …
• Reformulated queries: “car wash”, “truck wash”, “southland auto wash”, …
Task 1: Contextual Models
G: General context
• Syntagmatic relations
• Capture terms that frequently co-occur with w inside queries
Sample query collection:
– enterprise car rental
– rental car
– budget car rental
– car pricing
– car pictures
– car accidents
– …
Model P_G(* | car):
– rental: 0.375
– enterprise: 0.125
– budget: 0.125
– pricing: 0.125
– …
Task 1: Contextual Models
L1: 1st left context
• Syntagmatic relations
• Capture terms that frequently co-occur with w inside queries
Sample query collection:
– enterprise car rental
– rental car
– budget car rental
– car pricing
– car pictures
– car accidents
– …
Model P_L1(* | car):
– rental: 0.333
– enterprise: 0.333
– budget: 0.333
– …
Task 1: Contextual Models
R1: 1st right context
• Syntagmatic relations
• Capture terms that frequently co-occur with w inside queries
Sample query collection:
– enterprise car rental
– rental car
– budget car rental
– car pricing
– car pictures
– car accidents
– …
Model P_R1(* | w), here with w = car:
– rental: 0.4
– pricing: 0.2
– pictures: 0.2
– accidents: 0.2
– …
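The three contextual models can be reproduced from the six sample queries by simple counting. The sketch below assumes plain maximum-likelihood estimates with no smoothing; the function and variable names are illustrative and not taken from the slides.

```python
from collections import Counter

# The six sample queries shown on the slides.
QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]


def contextual_models(queries, w):
    """Return unsmoothed (P_G, P_L1, P_R1) distributions for term w."""
    general, left1, right1 = Counter(), Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term != w:
                continue
            # General context: every other term in the same query.
            general.update(terms[:i] + terms[i + 1:])
            # L1 / R1: the terms immediately left and right of w.
            if i > 0:
                left1[terms[i - 1]] += 1
            if i + 1 < len(terms):
                right1[terms[i + 1]] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()} if total else {}

    return normalize(general), normalize(left1), normalize(right1)


p_g, p_l1, p_r1 = contextual_models(QUERIES, "car")
```

With these queries, "rental" gets 3/8 of the general context, 1/3 of the left context, and 2/5 of the right context, matching the values on the slides.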
Task 2: Translation Models
• Paradigmatic relations (“car” and “auto”)
• Capture terms that are substitutable for w
• Similar contexts → high translation probability
• Translation models (annotations on the slide's formula):
– Probability of generating s’s context from w’s contextual model
– Size of L1 context, size of R1 context
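The translation-model formula itself is not recoverable from this transcript, so the sketch below is one possible reading of the annotation "probability of generating s's context from w's contextual model", not the authors' formula. Everything specific in it is an assumption: the additive smoothing, the per-context-term geometric mean, the normalization over candidates, and the toy query list.

```python
import math
from collections import Counter


def context_counts(queries, w):
    """Count the terms seen immediately left (L1) and right (R1) of w."""
    left, right = Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term == w:
                if i > 0:
                    left[terms[i - 1]] += 1
                if i + 1 < len(terms):
                    right[terms[i + 1]] += 1
    return left, right


def smoothed(counts, vocab_size, mu=0.5):
    """Additive smoothing (an assumption) so unseen context terms keep nonzero mass."""
    total = sum(counts.values()) + mu * vocab_size
    return lambda term: (counts[term] + mu) / total


def translation_scores(queries, w, candidates):
    """Score each candidate s by how well w's context model explains s's contexts."""
    vocab = {t for q in queries for t in q.split()}
    w_left, w_right = context_counts(queries, w)
    p_left = smoothed(w_left, len(vocab))
    p_right = smoothed(w_right, len(vocab))
    raw = {}
    for s in candidates:
        s_left, s_right = context_counts(queries, s)
        size = sum(s_left.values()) + sum(s_right.values())
        if size == 0:
            continue  # s never appears with a neighbor; nothing to compare
        loglik = sum(c * math.log(p_left(t)) for t, c in s_left.items())
        loglik += sum(c * math.log(p_right(t)) for t, c in s_right.items())
        raw[s] = math.exp(loglik / size)  # geometric mean per context term
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()} if total else {}


# Hypothetical toy collection, not data from the talk.
TOY_QUERIES = ["car wash", "car rental", "auto wash", "auto rental",
               "pizza delivery", "pizza hut"]
scores = translation_scores(TOY_QUERIES, "car", ["auto", "pizza"])
```

In this toy collection "auto" shares both of its right-context terms with "car", so it scores higher than "pizza".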
Task 3.1: Pattern Mining–Term Substitution
q = [w_1 … w_{i-1} w_i w_{i+1} … w_n]
q’ = [w_1 … w_{i-1} s w_{i+1} … w_n]
Substitute w_i by s
Which word s should be chosen?
• Local factor
• Global factor: translation model
Estimating Local Factor
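Context with s filling the blank: w_1 … w_{i-1} _ w_{i+1} … w_n
Independence:
\[
\tilde{P}_{L_{i-1}}(w_1 \mid s) \times \cdots \times \tilde{P}_{L_1}(w_{i-1} \mid s) \times \tilde{P}_{R_1}(w_{i+1} \mid s) \times \cdots \times \tilde{P}_{R_{n-i}}(w_n \mid s)
\]
Ignore those terms far away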
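A candidate substitution can be scored by combining the two factors. The sketch below assumes that the global factor (translation probability) and the local factor (product of context probabilities within k positions of the slot) are simply multiplied. The dictionary layout, the `FLOOR` constant standing in for smoothing, and all the numbers are hypothetical.

```python
import math

FLOOR = 1e-6  # hypothetical stand-in for the smoothed probability of unseen events


def local_factor(terms, i, s, p_left, p_right, k=2):
    """Product of context probabilities given s, for terms within k positions of slot i.

    p_left[(j, term, s)] plays the role of P~_Lj(term | s), p_right likewise for Rj.
    """
    log_p = 0.0
    for j in range(1, k + 1):
        if i - j >= 0:
            log_p += math.log(p_left.get((j, terms[i - j], s), FLOOR))
        if i + j < len(terms):
            log_p += math.log(p_right.get((j, terms[i + j], s), FLOOR))
    return math.exp(log_p)


def rank_substitutions(terms, i, candidates, translation, p_left, p_right, k=2):
    """Rank candidates s for position i by global factor times local factor."""
    scored = []
    for s in candidates:
        global_factor = translation.get((s, terms[i]), FLOOR)
        score = global_factor * local_factor(terms, i, s, p_left, p_right, k)
        scored.append((s, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy numbers for illustration only.
translation = {("car", "auto"): 0.3, ("truck", "auto"): 0.1}
p_right = {(1, "wash", "car"): 0.2, (1, "wash", "truck"): 0.05}
p_left = {}

ranking = rank_substitutions(["auto", "wash"], 0, ["truck", "car"],
                             translation, p_left, p_right)
```

Limiting the loop to j ≤ k is how the sketch ignores terms far from the slot.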
Task 3.2: Pattern Mining–Term Addition
q = [w_1 … w_{i-1} w_i … w_n]
q’ = [w_1 … w_{i-1} r w_i … w_n]
Add r before w_i
Similar to the Local Factor in Term Substitution Patterns
Uniform
Evaluation: Data Preparation
• From Microsoft Live Labs
• Timeline: 5/1/2006 to 5/31/2006, split at 5/20/2006
– History logs (5/1/2006 to 5/20/2006): history collection of 4.4M queries (1.6M distinct), 1.3M user sessions
– Future logs (5/20/2006 to 5/31/2006): used to construct test cases
Examples of Contextual Models
• Left and Right contexts are different
• The general context mixes them together
Examples of Translation Models
• Conceptually similar keywords have high translation probabilities
• Make interactive exploratory search possible
Examples of Term Substitution
• Substitution is context sensitive
• Intuitively, reworded queries are more effective
Effectiveness Comparison of Term Substitution – Experiment Design
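• A session contains queries Q1, Q2, …, Qk
– Results of Q2: R21, R22, R23, …
– Results of Qk: Rk1, Rk2, Rk3, …
– C1, C2, C3: clicked results within the session
• How well can a reformulated query rank C1, C2, and C3 at the top?
• Q1 → reformulation → Q1’, Q2’, Q3’ (dx = any other document)
– Q1’: dx, C3, C1, C2, dx, … (P@5 = 0.6)
– Q2’: dx, C1, dx, dx, dx, … (P@5 = 0.2)
– Q3’: dx, C2, dx, C3, dx, … (P@5 = 0.4)
• Best P@5 = 0.6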
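The P@5 values on this slide follow directly from the three ranked lists. The sketch below recomputes them, treating C1, C2, and C3 as the relevant documents and "dx" as any other document; the function and variable names are illustrative.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked documents that are in the relevant set."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


RELEVANT = {"C1", "C2", "C3"}

# Top-5 results of the three reformulated queries, as shown on the slide.
RESULTS = {
    "Q1'": ["dx", "C3", "C1", "C2", "dx"],
    "Q2'": ["dx", "C1", "dx", "dx", "dx"],
    "Q3'": ["dx", "C2", "dx", "C3", "dx"],
}

p_at_5 = {query: precision_at_k(docs, RELEVANT) for query, docs in RESULTS.items()}
best_p_at_5 = max(p_at_5.values())
```

Taking the maximum over the recommended reformulations gives the "Best P@5" figure used to compare methods.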
Results
Our method reformulates queries more effectively
[Figure: comparison of [Jones’06] and our method; x-axis: #Recommended Queries]
Term Addition Patterns
Term addition patterns can refine a broad query
Related Work
• Query suggestions [e.g., Jones’06, Sahami et al’06]
– Discover patterns at the query level
– Rely on external resources or training data
– Do not consider effectiveness
• Query modifications in IR [Rocchio’71, Anick’03]
– Expand queries from returned documents
– Do not rely on search logs, mostly adding terms
• Related work in NLP community [Lin’98, Rapp’02]
– Finding synonyms or near-synonyms
– Syntagmatic and paradigmatic relations
– Not used for query reformulation
Conclusions and Future Work
• We propose a new way to mine search logs for patterns to address ineffective queries
– Vocabulary mismatch
– Lack of discrimination
• We define and mine two basic patterns at the term level
– Context-sensitive term substitution patterns
– Context-sensitive term addition patterns
• Experiments show the effectiveness of our methods
• In the future,
– Use relevance judgments instead of clicks
– Exploit click information for better query reformulation