maximum personalization: user-centered adaptive information retrieval


Page 1: Maximum Personalization: User-Centered Adaptive Information Retrieval

ChengXiang (“Cheng”) Zhai
Department of Computer Science
Graduate School of Library & Information Science
Department of Statistics
Institute for Genomic Biology
University of Illinois at Urbana-Champaign

Yahoo! Research, Jan. 12, 2011

Page 2: Happy Users

Query: avatar hotel

Page 3: Sad Users

They’ve got to know the users better!

I work on information retrieval; I searched for similar pages last week; I clicked on AIRS-related pages (including keynote); …

How can search engines better help these users?

Page 4: Current Search Engines are Document-Centered

[Diagram: a single search engine indexes the documents and receives the same query “airs” from many users …]

It’s hard for a search engine to know everyone well!

Page 5: To maximize personalization, we must put a user in the center!

[Diagram: the user’s query “airs” goes to a personalized search agent, which draws on the Web (through several search engines), email, viewed Web pages, query history, and desktop files]

A search agent knows a particular user very well.

Page 6: User-Centered Adaptive IR (UCAIR)

• A novel retrieval strategy emphasizing
– user modeling (“user-centered”)
– search context modeling (“adaptive”)
– interactive retrieval
• Implemented as a personalized search agent that
– sits on the client side (owned by the user)
– integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
– collaborates with other agents
– goes beyond search toward task support

Page 7: Much work has been done on personalization

• Personalized data collection: Haystack [Adar & Karger 99], MyLifeBits [Gemmell et al. 02], Stuff I’ve Seen [Dumais et al. 03], Total Recall [Cheng et al. 04], Google desktop search, Microsoft desktop search
• Server-side personalization: My Yahoo! [Manber et al. 00], Personalized Google Search
• Capturing user information & search context: SearchPad [Bharat 00], Watson [Budzik & Hammond 00], Intellizap [Finkelstein et al. 01], Understanding clickthrough data [Joachims et al. 05]
• Implicit feedback: SVM [Joachims 02], BM25 [Teevan et al. 05], Language models [Shen et al. 05]

However, we are far from unleashing the full power of personalization.

Page 8: UCAIR is unique in emphasizing maximum exploitation of client-side personalization

Benefits of client-side personalization:
• More information about the user, thus more accurate user modeling
– Can exploit the complete interaction history (e.g., can easily capture all clickthrough information and navigation activities)
– Can exploit the user’s other activities (e.g., searching immediately after reading an email)
• Naturally scalable
• Alleviates the privacy problem
• Can potentially maximize the benefit of personalization

Page 9: Maximum Personalization = Maximum User Information + Maximum Exploitation of User Info.

• Maximum user information: Client-Side Agent
• Maximum exploitation of user info: (Frequent + Optimal) Adaptation

Page 10: Examples of Useful User Information

• Textual information
– Current query
– Previous queries in the same search session
– Past queries in the entire search history
• Clicking activities
– Skipped documents
– Viewed/clicked documents
– Navigation traces on non-search results
– Dwelling time
– Scrolling
• Search context
– Time, location, task, …

Page 11: Examples of Adaptation

• Query formulation
– Query completion: provide assistance while a user enters a query
– Query suggestion: suggest useful related queries
– Automatic generation of queries: proactive recommendation
• Dynamic re-ranking of unseen documents
– As a user clicks on the “back” button
– As a user scrolls down a result list
– As a user clicks on the “next” button to view more results
• Adaptive presentation/summarization of search results
• Adaptive display of a document: display the most relevant part of a document

Page 12: Challenges for UCAIR

• General: how to obtain maximum personalization without requiring extra user effort?
• Specific challenges
– What’s an appropriate retrieval framework for UCAIR?
– How do we optimize retrieval performance in interactive retrieval?
– How can we capture and manage all user information?
– How can we develop robust and accurate retrieval models to maximally exploit user information and search context?
– How do we evaluate UCAIR methods?
– …

Page 13: The Rest of the Talk

• Part I: A decision-theoretic framework for UCAIR
• Part II: Algorithms for personalized search
– Optimize initial document ranking
– Dynamic re-ranking of search results
– Personalize search result presentation
• Part III: Summary and open challenges

Page 14: Part I

A Decision-Theoretic Framework for UCAIR

Page 15: IR as Sequential Decision Making

User (information need) ↔ System (model of information need)

A1: Enter a query → System: Which documents to present? How to present them?
Ri: results (i = 1, 2, 3, …) → User: Which documents to view?
A2: View document → System: Which part of the document to show? How?
R’: Document content → User: View more?
A3: Click on “Back” button → …

Page 16: Retrieval Decisions

User U: A1, A2, …, At-1, At
System: R1, R2, …, Rt-1, Rt = ?
History H = {(Ai, Ri)}, i = 1, …, t-1; document collection C

Given U, C, At, and H, choose the best Rt ∈ r(At) from all possible responses to At.

• At = query “Jaguar”: r(At) = all possible rankings of C; Rt = the best ranking for the query
• At = click on “Next” button: r(At) = all possible rankings of unseen docs; Rt = the best ranking of unseen docs

Page 17: A Risk Minimization Framework

Observed: user U, interaction history H, current user action At, document collection C
All possible responses: r(At) = {r1, …, rn}
Inferred user model: M = (θU, S, …), where θU = information need and S = seen docs
Loss function: L(ri, At, M)
Optimal response: r* (minimum loss), obtained by minimizing the Bayes risk:

$$R_t = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM$$

Page 18: A Simplified Two-Step Decision-Making Procedure

• Approximate the Bayes risk by the loss at the mode of the posterior distribution
• Two-step procedure
– Step 1: Compute an updated user model M* based on the currently available information
– Step 2: Given M*, choose a response to minimize the loss function

$$\begin{aligned}
R_t &= \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM \\
&\approx \arg\min_{r \in r(A_t)} L(r, A_t, M^*)\, P(M^* \mid U, H, A_t, C) \\
&= \arg\min_{r \in r(A_t)} L(r, A_t, M^*)
\end{aligned}$$

where $M^* = \arg\max_M P(M \mid U, H, A_t, C)$. The last step holds because P(M* | U, H, At, C) does not depend on r.
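A minimal Python sketch of the two-step procedure, assuming a small discrete set of candidate user models with known posterior probabilities (so the integral becomes a sum). The function names, the toy “jaguar” posterior, and the 0/1 loss are hypothetical illustrations, not part of UCAIR; the current action At is fixed, so it is folded into the loss.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: responses and user models are plain strings here.
Loss = Callable[[str, str], float]  # L(r, M) for the current, fixed action A_t


def bayes_optimal(responses: Sequence[str], posterior: Dict[str, float], loss: Loss) -> str:
    """Exact rule: minimize the expected loss (Bayes risk) under P(M | U, H, A_t, C)."""
    return min(responses, key=lambda r: sum(p * loss(r, m) for m, p in posterior.items()))


def two_step(responses: Sequence[str], posterior: Dict[str, float], loss: Loss) -> str:
    """Approximation: Step 1 finds the posterior mode M*, Step 2 minimizes L(r, A_t, M*)."""
    m_star = max(posterior, key=lambda m: posterior[m])
    return min(responses, key=lambda r: loss(r, m_star))


# Toy posterior for the ambiguous query "jaguar": which sense does the user want?
posterior = {"car": 0.7, "cat": 0.3}
responses = ["car-first", "cat-first"]


def zero_one_loss(r: str, m: str) -> float:
    return 0.0 if r.startswith(m) else 1.0


print(bayes_optimal(responses, posterior, zero_one_loss), two_step(responses, posterior, zero_one_loss))
```

Here both rules pick "car-first" (expected losses 0.3 vs. 0.7). With richer loss functions or flatter posteriors the two rules can disagree; that gap is the price paid for the cheaper mode approximation.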

Page 19: Approximately Optimal Interactive Retrieval

[Diagram: user U, IR system, collection C]
A1 → infer M*1 from P(M1 | U, H, A1, C) → respond with R1 minimizing L(r, A1, M*1)
A2 → infer M*2 from P(M2 | U, H, A2, C) → respond with R2 minimizing L(r, A2, M*2)
A3 → …

Many possible actions: type in a query character, scroll down a page, click on any button, …
Many possible responses: query completion, display relevant passage, recommendation, clarification, …

Page 20: Refinement of Risk Minimization

• r(At): decision space (At-dependent)
– r(At) = all possible rankings of docs in C
– r(At) = all possible rankings of unseen docs
– r(At) = all possible summarization strategies
– r(At) = all possible ways to diversify top-ranked documents
• M: user model
– Essential component: θU = user information need
– S = seen documents
– n = “topic is new to the user”; r = “reading level of user”
• L(Rt, At, M): loss function
– Generally measures the utility of Rt for a user modeled as M
– Often encodes retrieval criteria, but may also capture other preferences
• P(M | U, H, At, C): user model inference
– Often involves estimating the unigram language model θU
– May also involve inferring other variables (e.g., readability, tolerance of redundancy)

Page 21: Case 1: Context-Insensitive IR

– At = “enter a query Q”
– r(At) = all possible rankings of docs in C
– M = θU, unigram language model (word distribution)
– p(M | U, H, At, C) = p(θU | Q)

$$L(r, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

Since p(viewed | d1) ≥ p(viewed | d2) ≥ …, the optimal ranking Rt is given by ranking documents by D(θU || θd) in increasing order: the loss is smallest when the positions most likely to be viewed hold the documents with the smallest divergence from the user model.
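To make Case 1 concrete, here is a small Python sketch that estimates θU from the query by maximum likelihood and ranks documents by D(θU || θd) in increasing order. The Dirichlet-smoothed document models, the add-one collection model, and mu=10.0 are assumptions chosen for a tiny toy collection, not settings from the talk.

```python
import math
from collections import Counter
from typing import Dict, List


def mle(words: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model of a word list."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def rank_by_kl(query: List[str], docs: Dict[str, List[str]], mu: float = 10.0) -> List[str]:
    """Rank doc ids by D(theta_U || theta_d), smallest divergence first."""
    theta_u = mle(query)  # Case 1: theta_U is estimated from the query alone
    vocab = set(query).union(*docs.values())
    coll = Counter(w for words in docs.values() for w in words)
    coll_len = sum(coll.values())
    # Add-one smoothed collection model, so every vocabulary word gets p > 0.
    p_c = {w: (coll[w] + 1) / (coll_len + len(vocab)) for w in vocab}

    def divergence(words: List[str]) -> float:
        counts, n = Counter(words), len(words)
        total = 0.0
        for w, pu in theta_u.items():  # terms with theta_U(w) = 0 contribute nothing
            pd = (counts[w] + mu * p_c[w]) / (n + mu)  # Dirichlet-smoothed p(w | theta_d)
            total += pu * math.log(pu / pd)
        return total

    return sorted(docs, key=lambda d: divergence(docs[d]))


docs = {
    "d1": "jaguar car dealer champaign".split(),
    "d2": "jaguar big cat jungle".split(),
    "d3": "yahoo mail login".split(),
}
print(rank_by_kl("jaguar car".split(), docs))
```

This prints ['d1', 'd2', 'd3']: the dealer page matches both query words, the cat page matches one, and the mail page matches none.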

Page 22: Case 2: Implicit Feedback

– At = “enter a query Q”
– r(At) = all possible rankings of docs in C
– M = θU, unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

$$L(r, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

Since p(viewed | d1) ≥ p(viewed | d2) ≥ …, the optimal ranking Rt is given by ranking documents by D(θU || θd) in increasing order.

Page 23: Case 3: General Implicit Feedback

– At = “enter a query Q”, or click on the “Back” or “Next” button
– r(At) = all possible rankings of unseen docs in C
– M = (θU, S), S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

$$L(r, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

where d1, …, dN are the unseen documents. Since p(viewed | d1) ≥ p(viewed | d2) ≥ …, the optimal ranking Rt is given by ranking documents by D(θU || θd) in increasing order.

Page 24: Case 4: User-Specific Result Summary

– At = “enter a query Q”
– r(At) = {(D, η)}, D ⊆ C, |D| = k, η ∈ {“snippet”, “overview”}
– M = (θU, n), n ∈ {0, 1}: “topic is new to the user”
– p(M | U, H, At, C) = p(θU, n | Q, H), M* = (θ*, n*)

$$L(r, A_t, M) = L(D, \eta, \theta^*, n^*) = L(D, \theta^*) + L(\eta, n^*) = \sum_{d_i \in D} D(\theta^* \,\|\, \theta_{d_i}) + L(\eta, n^*)$$

L(η, n*):
               n* = 1   n* = 0
η = snippet      1        0
η = overview     0        1

Choose the k most relevant docs. If the topic is new (n* = 1), give an overview summary; otherwise, give a regular snippet summary.
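Because the Case 4 loss splits into a document term and a presentation term, the two choices can be made independently. A minimal Python sketch, assuming the divergences D(θ* || θd) and the estimate n* are already available; the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

# Loss table L(eta, n*): 1 for a mismatch, 0 for a match.
SUMMARY_LOSS = {
    ("snippet", 1): 1, ("snippet", 0): 0,
    ("overview", 1): 0, ("overview", 0): 1,
}


def choose_response(divergence: Dict[str, float], k: int, n_star: int) -> Tuple[List[str], str]:
    """divergence maps doc id -> D(theta* || theta_d); returns (D, eta) minimizing the loss."""
    # The loss is a sum of two independent terms, so each is minimized separately.
    docs = sorted(divergence, key=lambda d: divergence[d])[:k]  # the k most relevant docs
    eta = min(("snippet", "overview"), key=lambda e: SUMMARY_LOSS[(e, n_star)])
    return docs, eta


print(choose_response({"a": 0.2, "b": 0.9, "c": 0.5}, k=2, n_star=1))
```

For a new topic (n* = 1) this returns (['a', 'c'], 'overview'); with n* = 0 the same documents are shown as snippets.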

Page 25: Part II. Algorithms for personalized search

• Optimize initial document ranking
• Dynamic re-ranking of search results
• Personalize search result presentation

Page 26: Scenario 1

After a user types in a query, how can we exploit the long-term search history to optimize the initial results?

Page 27: Case 2: Implicit Feedback

– At = “enter a query Q”
– r(At) = all possible rankings of docs in C
– M = θU, unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

$$L(r, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

Since p(viewed | d1) ≥ p(viewed | d2) ≥ …, the optimal ranking Rt is given by ranking documents by D(θU || θd) in increasing order.

Page 28: Long-term Implicit Feedback from Personal Search Log

• Search interests: user interested in X (e.g., champaign, luxury car)
– Consistent & distinct
– Most useful for ambiguous queries
• Search preferences: for Y, user prefers X (e.g., quotes → newcars.com)
– Most useful for recurring queries

Example log (avg 80 queries/month):
query champaign map
…
query jaguar (session starts)
query champaign jaguar
click champaign.il.auto.com
query jaguar quotes
click newcars.com
…
query yahoo mail (noise)
…
query jaguar quotes (recurring query)
click newcars.com

Page 29: Estimate Query Language Model using the Entire Search History

Past searches S1, …, St-1: each Si has a query qi, results Di, and clickthrough Ci, summarized by a model θSi. The current search St has query qt and results Dt, with query model θq.

$$\theta_H = \sum_{i=1}^{t-1} \lambda_i\, \theta_{S_i}, \qquad \theta_{q,H} = \lambda_q\, \theta_q + (1 - \lambda_q)\, \theta_H$$

How can we optimize λi and λq?
– Need to distinguish informative/noisy past searches
– Need to distinguish queries with strong vs. weak support from history

Page 30: Adaptive Weighting with Mixture Model [Tan et al. 06]

$$\theta_{mix} = \lambda_B\, \theta_B + (1 - \lambda_B)\, \theta_{q,H}, \qquad \theta_{q,H} = \lambda_q\, \theta_q + (1 - \lambda_q) \sum_{i=1}^{t-1} \lambda_i\, \theta_{S_i}$$

Select {λ} to maximize P(Dt | θmix), using the EM algorithm.

Example for the query “jaguar”: the components are the query (θq), past jaguar searches, past champaign searches, and the background (θB); the current results Dt are
<d1> jaguar car official site racing
<d2> jaguar is a big cat...
<d3> local jaguar dealer in champaign...
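The nested weights (λB, λq, λ1, …, λt-1) can be expanded into one flat weight per component: λB for θB, (1 − λB)λq for θq, and (1 − λB)(1 − λq)λi for θSi. Under that flattening, a minimal Python sketch of EM for the weights of a mixture of fixed unigram models is shown below. It is a generic illustration of “select {λ} to maximize P(Dt | θmix)”, not the exact estimator of [Tan et al. 06]; the component names and toy distributions are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]


def em_mixture_weights(components: Dict[str, Dist], words: List[str], iters: int = 100) -> Dict[str, float]:
    """Fit weights pi_j of a mixture of FIXED unigram models to maximize log P(words | mixture)."""
    counts = Counter(words)
    names = list(components)
    pi = {j: 1.0 / len(names) for j in names}  # start from uniform weights
    for _ in range(iters):
        expected = {j: 0.0 for j in names}
        for w, c in counts.items():
            # E-step: how much of word w each component is responsible for.
            joint = {j: pi[j] * components[j].get(w, 0.0) for j in names}
            z = sum(joint.values())
            if z == 0.0:
                continue  # no component can generate w, so it carries no information here
            for j in names:
                expected[j] += c * joint[j] / z
        # M-step: new weights proportional to the expected word counts.
        total = sum(expected.values())
        pi = {j: expected[j] / total for j in names}
    return pi


vocab = ["jaguar", "car", "dealer", "cat", "jungle", "is", "a", "the"]
components = {
    "query": {"jaguar": 1.0},
    "past_car_searches": {"jaguar": 0.3, "car": 0.4, "dealer": 0.3},
    "past_cat_searches": {"jaguar": 0.3, "cat": 0.4, "jungle": 0.3},
    "background": {w: 1 / len(vocab) for w in vocab},
}
weights = em_mixture_weights(components, "jaguar car dealer jaguar car the".split())
print(weights)
```

On this toy Dt, which is about cars, EM shifts weight toward the past car searches and away from the past cat searches; the background component is what keeps every observed word explainable.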

Page 31: Sample Results: Improving Initial Ranking with Long-Term Implicit Feedback

• recurring ≫ fresh
• combination ≈ clickthrough > docs > query, contextless

Page 32: Scenario 2

The user is examining search results: how can we further dynamically optimize the search results based on clickthroughs?

Page 33: Case 3: General Implicit Feedback

– At = “enter a query Q”, or click on the “Back” or “Next” button
– r(At) = all possible rankings of unseen docs in C
– M = (θU, S), S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

$$L(r, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

where d1, …, dN are the unseen documents. Since p(viewed | d1) ≥ p(viewed | d2) ≥ …, the optimal ranking Rt is given by ranking documents by D(θU || θd) in increasing order.

Page 34: Estimate a Context-Sensitive LM

User queries: Q1 (e.g., Apple software), Q2, …, Qk (e.g., Jaguar)
User clickthrough: C1 = {C1,1, C1,2, C1,3, …}, C2 = {C2,1, C2,2, C2,3, …}, … (e.g., the clicked snippet “Apple - Mac OS X The Apple Mac OS X product page. Describes features in the current version of Mac OS X, …”)

User model, built from the query history and clickthrough:

$$p(w \mid \theta_k) = p(w \mid Q_k, Q_{k-1}, \ldots, Q_1, C_{k-1}, \ldots, C_1) = \;?$$

Page 35: Method 1: Fixed Coeff. Interpolation (FixInt)

Average the user’s query history (Q1, …, Qk-1) and clickthrough (C1, …, Ck-1):

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i), \qquad p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

Linearly interpolate the history models:

$$p(w \mid H) = \beta\, p(w \mid H_C) + (1 - \beta)\, p(w \mid H_Q)$$

Linearly interpolate the current query Qk and the history model:

$$p(w \mid \theta_k) = \alpha\, p(w \mid Q_k) + (1 - \alpha)\, p(w \mid H)$$
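A minimal Python sketch of FixInt, assuming maximum-likelihood estimates for each p(w | Qi) and p(w | Ci), clickthrough represented by the text of the clicked snippets, at least one past query, and non-empty word lists. The helper names are hypothetical; α = 0.1, β = 1.0 match the FixInt setting reported in the results table.

```python
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]


def mle(words: List[str]) -> Dist:
    """Maximum-likelihood unigram model; assumes a non-empty word list."""
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


def average(models: List[Dist]) -> Dist:
    """Uniform average of unigram models, e.g. p(w|H_Q) = 1/(k-1) * sum_i p(w|Q_i)."""
    out: Dist = {}
    for m in models:
        for w, p in m.items():
            out[w] = out.get(w, 0.0) + p / len(models)
    return out


def interpolate(a: Dist, b: Dist, weight_a: float) -> Dist:
    """weight_a * a + (1 - weight_a) * b over the union of both vocabularies."""
    return {w: weight_a * a.get(w, 0.0) + (1 - weight_a) * b.get(w, 0.0) for w in set(a) | set(b)}


def fixint(queries: List[List[str]], clicks: List[List[str]], alpha: float, beta: float) -> Dist:
    """queries = [Q_1, ..., Q_k] with Q_k current; clicks = [C_1, ..., C_{k-1}]."""
    h_q = average([mle(q) for q in queries[:-1]])
    h_c = average([mle(c) for c in clicks])
    h = interpolate(h_c, h_q, beta)                 # p(w|H) = beta p(w|H_C) + (1-beta) p(w|H_Q)
    return interpolate(mle(queries[-1]), h, alpha)  # p(w|theta_k) = alpha p(w|Q_k) + (1-alpha) p(w|H)


theta = fixint([["apple", "software"], ["jaguar"]], [["apple", "mac", "os", "x"]], alpha=0.1, beta=1.0)
```

With β = 1.0 the query history is ignored and only the clicked Mac OS X snippet feeds the history model, so “jaguar” keeps probability 0.1 and each snippet word gets 0.225.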

Page 36: Method 2: Bayesian Interpolation (BayesInt)

Average the user’s query and clickthrough history (Q1, …, Qk-1 and C1, …, Ck-1):

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i), \qquad p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

Intuition: trust the current query Qk more if it’s longer. Use the history models as a Dirichlet prior:

$$p(w \mid \theta_k) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$$
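A minimal Python sketch of the BayesInt estimate, assuming the averaged history models p(w | HQ) and p(w | HC) have already been computed. The toy histories are hypothetical; μ = 0.2, ν = 5.0 match the BayesInt setting reported in the results table.

```python
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]


def bayesint(current_query: List[str], h_q: Dist, h_c: Dist, mu: float, nu: float) -> Dist:
    """p(w|theta_k) = (c(w, Q_k) + mu p(w|H_Q) + nu p(w|H_C)) / (|Q_k| + mu + nu)."""
    counts = Counter(current_query)
    denom = len(current_query) + mu + nu
    vocab = set(counts) | set(h_q) | set(h_c)
    return {w: (counts[w] + mu * h_q.get(w, 0.0) + nu * h_c.get(w, 0.0)) / denom for w in vocab}


h_q = {"apple": 0.5, "software": 0.5}                      # averaged past queries
h_c = {"apple": 0.25, "mac": 0.25, "os": 0.25, "x": 0.25}  # averaged clicked snippets
theta = bayesint(["jaguar"], h_q, h_c, mu=0.2, nu=5.0)
```

The query’s share of the probability mass is |Qk| / (|Qk| + μ + ν): here the one-word query keeps only 1/6.2 of it, and a longer Qk would automatically claim more, which is the intuition stated above.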

Page 37: Method 3: Online Bayesian Updating (OnlineUp)

Intuition: incremental updating of the language model, alternating between queries and clickthrough: Q1 → θ1, C1 → θ'1, Q2 → θ2, C2 → θ'2, …, Qk → θk.

$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta'_{i-1})}{|Q_i| + \mu}, \qquad p(w \mid \theta'_i) = \frac{c(w, C_i) + \nu\, p(w \mid \theta_i)}{|C_i| + \nu}$$
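A minimal Python sketch of OnlineUp, assuming θ1 is the maximum-likelihood model of Q1 (the slide does not show how the chain is initialized) and that clickthrough is the text of the clicked snippets. The helper names and toy data are hypothetical; μ = 5.0, ν = 15.0 match the OnlineUp setting reported in the results table.

```python
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]


def dirichlet_update(words: List[str], prior: Dist, strength: float) -> Dist:
    """p(w) = (c(w, words) + strength * prior(w)) / (|words| + strength)."""
    counts = Counter(words)
    denom = len(words) + strength
    return {w: (counts[w] + strength * prior.get(w, 0.0)) / denom for w in set(counts) | set(prior)}


def online_up(queries: List[List[str]], clicks: List[List[str]], mu: float, nu: float) -> Dist:
    """queries = [Q_1, ..., Q_k]; clicks = [C_1, ..., C_{k-1}]; returns theta_k."""
    theta = dirichlet_update(queries[0], {}, 0.0)  # assumption: theta_1 = ML estimate from Q_1
    for q, c in zip(queries[1:], clicks):
        theta_prime = dirichlet_update(c, theta, nu)  # theta'_{i-1}: fold in C_{i-1}
        theta = dirichlet_update(q, theta_prime, mu)  # theta_i: fold in Q_i
    return theta


theta = online_up([["apple", "software"], ["jaguar"]], [["apple", "mac", "os", "x"]], mu=5.0, nu=15.0)
```

Each step reuses the previous model as the Dirichlet prior, so earlier evidence decays geometrically as new queries and clicks arrive.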

Page 38: Method 4: Batch Bayesian Update (BatchUp)

Intuition: all clickthrough data are equally useful. Update with the queries first (Q1 → θ1, Q2 → θ2, …, Qk → θk), then fold in C1, …, Ck-1 all at once to obtain θ'k.

$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta_{i-1})}{|Q_i| + \mu}$$

$$p(w \mid \theta'_k) = \frac{\sum_{j=1}^{k-1} c(w, C_j) + \nu\, p(w \mid \theta_k)}{\sum_{j=1}^{k-1} |C_j| + \nu}$$
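A minimal Python sketch of BatchUp under the same assumptions as the OnlineUp sketch (θ1 is the maximum-likelihood model of Q1, clickthrough is clicked snippet text). Pooling all clicked words into one sample gives exactly the sums over j in the numerator and denominator. Names and toy data are hypothetical; μ = 2.0, ν = 15.0 match the BatchUp setting reported in the results table.

```python
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]


def dirichlet_update(words: List[str], prior: Dist, strength: float) -> Dist:
    """p(w) = (c(w, words) + strength * prior(w)) / (|words| + strength)."""
    counts = Counter(words)
    denom = len(words) + strength
    return {w: (counts[w] + strength * prior.get(w, 0.0)) / denom for w in set(counts) | set(prior)}


def batch_up(queries: List[List[str]], clicks: List[List[str]], mu: float, nu: float) -> Dist:
    """queries = [Q_1, ..., Q_k]; clicks = [C_1, ..., C_{k-1}]; returns theta'_k."""
    theta = dirichlet_update(queries[0], {}, 0.0)  # assumption: theta_1 = ML estimate from Q_1
    for q in queries[1:]:
        theta = dirichlet_update(q, theta, mu)     # theta_i from Q_i with theta_{i-1} as prior
    pooled = [w for c in clicks for w in c]        # C_1, ..., C_{k-1} treated as one sample
    return dirichlet_update(pooled, theta, nu)     # theta'_k


theta = batch_up([["apple", "software"], ["jaguar"]], [["apple", "mac", "os", "x"]], mu=2.0, nu=15.0)
```

Unlike OnlineUp, a click made early in the session counts as much as a recent one, because all clickthrough enters in a single update.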

Page 39: Overall Effect of Search Context [Shen et al. 05b]

            FixInt          BayesInt        OnlineUp        BatchUp
            (α=0.1,β=1.0)   (μ=0.2,ν=5.0)   (μ=5.0,ν=15.0)  (μ=2.0,ν=15.0)
Query       MAP     pr@20   MAP     pr@20   MAP     pr@20   MAP     pr@20
Q3          0.0421  0.1483  0.0421  0.1483  0.0421  0.1483  0.0421  0.1483
Q3+HQ+HC    0.0726  0.1967  0.0816  0.2067  0.0706  0.1783  0.0810  0.2067
Improve     72.4%   32.6%   93.8%   39.4%   67.7%   20.2%   92.4%   39.4%
Q4          0.0536  0.1933  0.0536  0.1933  0.0536  0.1933  0.0536  0.1933
Q4+HQ+HC    0.0891  0.2233  0.0955  0.2317  0.0792  0.2067  0.0950  0.2250
Improve     66.2%   15.5%   78.2%   19.9%   47.8%   6.9%    77.2%   16.4%

• Short-term context helps the system improve retrieval accuracy
• BayesInt is better than FixInt; BatchUp is better than OnlineUp

Page 40: Using Clickthrough Data Only

BayesInt (μ=0.0, ν=5.0)

Query       MAP     pr@20
Q3          0.0421  0.1483
Q3+HC       0.0766  0.2033
Improve     81.9%   37.1%
Q4          0.0536  0.1930
Q4+HC       0.0925  0.2283
Improve     72.6%   18.1%

Clickthrough is the major contributor.

Performance on unseen docs:

Query       MAP     pr@20
Q3          0.0331  0.125
Q3+HC       0.0661  0.178
Improve     99.7%   42.4%
Q4          0.0442  0.165
Q4+HC       0.0739  0.188
Improve     67.2%   13.9%

Snippets for non-relevant docs are still useful!

Query       MAP     pr@20
Q3          0.0421  0.1483
Q3+HC       0.0521  0.1820
Improve     23.8%   23.0%
Q4          0.0536  0.1930
Q4+HC       0.0620  0.1850
Improve     15.7%   -4.1%

Page 41: UCAIR Outperforms Google [Shen et al. 05]

[Figure: precision-recall (PR) curves]

Page 42: Scenario 3

The user has not viewed any document on the first result page and is now clicking on “Next” to view more: how can we optimize the search results on the next page?

Page 43: Problem Formulation

Query Q is submitted to a search engine over collection C, which returns the result list L1, L2, ….
• 1st page: L1, L2, …, Lf (seen, negative: N)
• 2nd page through 101st page: Lf+1, Lf+2, …, Lf+r (unseen, to be reranked: U)

How to rerank these unseen docs?

Page 44: Strategy I: Query Modification

Using the negative examples N = {L1, …, L10}, move the query away from their centroid (γ is a parameter):

$$Q_{new} = Q - \gamma\, \frac{1}{|N|} \sum_{D \in N} D$$

The unseen documents D11, D12, D13, …, D1010, ranked for Q, are then reranked for Qnew as D’11, D’12, D’13, …, D’1010.
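A minimal Python sketch of Strategy I in a vector-space setting, assuming raw term-frequency vectors and an unnormalized dot-product score; the value gamma=0.5, the helper names, and the toy documents are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, List

Vec = Dict[str, float]


def tf_vector(text: str) -> Vec:
    return {w: float(c) for w, c in Counter(text.split()).items()}


def modify_query(query: Vec, negatives: List[Vec], gamma: float) -> Vec:
    """Q_new = Q - gamma * (1/|N|) * sum of the negative document vectors."""
    new = dict(query)
    for d in negatives:
        for w, x in d.items():
            new[w] = new.get(w, 0.0) - gamma * x / len(negatives)
    return new


def dot(a: Vec, b: Vec) -> float:
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def rerank(query: Vec, negatives: List[Vec], unseen: Dict[str, Vec], gamma: float = 0.5) -> List[str]:
    q_new = modify_query(query, negatives, gamma)
    return sorted(unseen, key=lambda d: dot(q_new, unseen[d]), reverse=True)


negatives = [tf_vector("jaguar cat jungle")]  # skipped results on the first page
unseen = {"d11": tf_vector("jaguar cat habitat"), "d12": tf_vector("jaguar car dealer")}
print(rerank(tf_vector("jaguar"), negatives, unseen))
```

Both unseen documents score the same for the original query, but after subtracting the negative centroid the cat page loses its lead and the order becomes ['d12', 'd11'].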

Page 45: Strategy II: Score Combination

Learn a negative query Qneg from the negative examples and combine its score with the original score (β is a parameter):

$$S(Q, D) - \beta\, S(Q_{neg}, D)$$

S(Q, D): D11 0.05, D12 0.04, D13 0.04, D14 0.03, D15 0.03, …, D1010 0.01
S(Qneg, D): D11 0.03, D12 0.05, D13 0.02, D14 0.01, D15 0.01, …, D1010 0.01
Reranked by S(Q, D) − β S(Qneg, D): D’11 0.04, D’12 0.03, D’13 0.03, D’14 0.01, D’15 0.01, …, D’1010 0.01
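A minimal Python sketch of Strategy II, reusing the first five scores of each list above. The value beta=0.5 is an arbitrary choice, so the output is not meant to reproduce the slide’s reranked column; the function name is hypothetical.

```python
from typing import Dict, List


def combine_scores(s_q: Dict[str, float], s_neg: Dict[str, float], beta: float) -> List[str]:
    """Rerank unseen docs by S(Q, D) - beta * S(Q_neg, D), highest first (ties keep input order)."""
    return sorted(s_q, key=lambda d: s_q[d] - beta * s_neg.get(d, 0.0), reverse=True)


s_q = {"D11": 0.05, "D12": 0.04, "D13": 0.04, "D14": 0.03, "D15": 0.03}
s_neg = {"D11": 0.03, "D12": 0.05, "D13": 0.02, "D14": 0.01, "D15": 0.01}
print(combine_scores(s_q, s_neg, beta=0.5))
```

D12 drops from second to last because it looks most like the negative examples; Python’s sort is stable, so the tied D14 and D15 keep their original order.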

Page 46: Multiple Negative Models

• Negative feedback examples may be quite diverse
– They may distract in totally different ways
– A single negative model is not optimal
• Multiple negative models
– Learn multiple models from N
– Score function for the negative query, where F is an aggregation function:

$$S_{neg}(Q, D) = F\Big(\bigcup_{i=1}^{k} \{ S(Q^{i}_{neg}, D) \}\Big)$$

[Diagram: query Q surrounded by several negative models Q1neg, …, Q6neg]
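A minimal Python sketch of scoring with multiple negative models, assuming cosine similarity as S and F = max (one plausible aggregation; the slide leaves F open). The two negative models stand for two different kinds of distraction, animal pages and Mac OS X pages; all names, vectors, and beta=1.0 are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vec = Dict[str, float]


def cosine(a: Vec, b: Vec) -> float:
    num = sum(x * b.get(w, 0.0) for w, x in a.items())
    den = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return num / den if den else 0.0


def multi_neg_score(query: Vec, neg_models: Sequence[Vec], doc: Vec, beta: float,
                    aggregate: Callable[[List[float]], float] = max) -> float:
    """S(Q, D) - beta * S_neg(Q, D), where S_neg = F({S(Q_neg^i, D)}) and F = aggregate."""
    s_neg = aggregate([cosine(m, doc) for m in neg_models])
    return cosine(query, doc) - beta * s_neg


query = {"jaguar": 1.0}
neg_models = [{"cat": 1.0, "jungle": 1.0}, {"mac": 1.0, "os": 1.0}]
docs = {
    "d_car": {"jaguar": 1.0, "car": 1.0},
    "d_cat": {"jaguar": 1.0, "cat": 1.0},
    "d_mac": {"jaguar": 1.0, "mac": 1.0},
}
scores = {d: multi_neg_score(query, neg_models, v, beta=1.0) for d, v in docs.items()}
print(scores)
```

With max as F, a document is penalized as strongly as its closest negative model, so both distraction types are demoted equally; averaging all negatives into one model would dilute each type’s penalty.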

Page 47: Effectiveness of Negative Feedback [Wang et al. 08]

              ROBUST+LM        ROBUST+VSM
              MAP     GMAP     MAP     GMAP
OriginalRank  0.0293  0.0137   0.0223  0.0097
SingleQuery   0.0325  0.0141   0.0222  0.0097
SingleNeg1    0.0325  0.0147   0.0225  0.0097
SingleNeg2    0.0330  0.0149   0.0226  0.0097
MultiNeg1     0.0346  0.0150   0.0226  0.0099
MultiNeg2     0.0363  0.0148   0.0233  0.0100

              GOV+LM           GOV+VSM
              MAP     GMAP     MAP     GMAP
OriginalRank  0.0257  0.0054   0.0290  0.0035
SingleQuery   0.0297  0.0056   0.0301  0.0038
SingleNeg1    0.0300  0.0056   0.0331  0.0038
SingleNeg2    0.0289  0.0055   0.0298  0.0036
MultiNeg1     0.0331  0.0058   0.0294  0.0036
MultiNeg2     0.0311  0.0057   0.0290  0.0036

Page 48: Scenario 4

Can we leverage user interaction history to personalize result presentation?

Page 49: Need for User-Specific Summaries

Query = “Asian tsunami”

[Screenshot: snippet summaries of search results]

Such a snippet summary may be fine for a user who knows about the topic.

But for a user who hasn’t been tracking the news, a theme-based overview summary may be more useful.

Page 50: A Theme Overview Summary (Asian Tsunami)

Themes along the time axis:
• Immediate Reports
• Statistics of Death and Loss
• Personal Experience of Survivors
• Statistics of Further Impact
• Aid from Local Areas
• Aid from the World
• Donations from Countries
• Specific Events of Aid
• Lessons from Tsunami
• Research Inspired

[Diagram: documents (Doc1, Doc3, …) are linked to themes; theme evolutionary transitions connect themes over time, forming theme evolution threads]

Page 51: Risk Minimization for User-Specific Summary

– At = “enter a query Q”
– r(At) = {(D, η)}, D ⊆ C, |D| = k, η ∈ {“snippet”, “overview”}
– M = (θU, n), n ∈ {0, 1}: “topic is new to the user”
– p(M | U, H, At, C) = p(θU, n | Q, H), M* = (θ*, n*)

$$L(r, A_t, M) = L(D, \eta, \theta^*, n^*) = L(D, \theta^*) + L(\eta, n^*) = \sum_{d_i \in D} D(\theta^* \,\|\, \theta_{d_i}) + L(\eta, n^*)$$

L(η, n*):
               n* = 1   n* = 0
η = snippet      1        0
η = overview     0        1

Task 1 = Estimating n*, i.e., p(n = 1), based on p(Q | H)
Task 2 = Generating an overview summary

Page 52: Temporal Theme Mining for Generating Overview News Summaries

General problem definition: given a text collection with time stamps,
• extract a theme evolution graph;
• model the life cycles of the most salient themes.

[Diagram: a theme evolution graph over time spans T1, T2, …, Tn (Theme1.1, Theme1.2, …; Theme2.1, Theme2.2, …; Theme3.1, Theme3.2, …), and theme life cycles showing the strength of Theme A and Theme B over T1, T2, …, Tn]

Page 53: A Topic Modeling Approach [Mei & Zhai 05]

Starting from a collection with time stamps, two pipelines:
• Theme evolution graph: partition the collection by time (t1, t2, t3, …) → extract themes in each span with mixture models (θ11, θ12, θ13, θ21, θ22, θ31, …, θ3k, plus a background model B) → model theme transitions with KL divergence
• Theme life cycles: extract global salient themes θ1, θ2, θ3 with a mixture model → decode the collection with an HMM → compute theme strength over time
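A minimal Python sketch of the “model theme transitions (KL div)” step, assuming a transition from an earlier theme θa to a later theme θb is kept when D(θb || θa) falls below a threshold. The threshold, the probability floor eps (a rough fix for words missing from the earlier theme, without renormalizing), and the toy themes are assumptions, not values from the paper.

```python
import math
from typing import Dict, List, Tuple

Dist = Dict[str, float]


def kl(p: Dist, q: Dist, eps: float = 1e-6) -> float:
    """D(p || q), flooring q at eps so words unseen in q give a large but finite penalty."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps)) for w, pw in p.items() if pw > 0)


def theme_transitions(earlier: Dict[str, Dist], later: Dict[str, Dist],
                      threshold: float) -> List[Tuple[str, str]]:
    """Link earlier theme a to later theme b when D(theta_b || theta_a) < threshold."""
    return [(a, b) for a, pa in earlier.items() for b, pb in later.items() if kl(pb, pa) < threshold]


earlier = {
    "aid": {"aid": 0.45, "relief": 0.45, "million": 0.1},
    "warning": {"warning": 0.5, "system": 0.5},
}
later = {"aid_world": {"aid": 0.4, "relief": 0.4, "million": 0.2}}
print(theme_transitions(earlier, later, threshold=1.0))
```

Only the aid theme evolves into the later aid theme (divergence about 0.044); the warning-system theme shares no words with it and is not linked.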

Page 54: Theme Evolution Graph: Tsunami

Themes (top words and probabilities) along the time axis (12/28/04, 01/05/05, 01/15/05, …):
• aid 0.020, relief 0.016, U.S. 0.013, military 0.011, U.N. 0.011, …
• Bush 0.016, U.S. 0.015, $ 0.009, relief 0.008, million 0.008, …
• Indonesian 0.01, military 0.01, islands 0.008, foreign 0.008, aid 0.007, …
• system 0.0104, Bush 0.008, warning 0.007, conference 0.005, US 0.005, …
• system 0.008, China 0.007, warning 0.005, Chinese 0.005, …
• warning 0.012, system 0.012, Islands 0.009, Japan 0.005, quake 0.003, …

Page 55: Theme Life Cycles: Tsunami

[Figure: theme life cycles, CNN, absolute strength]
• Aid from the world: $ 0.0173, million 0.0135, relief 0.0134, aid 0.0099, U.N. 0.0066, …
• Personal experiences: I 0.0322, wave 0.0061, beach 0.0051, saw 0.0046, sea 0.0046, …

Page 56: The UCAIR Prototype System

• A client-side search agent
• Talks to any browser (both Firefox and IE)

http://timan.cs.uiuc.edu/proj/ucair

Page 57: UCAIR Screen Shots: Immediate Implicit Feedback

[Screenshots: standard mode vs. adaptive mode]

Page 58: Screen Shots of UCAIR System: query = “airs accommodation”

[Screenshots: adaptive mode vs. standard mode]

Page 59: Screen Shots of UCAIR: “airs registration”

[Screenshots: adaptive mode vs. standard mode]

Page 60: UCAIR Screenshots: Search History-Based Recommendation

Page 61: Part III. Summary and Open Challenges

Page 62: Summary

• One doesn’t fit all; each user needs his/her own search agent (especially important for long-tail search)
• User-centered adaptive IR (UCAIR) emphasizes
– Collecting the maximum amount of user information and search context
– Formal models of user information needs and other user status variables
– Information integration
– Optimizing every response in interactive IR, thus potentially maximizing the effectiveness
• Preliminary results show that
– Implicit user modeling can improve search accuracy in many different ways

Page 63: Open Challenges

• Formal user models
– More in-depth analysis of user behavior (e.g., why did the user drop a query word and add it again later?)
– Exploit more implicit feedback clues (e.g., dwelling time-based language model)
– Collaborative user modeling (e.g., smoothing of user model)
• Context-sensitive retrieval models based on appropriate loss functions
– Optimize long-term utility in interactive retrieval (e.g., active feedback, exploration-exploitation tradeoff, incorporation of Fuhr’s interactive retrieval model)
– Robust and non-intrusive adaptation (e.g., considering confidence of adaptation)
• UCAIR system extension
– Right architecture: client+server? P2P?
– Design of novel interface to facilitate acquisition of user info
– Beyond search to support querying+browsing+recommendation

Page 64: Final Goal: A unified personal intelligent information agent

Information sources: Email, WWW, E-COM, Blog, Sports, Literature, IM, Desktop, Intranet

Agent components: User Profile, Intelligent Adaptation, Proactive Info Service, Frequently Accessed Info, Security Handler, Task Support, …

Page 65: Other Research Work & Roadmap

[Diagram: text, processed by Natural Language Content Analysis and Entity/Relation Extraction, supports Search, Filtering, Categorization, Clustering, Summarization, Extraction, Mining, and Visualization, grouped under Information Access, Information Organization, and Knowledge Acquisition and leading to Search Applications and Mining Applications]

Current focus (search applications): personalized search, retrieval models, topic map, recommender
Current focus (mining applications): contextual text mining, opinion integration, controversy discovery, abstractive summarization
Application areas: Web, Email, and Biomedical informatics

Page 66: Towards Next-Generation Search Engines

Starting from the current search engine (keyword queries, bag of words, search history), three directions:
1. Personalization (User Modeling): search history → complete user model (+ Social Networks, + Task Environment)
2. Large-Scale Semantic Analysis: bag of words → entities-relations → knowledge representation (+ Information Networks)
3. Full-Fledged Text Info. Management: search → access → mining → task support

Page 67: Multiresolution Topic Map for Browsing [Wang et al. 2009]

Turn search logs into a topic map

Naturally support collaborative surfing

Browse logs offer more opportunities to understand user interests and intents

Make browsing a “first-class citizen”!

Page 68: Multi-Faceted Sentiment Summary [Mei et al. 07]

Query = “Da Vinci Code”; sentiment columns: Neutral, Positive, Negative

Facet 1: Movie

... Ron Howards selection of Tom Hanks to play Robert Langdon.

Tom Hanks stars in the movie,who can be mad at that?

But the movie might get delayed, and even killed off if he loses.

Directed by: Ron Howard Writing credits: Akiva Goldsman ...

Tom Hanks, who is my favorite movie star act the leading role.

protesting ... will lose your faith by ... watching the movie.

After watching the movie I went online and some research on ...

Anybody is interested in it?

... so sick of people making such a big deal about a FICTION book and movie.

Facet 2: Book

I remembered when i first read the book, I finished the book in two days.

Awesome book. ... so sick of people making such a big deal about a FICTION book and movie.

I’m reading “Da Vinci Code” now.

So still a good book to past time.

This controversy book cause lots conflict in west society.

Page 69: Latent Aspect Rating Analysis [Wang et al. 2010]

Reviews + overall ratings → Aspect Segmentation → Latent Rating Regression → aspect ratings and weights on aspects

Aspect segments (term counts):
• location:1 amazing:1 walk:1 anywhere:1
• nice:1 accommodating:1 smile:1 friendliness:1 attentiveness:1
• room:1 nicely:1 appointed:1 comfortable:1

Term weights (as shown on the slide): 0.1 0.7 0.1 0.9 | 0.0 0.9 0.1 0.3 | 0.6 0.8 0.7 0.8 0.9
Aspect ratings: 1.3, 1.8, 3.8
Aspect weights: 0.2, 0.2, 0.6

Aspect ratings? Weights on aspects?

Page 70: Reviewer Behavior Analysis & Personalized Ranking of Entities

• People like cheap hotels because of good value
• People like expensive hotels because of good service

Query: 0.9 value, 0.1 others

[Rankings: non-personalized vs. personalized]

Page 71: Major References

• Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit User Modeling for Personalized Search. In Proceedings of CIKM 2005, pages 824-831.
• Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-Sensitive Information Retrieval Using Implicit Feedback. In Proceedings of SIGIR 2005, pages 43-50.
• Bin Tan, Xuehua Shen, and ChengXiang Zhai. Mining Long-Term Search History to Improve Search Accuracy. In Proceedings of KDD 2006, pages 718-723.
• Xuanhui Wang, Hui Fang, and ChengXiang Zhai. A Study of Methods for Negative Relevance Feedback. In Proceedings of SIGIR 2008, pages 219-226.
• Qiaozhu Mei and ChengXiang Zhai. Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining. In Proceedings of KDD 2005, pages 198-207.
• Maryam Karimzadehgan and ChengXiang Zhai. Exploration-Exploitation Tradeoff in Interactive Relevance Feedback. In Proceedings of CIKM 2010, pages 1397-1400.
• Norbert Fuhr. A Probability Ranking Principle for Interactive Information Retrieval. Information Retrieval, 11(3): 251-265, 2008.
• Xuanhui Wang, Bin Tan, Azadeh Shakery, and ChengXiang Zhai. Beyond Hyperlinks: Organizing Information Footprints in Search Logs to Support Effective Browsing. In Proceedings of CIKM 2009, pages 1237-1246.
• Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, and ChengXiang Zhai. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. In Proceedings of WWW 2007, pages 171-180.
• Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach. In Proceedings of KDD 2010, pages 115-124.

Page 72: Thank You!

More information can be found at http://timan.cs.uiuc.edu/

Looking forward to opportunities for collaborations…

Page 73: Acknowledgments

Funding Support

Joint work with Xuehua Shen, Bin Tan, Xuanhui Wang, Qiaozhu Mei, Hongning Wang, and other TIMAN group members