Navigation Aided Retrieval
Shashank Pandit & Christopher Olston
Carnegie Mellon & Yahoo
Search & Navigation Trends
Users often search and then supplement the search by extensively navigating beyond the search page to locate relevant information.
Why?
Query formulation problems
Open-ended search tasks
Preference for orienteering
Search & Navigation Trends
User behaviour in IR tasks is often not fully exploited by search engines:
Content-based – words
PageRank – in-links and out-links for popularity
Collaborative – clicks on results
Search engines do not examine these navigation patterns (though the authors fail to mention SearchGuide – Coyle et al. – which does)
NAR – Navigation Aided Retrieval
A new retrieval paradigm that incorporates post-query user navigation as an explicit component – NAR
A query is seen as a means of identifying starting points for further navigation by the user
The starting points are presented to the user in a result list and permit easy navigation to many documents that match the user's query
NAR – Navigation Retrieval with Organic Structure
Structure naturally present in pre-existing web documents
Advantages:
Human oversight – human-generated categories etc.
Familiar user interface – list of documents (i.e. result-list)
Single view of document collection
Robust implementation – no semantic knowledge required
The model
D – set of documents in the corpus; T – user's search task
S_T – answer set for task T; Q_T – set of valid queries for task T
Query submodel – belief distribution for the answer set given a query: what is the likelihood that document d solves the task (relevance)
Navigation submodel – likelihood that a user starting at a particular document will be able to navigate (under guidance) to a document that solves the task.
Conventional probabilistic IR Model
No outward navigation considered
Probability of solving the task depends on whether there is a document in the document collection which solves the task
Probability of the document solving a task is based on its “relevance” to the query
Navigation-Conscious Model
Considers browsing as part of the search task
Query submodel – any probabilistic IR relevance ranking model
Navigation submodel – Stochastic model of user navigation WUFIS (Chi et al)
WUFIS
W(N, d1, d2) – probability that a user with need N will navigate from d1 to d2
Scent is provided by anchor and surrounding text
The probability of a link being followed is related to how well the user's need matches the scent – similarity between a weighted vector of need terms and scent terms
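The scent-matching idea above can be sketched as follows. This is an illustrative reading of WUFIS (Chi et al.), not the paper's exact implementation: term weights are raw counts, similarity is cosine, and the normalization over outgoing links is an assumption.

```python
# Sketch: link-following probability proportional to similarity between the
# user's need vector and each link's scent (anchor + surrounding text).
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two weighted term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def follow_probabilities(need: Counter, links: dict) -> dict:
    """Map each outgoing link to P(follow | need), normalized over all links."""
    sims = {link: cosine(need, scent) for link, scent in links.items()}
    total = sum(sims.values())
    return {link: s / total for link, s in sims.items()} if total else sims

# Toy example (hypothetical pages and scent terms):
need = Counter({"jaguar": 2, "car": 1})
links = {
    "dealers.html": Counter({"jaguar": 1, "car": 2, "price": 1}),
    "wildlife.html": Counter({"jaguar": 1, "animal": 2}),
}
probs = follow_probabilities(need, links)
```

A user whose need mentions "car" is more likely to follow the dealers link than the wildlife one, which is exactly the scent effect the slide describes.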
Final Model
A document's starting-point score = Query submodel × Navigation submodel:

score_n(d, q) = Σ_{d' ∈ D} R(d', q) · W(N(d'), d, d')
Volant - Prototype
Volant - Preprocessing
Content Engine: R(d, q) – estimated by the Okapi BM25 scoring function
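For reference, a standard Okapi BM25 scorer looks roughly like this. The k1 and b values are conventional defaults, not taken from the paper, and the tokenized toy corpus is illustrative.

```python
# Sketch of Okapi BM25: relevance of a document to a query, combining
# term frequency, inverse document frequency, and length normalization.
from math import log

def bm25(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document (a list of tokens) against query_terms."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)            # document frequency
        idf = log((n - df + 0.5) / (df + 0.5) + 1)       # smoothed IDF
        tf = doc.count(t)                                # term frequency in doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["jaguar", "car", "price"], ["animal", "zoo", "cat"]]
relevant = bm25(["jaguar"], corpus[0], corpus)
```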
Connectivity Engine: estimates the probability of a user with need N(d2) navigating from d1 to d2, starting with first hop dw
Dijkstra's algorithm is used to generate tuples of the form (d1, d2, dw, W(N(d2), d1, d2))
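The Dijkstra precomputation can be sketched with the common trick of running shortest paths on -log(probability) edge weights, so that the shortest path corresponds to the most probable navigation path. The tuple layout (d1, d2, dw, W) follows the slide; the toy link graph and its probabilities are assumptions.

```python
# Sketch: for each target d2 reachable from d1, find the probability of the
# best navigation path and the first hop d_w taken along it.
import heapq
from math import log, exp

def best_paths(graph, source):
    """Return {d2: (best-path probability from source, first hop d_w)}."""
    dist = {source: 0.0}     # cost = -log(path probability)
    first = {source: None}   # first hop out of source on the best path
    pq = [(0.0, source, None)]
    while pq:
        cost, node, hop = heapq.heappop(pq)
        if cost > dist.get(node, float("inf")):
            continue                          # stale heap entry
        for nxt, p in graph.get(node, {}).items():
            ncost = cost - log(p)
            if ncost < dist.get(nxt, float("inf")):
                dist[nxt] = ncost
                nhop = nxt if node == source else hop
                first[nxt] = nhop
                heapq.heappush(pq, (ncost, nxt, nhop))
    return {d2: (exp(-c), first[d2]) for d2, c in dist.items() if d2 != source}

# Toy graph: edge values are WUFIS-style link-following probabilities.
graph = {"home": {"a": 0.6, "b": 0.4}, "a": {"goal": 0.5}, "b": {"goal": 0.9}}
tuples = [("home", d2, dw, w)   # (d1, d2, d_w, W(N(d2), d1, d2)) as on the slide
          for d2, (w, dw) in best_paths(graph, "home").items()]
```

Note that "goal" is best reached via "b" (0.4 × 0.9 = 0.36) rather than via "a" (0.6 × 0.5 = 0.30), so the stored first hop is "b".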
Volant – Starting points
Query entered -> ranked list of starting points
1. Retrieve from the content engine all documents d' that are relevant to the query
2. For each d' retrieved in step 1, retrieve from the connectivity engine all documents d for which W(N(d'), d, d') > 0
3. For each unique d, compute the starting-point score
4. Sort in decreasing order of starting-point score
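The four steps above can be sketched as follows. The two engine interfaces are assumptions: content(q) returning {d': R(d', q)} and reachable(d') returning {d: W(N(d'), d, d')}, with toy values standing in for real scores.

```python
# Sketch of the starting-point ranking loop from the slide.
from collections import defaultdict

def rank_starting_points(q, content, reachable):
    scores = defaultdict(float)
    for dp, rel in content(q).items():          # step 1: relevant documents d'
        for d, w in reachable(dp).items():      # step 2: d with W(N(d'), d, d') > 0
            scores[d] += rel * w                # step 3: accumulate the score
    return sorted(scores.items(), key=lambda kv: -kv[1])  # step 4: sort descending

# Toy engines: two relevant leaf pages, each reachable from a "hub" page.
ranking = rank_starting_points(
    "q",
    content=lambda q: {"leaf1": 0.8, "leaf2": 0.6},
    reachable=lambda dp: {"hub": 0.5, dp: 1.0},
)
```

In this toy run the relevant leaf "leaf1" still outranks the hub, but the hub scores higher than "leaf2" because it aggregates probability mass from both leaves, which is the intended effect of the combined score.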
Volant – Navigation Guidance
When a user is navigating, Volant intercepts the requested document and highlights links that lead to documents relevant to their query q
1. Retrieve from the content engine all documents d' that are relevant to q
2. For each d' retrieved, get from the connectivity engine the documents d that can lead to d', i.e. W(N(d'), d, d') > 0
3. For each tuple retrieved in step 2, highlight the links that point to dw
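The guidance steps can be sketched against the precomputed tuples. The tuple layout (d1, d2, dw, W) follows the preprocessing slide; the content-engine interface and the toy data are assumptions.

```python
# Sketch: decide which outgoing links of the current page to highlight.
def links_to_highlight(current, q, content, tuples):
    """Return the links d_w worth highlighting on the current page."""
    relevant = set(content(q))                  # step 1: relevant documents d'
    return {dw for (d1, d2, dw, w) in tuples    # steps 2-3: tuples from current
            if d1 == current and d2 in relevant and w > 0}

# Toy tuples: from "hub", links "a" and "b" lead toward relevant leaves,
# while link "c" only leads to an irrelevant page.
tuples = [("hub", "leaf1", "a", 0.4),
          ("hub", "leaf2", "b", 0.3),
          ("hub", "junk", "c", 0.2)]
hl = links_to_highlight("hub", "q", lambda q: ["leaf1", "leaf2"], tuples)
```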
Evaluation
Hypotheses:
1. In query-only scenarios Volant does not perform significantly worse than conventional approaches
2. In combined query/navigation scenarios Volant selects high-quality starting points
3. In a significant fraction of query/navigation scenarios the best organic starting point is of higher quality than one that can be synthesized using existing techniques
Search Task Test Sets
Navigation-prone scenarios are difficult to predict, so the Simplified Clarity Score was used to determine a set of ambiguous and unambiguous queries
Unambiguous – the 20 search tasks with highest clarity from TREC 2000
Ambiguous – 48 randomly selected tasks from TREC 2003
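The Simplified Clarity Score (He & Ounis) used here is essentially a KL divergence between the query's term distribution and the collection's; a sketch, with an illustrative toy collection:

```python
# Sketch of the Simplified Clarity Score: specific queries whose terms are
# rare in the collection score high (unambiguous); queries made of common
# terms score low (ambiguous).
from math import log2
from collections import Counter

def scs(query_terms, collection_terms):
    """SCS = sum over query terms w of P(w|Q) * log2(P(w|Q) / P_coll(w))."""
    q = Counter(query_terms)
    coll = Counter(collection_terms)
    qlen, clen = len(query_terms), len(collection_terms)
    score = 0.0
    for w, qtf in q.items():
        pq = qtf / qlen
        pc = coll.get(w, 0) / clen
        if pc > 0:
            score += pq * log2(pq / pc)
    return score

# Toy collection: "the" and "car" are common, "jaguar" and "xk8" are rare.
coll = ["the"] * 50 + ["car"] * 10 + ["jaguar"] * 2 + ["xk8"]
clear = scs(["jaguar", "xk8"], coll)   # specific query
vague = scs(["the", "car"], coll)      # common-term query
```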
Performance on Unambiguous Queries
Mean Average Precision
No significant difference. Why? Relevant documents tended not to be siblings or close cousins, so Volant deemed that the best starting points were the documents themselves.
Performance on Ambiguous Queries
User study – 48 judges judged the suitability of documents as starting points
30 starting points generated per task:
10 from the TREC 2003 winner (CSIRO)
10 from Volant with user guidance
10 from Volant without user guidance (the same documents as the 10 with guidance)
Performance on Ambiguous Queries
Rating criteria:
Breadth – spectrum of people, different interests
Accessibility – how easy to navigate and find information
Appeal – presentation of the material
Usefulness – would people be able to complete their task from this point
Each judge spent 5 hours on their task
Results
Summary & Future Work
Effectiveness – responds to users, positions them at a suitable starting point for their task, and guides them to further information in a query-driven fashion
Relationship to conventional IR – generalizes the conventional probabilistic IR model and succeeds in scenarios where conventional IR techniques fail, e.g. ambiguous queries
Discussion
Cold Start Problem
Scalability
Bias in Evaluation