Indexing 8-12: Query Execution Videos CMSC 476/676 SP2020

These notes are pulled from Victor Lavrenko’s IR videos in the included links. Copyright is by Victor Lavrenko.

These strategies are useful when computing the scores of all documents against each other, treating each document as if it were a query.

*Indexing 8: Doc-at-a-time query execution talk is here.

1. K-way Linear Merge
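A minimal Python sketch of doc-at-a-time execution with a K-way linear merge. It assumes a toy index in which each posting list is a Python list of (doc_id, weight) pairs sorted by doc ID, and a document's score is simply the sum of its term weights. The function name `daat_linear_merge` and this data layout are hypothetical, chosen for illustration; they are not taken from the videos.

```python
from typing import List, Tuple

# Hypothetical toy layout: a posting list is sorted by doc ID and holds
# (doc_id, term_weight) pairs; a document's score is the sum of its weights.
Postings = List[Tuple[int, float]]


def daat_linear_merge(lists: List[Postings]) -> List[Tuple[int, float]]:
    """Doc-at-a-time scoring with a K-way linear merge.

    Each step scans all K list heads to find the smallest doc ID, adds up
    the weights of every list positioned on that doc, and advances those
    lists. A document's score is emitted once, as soon as it is complete.
    """
    pos = [0] * len(lists)  # current head position in each list
    results = []
    while True:
        # Linear scan over the K list heads: O(K) work per step.
        heads = [lst[p][0] for lst, p in zip(lists, pos) if p < len(lst)]
        if not heads:  # every list is exhausted
            break
        doc = min(heads)
        score = 0.0
        for i, lst in enumerate(lists):
            if pos[i] < len(lst) and lst[pos[i]][0] == doc:
                score += lst[pos[i]][1]
                pos[i] += 1
        # Only documents that appear in at least one list are emitted.
        results.append((doc, score))
    return results


if __name__ == "__main__":
    lists = [[(1, 0.5), (3, 0.25), (7, 1.0)], [(3, 0.5), (4, 0.25)]]
    print(daat_linear_merge(lists))  # [(1, 0.5), (3, 0.75), (4, 0.25), (7, 1.0)]
```

Because every step rescans all K heads, merging N postings costs roughly O(N·K) in this version.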

*Indexing 9: Doc-at-a-time query execution worst case talk is here.

1. One list keeps getting longer and longer while the other list stays fairly short, so you have to iterate over the longest list. The worst case is a query with a lot of rare words.

2. K-Way Linear Merge

You get the worst case when you have many very short lists.

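3. Can we do better? Yes. The linear merge wastes time finding the smallest doc ID, because it scans every list head at each step. It is more efficient to keep the list heads in a priority queue (a min-heap), where retrieving the smallest ID and advancing that list costs O(log K) rather than O(K).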

4. The worst case with a priority queue.
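The priority-queue version can be sketched with Python's standard `heapq` module, under the same hypothetical assumptions as a plain linear merge (sorted (doc_id, weight) posting lists, score = sum of weights). The heap holds one entry per non-exhausted list, so the smallest doc ID is always at the top and no scan over all K heads is needed. The name `daat_heap_merge` is illustrative only.

```python
import heapq
from typing import List, Tuple

# Hypothetical layout: sorted lists of (doc_id, term_weight) pairs.
Postings = List[Tuple[int, float]]


def daat_heap_merge(lists: List[Postings]) -> List[Tuple[int, float]]:
    """Doc-at-a-time scoring where a min-heap of list heads replaces the
    linear scan; each pop or push costs O(log K) instead of O(K)."""
    # Heap entries are (doc_id, list_index, position_in_list).
    heap = [(lst[0][0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    results = []
    while heap:
        doc = heap[0][0]  # smallest doc ID sits at the top of the heap
        score = 0.0
        # Pop every list currently positioned on this doc, then re-push
        # each one at its next posting (if it has one).
        while heap and heap[0][0] == doc:
            _, i, p = heapq.heappop(heap)
            score += lists[i][p][1]
            if p + 1 < len(lists[i]):
                heapq.heappush(heap, (lists[i][p + 1][0], i, p + 1))
        results.append((doc, score))
    return results


if __name__ == "__main__":
    lists = [[(1, 0.5), (3, 0.25), (7, 1.0)], [(3, 0.5), (4, 0.25)]]
    print(daat_heap_merge(lists))  # [(1, 0.5), (3, 0.75), (4, 0.25), (7, 1.0)]
```

Each posting is pushed and popped once, so the total cost is about O(N log K) for N postings, compared with O(N·K) when every step scans all K heads.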

*Indexing 10: Term-at-a-time query execution talk is here.

1. Incrementally compute the scores for all documents.

2. Update document partial scores as you go through the index.

3. In this way you process the query terms in no particular order. Doc-at-a-time emits a document's score only when it is non-zero. Not so with term-at-a-time: after going through the index you still have some work to do, namely extracting the final result set from the partial scores.
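A minimal term-at-a-time sketch, assuming a hypothetical dict-based index that maps each term to its (doc_id, weight) postings and a purely additive score. The `acc` dictionary holds the per-document partial scores (accumulators), and the final `heapq.nlargest` call is the end-of-index extraction step described above. The name `taat_top_k` and the choice of a top-k result set are illustrative assumptions.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> list of (doc_id, term_weight) postings.
Index = Dict[str, List[Tuple[int, float]]]


def taat_top_k(index: Index, query: List[str], k: int) -> List[Tuple[int, float]]:
    """Term-at-a-time scoring with one accumulator per document.

    Each query term's posting list is processed completely before moving
    on to the next term, so terms may be handled in any order.
    """
    acc: Dict[int, float] = defaultdict(float)  # doc_id -> partial score
    for term in query:
        for doc, weight in index.get(term, []):
            acc[doc] += weight  # update this document's partial score
    # Extra work at the end: extract the final result set (here, the top k)
    # from the accumulators.
    return heapq.nlargest(k, acc.items(), key=lambda item: item[1])


if __name__ == "__main__":
    index = {
        "query": [(1, 0.5), (3, 0.25), (7, 1.0)],
        "execution": [(3, 0.5), (4, 0.25)],
    }
    print(taat_top_k(index, ["query", "execution"], 2))  # [(7, 1.0), (3, 0.75)]
```

Memory grows with the number of documents that have a non-zero partial score, since every such document keeps an accumulator until the end.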

*Indexing 11: Query execution tradeoffs talk is here.

1. If you can decompose your scoring function into a sum of per-term contributions, then you can use term-at-a-time. The cosine similarity score is decomposable.

2.

3. Doc-at-a-time allows you to use any scoring function (it does not have to be decomposable).
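To see why cosine similarity is decomposable, the sketch below computes it two ways: directly from the two vectors, and term-at-a-time, as a sum of one contribution per query term, w_tq · w_td / (|q| · |d|), using precomputed document norms. The function names and term weights are hypothetical, and all vectors are assumed to be non-zero.

```python
import math
from collections import defaultdict
from typing import Dict


def cosine_direct(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Cosine similarity computed in one shot from the two vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d)


def cosine_term_at_a_time(q: Dict[str, float],
                          docs: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Same score, decomposed into one additive contribution per query term.

    Document norms are precomputed (as an index would store them), so each
    posting (t, doc) contributes w_tq * w_td / (|q| * |d|) independently.
    """
    doc_norm = {i: math.sqrt(sum(w * w for w in d.values())) for i, d in docs.items()}
    # Build a toy inverted index: term -> list of (doc_id, weight).
    index = defaultdict(list)
    for i, d in docs.items():
        for t, w in d.items():
            index[t].append((i, w))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    acc = defaultdict(float)  # doc_id -> partial cosine score
    for t, w_q in q.items():
        for i, w_d in index.get(t, []):
            acc[i] += w_q * w_d / (norm_q * doc_norm[i])
    return dict(acc)


if __name__ == "__main__":
    docs = {1: {"query": 1.0, "execution": 2.0}, 2: {"index": 3.0, "query": 1.0}}
    q = {"query": 1.0, "execution": 1.0}
    taat = cosine_term_at_a_time(q, docs)
    for doc_id, d in docs.items():
        print(doc_id, taat.get(doc_id, 0.0), cosine_direct(q, d))
```

Both functions agree up to floating-point rounding. A scoring function that needs all of a document's postings at once, for example one that rewards query terms appearing close together, cannot be split into independent per-term contributions like this, which is where doc-at-a-time's flexibility helps.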

*Indexing 12: Expected cost of query execution talk is here.

1. Why complexity matters. Suppose you want to compare every document against all the others. Then you need the pairwise similarity scores, one for each of the N(N-1)/2 pairs in a collection of N documents. We do this later in our programming project.
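A quick sketch of why the all-pairs comparison is expensive: `num_pairs` counts the unordered document pairs, and `all_pairs_cosine` scores every one of them by brute force with cosine similarity over toy term-weight dictionaries. Both helpers are hypothetical, written for illustration; this is not the programming project's required method.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def num_pairs(n: int) -> int:
    """Number of unordered document pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def all_pairs_cosine(docs: Dict[int, Dict[str, float]]) -> Dict[Tuple[int, int], float]:
    """Brute-force pairwise cosine similarity, one score per unordered pair.

    Every pair is scored, so the work grows quadratically with the number
    of documents. Assumes every document vector is non-zero.
    """
    norms = {i: math.sqrt(sum(w * w for w in d.values())) for i, d in docs.items()}
    scores = {}
    for a, b in combinations(sorted(docs), 2):
        dot = sum(w * docs[b].get(t, 0.0) for t, w in docs[a].items())
        scores[(a, b)] = dot / (norms[a] * norms[b])
    return scores


if __name__ == "__main__":
    print(num_pairs(1000))       # 499500
    print(num_pairs(1_000_000))  # 499999500000
```

For example, 1,000 documents give 499,500 pairs, while 1,000,000 documents already give 499,999,500,000, which is why the cost of each query execution matters so much at scale.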