ranking objects based on relationships computing top-k over aggregation sigmod 2006 kaushik...
Ranking objects based on relationships
Computing Top-K over Aggregation
SIGMOD 2006
Kaushik Chakrabarti et al.
Outline
• Motivation
• The framework and problem definition
• Proposed solution
• Discussions
• Experiments
Warm-up discussion
• We basically know how web search engines work.
– Web crawlers collect web-page information, which is then indexed and ranked.
• How do we define search in a relational database?
– Free-form search vs. SQL with predicates?
– What is the expected outcome?
– How do we rank the results?
Motivation
• Searching over a relational database
– Information is scattered across different relations
Motivation
• Full-text search and aggregation are already supported by RDBMSs
– What else do we need in order to search well?
Related work
• Information retrieval (full-text search)
• Research in text databases
• Exploring a database via foreign key-primary key links
– DBXplorer (ICDE 2002)
– BANKS (ICDE 2002)
– DISCOVER (VLDB 2002)
• What is the related work missing?
– Target objects may not contain the keywords themselves
– Lack of a scoring function for query results
– Aggregates are not used to combine search results for multiple keywords
Contributions
• Introduce an interesting problem domain
• Define “Object Finder” (OF) queries
• Propose scoring functions
• Propose a solution to process OF queries
– Returns the top-K ranked results
– Efficient, with an early-termination property
System Overview
Scoring functions
• Scoring matrices and their row and column marginals
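To make the idea of a scoring matrix and its marginals concrete, here is a minimal Python sketch. It assumes (this layout is not spelled out on the slide) that each target object has a matrix whose rows are its related documents and whose columns are the query keywords, with each cell holding a per-document keyword score. The numbers, the function names `row_marginal_score` and `column_marginal_score`, and the use of MIN and SUM are hypothetical choices for illustration only.

```python
# Hypothetical scoring matrix for ONE target object (assumed layout):
# rows = documents related to the object, columns = query keywords.
matrix = [
    [0.5, 0.0],    # doc 1 matches keyword 1 only
    [0.0, 0.25],   # doc 2 matches keyword 2 only
    [0.25, 0.25],  # doc 3 matches both keywords
]

def row_marginal_score(m, combine, aggregate):
    """Combine across keywords within each document (row),
    then aggregate the per-document values over all documents."""
    return aggregate(combine(row) for row in m)

def column_marginal_score(m, aggregate, combine):
    """Aggregate each keyword (column) over all documents,
    then combine the per-keyword values across keywords."""
    return combine(aggregate(col) for col in zip(*m))

# Row marginal: MIN across keywords per document, SUM over documents.
row_min_sum = row_marginal_score(matrix, combine=min, aggregate=sum)
# Column marginal: SUM over documents per keyword, MIN across keywords.
col_sum_min = column_marginal_score(matrix, aggregate=sum, combine=min)

print(row_min_sum)  # 0.25
print(col_sum_min)  # 0.5
```

In this toy matrix, taking MIN within each row means documents that match only one keyword contribute nothing, while the column-wise variant still credits them.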
Scoring semantics
• All query keywords present in each document
– can be too restrictive
• All query keywords present in the set of related documents
– cannot use MIN as the row-marginal scoring function
• Pseudo-document approach
– enlarged search space
Problem definition
• Object finder problem:
Process OF query as Top-K query
• A top-K query incorporates ranking. Results are totally ordered if we process a strong top-K query
• A good algorithm can use early termination to avoid processing results that are not in the top K
Top-K query processing
• General framework: Supporting Ad-hoc Ranking Aggregates, SIGMOD 2006 (presented in May)
SELECT ga_1, ..., ga_n, F     -- groups
FROM R1, ..., Rh              -- source relations
WHERE c1 AND ... AND cl       -- join conditions
GROUP BY ga_1, ..., ga_n      -- group definition
ORDER BY F                    -- ordering function, an aggregate
LIMIT k                       -- top-k setting
Top-K query processing
• For an OF query, it is:
select TOId, TOValue, score(TOId)
from TargetTable T, R, L1, ..., LN
where R.TOId = T.TOId
and R.DocId = Li.DocId (i = 1..N)
group by TOId, TOValue
order by score(TOId)
limit k
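As a rough illustration of what this query computes, the following Python sketch evaluates the join, group-by, order-by and limit steps directly over small in-memory structures. Everything concrete here is an assumption for the example: the function name `object_finder_naive`, the sample data, the representation of R as (DocId, TOId) pairs and of each Li as a DocId-to-score map, the use of SUM both to combine keyword scores and to aggregate over documents, and the descending sort order.

```python
from collections import defaultdict

def object_finder_naive(targets, relationships, keyword_lists, k):
    """Directly evaluate the join / group by / order by / limit pipeline.

    targets:        dict TOId -> TOValue          (TargetTable T)
    relationships:  list of (DocId, TOId) pairs   (R)
    keyword_lists:  list of dict DocId -> score   (L1..LN)
    """
    scores = defaultdict(float)
    for doc_id, to_id in relationships:
        # Inner join with every keyword list Li on DocId.
        if all(doc_id in lst for lst in keyword_lists):
            # Assumed combination across keywords: SUM.
            doc_score = sum(lst[doc_id] for lst in keyword_lists)
            # Assumed aggregation over documents: SUM.
            scores[to_id] += doc_score
    # Order by score (descending, ties broken by TOId) and keep k groups.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(to_id, targets[to_id], score) for to_id, score in ranked[:k]]

# Hypothetical sample data.
targets = {1: "Alice", 2: "Bob", 3: "Carol"}
relationships = [(10, 1), (11, 1), (11, 2), (12, 3), (13, 2)]
keyword_lists = [
    {10: 0.5, 11: 0.25, 13: 0.25},             # L1
    {10: 0.25, 11: 0.5, 12: 0.75, 13: 0.25},   # L2
]

top2 = object_finder_naive(targets, relationships, keyword_lists, k=2)
print(top2)  # [(1, 'Alice', 1.5), (2, 'Bob', 1.25)]
```

This direct evaluation scores every group before sorting, which is the work that an early-terminating top-K algorithm tries to avoid.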
My work is done (please try to recall my last talk)
Algorithm : Generate-Prune
Phase I: Compute top-K candidates
Algorithm Overview
Algorithm
• Phase II: Compute exact top-K
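The slides do not spell out the two phases, so the following Python sketch is only a loose illustration of a "generate candidates with bounds, then compute exact scores" scheme; it is not the authors' Generate-Prune algorithm. It assumes a single list of documents sorted by descending non-negative score, an object score defined as the SUM of its related documents' scores, and a known count of related documents per object. The function name `generate_prune_sketch` and the sample data are hypothetical.

```python
def generate_prune_sketch(sorted_docs, related, k):
    """sorted_docs: list of (doc_id, score), descending, scores >= 0.
    related: dict doc_id -> list of target-object ids (assumed known).
    Returns (top-k list of (object, exact score), documents read in Phase I)."""
    # Number of related documents per object, used to bound unseen scores.
    degree = {}
    for to_ids in related.values():
        for to_id in to_ids:
            degree[to_id] = degree.get(to_id, 0) + 1

    lower = {t: 0.0 for t in degree}  # score from documents seen so far
    seen = {t: 0 for t in degree}     # related documents seen so far
    candidates = list(degree)
    docs_read = 0

    # Phase I: scan until exactly k objects can still make the top k.
    for doc_id, score in sorted_docs:
        docs_read += 1
        for t in related[doc_id]:
            lower[t] += score
            seen[t] += 1
        if len(degree) >= k:
            # Every unseen document scores at most `score`.
            upper = {t: lower[t] + (degree[t] - seen[t]) * score for t in degree}
            kth_lower = sorted(lower.values(), reverse=True)[k - 1]
            candidates = [t for t in degree if upper[t] >= kth_lower]
            if len(candidates) == k:
                break  # early termination: no other object can catch up

    # Phase II: exact scores for the surviving candidates only.
    score_of = dict(sorted_docs)
    docs_of = {}
    for doc_id, to_ids in related.items():
        for t in to_ids:
            docs_of.setdefault(t, []).append(doc_id)
    exact = {t: sum(score_of.get(d, 0.0) for d in docs_of[t]) for t in candidates}
    top = sorted(exact.items(), key=lambda item: -item[1])[:k]
    return top, docs_read

# Hypothetical sample data.
sorted_docs = [("d1", 1.0), ("d2", 0.5), ("d3", 0.5), ("d4", 0.25), ("d5", 0.25)]
related = {"d1": ["A"], "d2": ["A"], "d3": ["B"], "d4": ["C"], "d5": ["C"]}

result, docs_read = generate_prune_sketch(sorted_docs, related, k=1)
print(result, docs_read)  # [('A', 1.5)] 2
```

In this example the scan stops after two of the five documents, because the upper bounds of B and C fall below the lower bound of A. Recomputing all bounds after each document keeps the sketch short; a practical implementation would maintain them incrementally.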
Discussions
• In this work
– Choice of aggregation function
– Ranking functions in general
– What do you think of this work?
• Not limited to this work
– Impact of a more complicated schema
– Impact of the selectivity of the query
Experiment Results
• Faster than SQL
• Faster than Generate-Only
• Robust to # of keywords and selections
• Intuitive Results
Experiments
• Faster than SQL
Experiments
• Faster than Generate-Only
Experiments
• Robust to # of keywords and selections
Thank you
Questions to discuss?