keyword proximity search on graphs m.sc. systems course the hebrew university of jerusalem, winter...
Post on 19-Dec-2015
![Page 1: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/1.jpg)
Keyword Proximity Search on Graphs
M.Sc. Systems Course
The Hebrew University of Jerusalem, Winter 2006
![Page 2: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/2.jpg)
Keyword Proximity Search on Graphs MSSYS 2006
A rapidly evolving paradigm for data extraction
Data have varying degrees of structure
Queries are sets of keywords, with no structural constraints
[Figure: keyword proximity search applies to relational databases, XML documents and Web sites]
The Goal:
Extract meaningful parts of data w.r.t. the keywords
![Page 3: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/3.jpg)
Recent Work on KPS (Keyword Proximity Search)
• DataSpot (SIGMOD 1998)
• Information Units (WWW 2001)
• BANKS (ICDE 2002, VLDB 2005)
• DISCOVER (VLDB 2002)
• DBXplorer (ICDE 2002)
• XKeyword (ICDE 2003)
• ……
![Page 4: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/4.jpg)
Systems for KPS on Relational Data
BANKS, DISCOVER and DBXplorer implemented KPS (Keyword Proximity Search) on relational databases
– Different algorithms are used
– Slight differences in semantics
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, pages 431–440, 2002.
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB, pages 670–681, 2002.
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: enabling keyword search over relational databases. In SIGMOD Conference, page 627, 2002.
![Page 5: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/5.jpg)
Example: KPS on RDB
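Cities

| ID | Name | Population |
|---|---|---|
| 22 | Amsterdam | 1101407 |
| 73 | Brussels | 951580 |

Organizations

| ID | Name | Head Q. |
|---|---|---|
| 135 | EU | 73 |
| 175 | ESA | 81 |

Countries

| Code | Name | Area | Capital |
|---|---|---|---|
| NL | Netherlands | 37330 | 22 |
| B | Belgium | 30510 | 73 |

Memberships

| Country | Org. |
|---|---|
| B | 135 |
| NL | 135 |

search Belgium , Brussels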
![Page 6: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/6.jpg)
search Belgium , Brussels
Brussels is the capital city of Belgium
![Page 7: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/7.jpg)
search Belgium , Brussels
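Result tuples: Countries (B, Belgium, 30510, 73) and Cities (73, Brussels, 951580); the Capital value 73 refers to the Cities tuple with ID 73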
Brussels is the capital city of Belgium
![Page 8: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/8.jpg)
Brussels hosts EU and Belgium is a member
search Belgium , Brussels
![Page 9: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/9.jpg)
Brussels hosts EU and Belgium is a member
search Belgium , Brussels
Result tuples: Countries (B, Belgium, 30510, 73), Memberships (B, 135), Organizations (135, EU, 73) and Cities (73, Brussels, 951580)
![Page 10: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/10.jpg)
XKeyword: KPS on XML
XKeyword implemented KPS on XML
Its architecture is based on that of DISCOVER
A demo over DBLP is available
• http://kebab.ucsd.edu:81/xkeyword
V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, pages 367–378, 2003.
![Page 11: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/11.jpg)
Example: KPS on XML
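[Figure: DBLP fragment. Under the dblp root, one article has author "Mihalis Yannakakis" and title "On the Approximation of Maximum Satisfiability"; another article has authors "Takao Asano" and "David P. Williamson", title "Improved Approximation Algorithms for MAX SAT", and a cite element whose references edge points to the first article]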
search Yannakakis , Approximation
![Page 12: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/12.jpg)
Yannakakis wrote a paper about Approximation
search Yannakakis , Approximation
![Page 13: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/13.jpg)
Yannakakis is cited by a paper about Approximation
search Yannakakis , Approximation
![Page 14: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/14.jpg)
KPS on Web Sites (Information Units)
• KPS can also be used for retrieving information from Web sites
• For a given query, results are collections of Web pages from the site
– Pages are relevant w.r.t. the keywords
– Pages are connected by hyperlinks
Wen-Syan Li, K. Selçuk Candan, Quoc Vu, and Divyakant Agrawal. Retrieving and organizing web pages by “information unit”. In WWW, pages 230-244, 2001.
![Page 15: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/15.jpg)
Example: KPS in Web Sites
http://www.goisrael.com/
search Hilton , Beach
![Page 16: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/16.jpg)
Example: KPS in Web Sites
[Figure: a result made of the linked pages "Eilat", "Eilat Beaches" and "Hilton Eilat Queen of Sheba"]
search Hilton , Beach
![Page 17: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/17.jpg)
A Formal Framework for KPS
![Page 18: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/18.jpg)
Data Graphs
[Figure: example data graph. Structural nodes are labeled company, supplies, supply, product and supplier (each label appears twice), president, department, manager and hq; keyword nodes include papers, A4, coffee, Cohen, Summers and Paris]
Data graphs have two types of nodes: structural nodes and keywords
![Page 19: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/19.jpg)
Queries
K = { Summers , Cohen , coffee }
[Figure: the same data graph]
Queries are sets of keywords from the data graph
![Page 20: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/20.jpg)
Query Results
[Figure: the same data graph, showing a query result for K]
![Page 21: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/21.jpg)
Query Results
[Figure: the same data graph, showing a query result for K]
Query results are subtrees of the data graph that:
– contain all keywords in the query
– have no redundant edges
Such a subtree is reduced w.r.t. the keywords
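For the undirected variant, the definition can be checked mechanically. The Python sketch below is illustrative only: it represents a candidate result as a node set plus an edge list, and it relies on a standard fact not stated on the slide, namely that a tree containing all the keywords is reduced exactly when every leaf is a keyword (a non-keyword leaf could simply be pruned). The names `is_tree` and `is_reduced_undirected` are hypothetical.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """True if the undirected graph (nodes, edges) is connected and acyclic."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        if u not in nodes or v not in nodes:
            return False
        adj[u].add(v)
        adj[v].add(u)
    # With exactly n - 1 edges, connectivity implies the graph is a tree.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def is_reduced_undirected(nodes, edges, keywords):
    """True if the tree (nodes, edges) contains all keywords and every leaf is a keyword."""
    nodes, keywords = set(nodes), set(keywords)
    if not keywords <= nodes or not is_tree(nodes, edges):
        return False
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # A node of degree <= 1 is a leaf (a one-node tree counts as a leaf).
    return all(v in keywords for v in nodes if degree[v] <= 1)
```

For a hypothetical query K = {a, b}, the path a - x - b is reduced, while adding a dangling non-keyword node y to x makes it redundant.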
![Page 22: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/22.jpg)
Three Variants
Three variants of keyword proximity search are considered:
Rooted proximity
Undirected proximity
Strong proximity
![Page 23: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/23.jpg)
Rooted Variant
[Figure: the same data graph, with a rooted result tree]
Used in BANKS
Results are rooted trees
![Page 24: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/24.jpg)
Undirected Variant
[Figure: the same data graph, with an undirected result tree]
Used in Interconnection Semantics for XML
Results are undirected trees
![Page 25: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/25.jpg)
Strong Variant
[Figure: the same data graph, with a strong result tree]
Used in XKeyword, Information Units, DBXplorer and DISCOVER
Results are undirected trees, and keywords are leaves
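The rooted and strong variants impose different conditions on the same kind of candidate tree. The sketch below checks them under two characterizations derived from the definitions (they are not spelled out on the slides): a rooted tree containing K is reduced when every leaf is a keyword and the root is either a keyword or has at least two children; a strong result is an undirected tree whose leaves are exactly the keywords. Rooted trees are assumed to be given as (parent, child) edges, and all function names are hypothetical.

```python
from collections import defaultdict


def is_rooted_reduced(nodes, edges, keywords):
    """Rooted variant: edges are (parent, child) pairs forming a directed tree."""
    nodes, keywords = set(nodes), set(keywords)
    if not keywords <= nodes or len(edges) != len(nodes) - 1:
        return False
    indegree = {v: 0 for v in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        if parent not in nodes or child not in nodes:
            return False
        indegree[child] += 1
        children[parent].append(child)
    roots = [v for v in nodes if indegree[v] == 0]
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False
    root = roots[0]
    # Every node must be reachable from the root by following edges forward.
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    if seen != nodes:
        return False
    leaves_are_keywords = all(v in keywords for v in nodes if not children[v])
    # A non-keyword root with a single child could be dropped, so it is redundant.
    root_is_needed = root in keywords or len(children[root]) >= 2
    return leaves_are_keywords and root_is_needed


def is_strong_reduced(nodes, edges, keywords):
    """Strong variant: an undirected tree whose leaves are exactly the keywords."""
    nodes, keywords = set(nodes), set(keywords)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        if u not in nodes or v not in nodes:
            return False
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if seen != nodes:
        return False
    leaves = {v for v in nodes if len(adj[v]) <= 1}
    return leaves == keywords
```

For instance, for K = {a, b, c} the path a - b - c is reduced in the undirected sense, but `is_strong_reduced` rejects it because the keyword b is not a leaf.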
![Page 26: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/26.jpg)
Problem Definition
Input:
Data: a data graph G
Query: a set K of keywords in G
Output:
Query Results: subtrees of G that are reduced w.r.t. K (rooted, undirected or strong)
![Page 27: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/27.jpg)
Creating Data Graphs from Relational Databases
Nodes are tuples
Edges are foreign-key references
![Page 28: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/28.jpg)
Creating Data Graphs from Relational Databases
Edges from each tuple node to all the keywords in that tuple
[Figure: the Countries tuple node (B, Belgium, 30510, 73) with edges to the keyword nodes B, Belgium, 30510 and 73]
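The construction can be sketched on the example database shown earlier. This is a minimal Python sketch, not the implementation of any of the systems above; it assumes that tuples are identified by (table, values), that every whitespace-separated attribute value becomes a keyword node, that foreign-key edges point from the referencing tuple to the referenced one, and that dangling references (such as ESA's Head Q. 81, which has no Cities tuple here) are skipped. The key and foreign-key declarations (`TABLES`, `FOREIGN_KEYS`) are hypothetical, inferred from the column names.

```python
# Example database from the slides; key and foreign-key declarations are assumptions.
TABLES = {
    "Cities": {"key": "ID", "cols": ["ID", "Name", "Population"],
               "rows": [["22", "Amsterdam", "1101407"], ["73", "Brussels", "951580"]]},
    "Organizations": {"key": "ID", "cols": ["ID", "Name", "Head Q."],
                      "rows": [["135", "EU", "73"], ["175", "ESA", "81"]]},
    "Countries": {"key": "Code", "cols": ["Code", "Name", "Area", "Capital"],
                  "rows": [["NL", "Netherlands", "37330", "22"], ["B", "Belgium", "30510", "73"]]},
    "Memberships": {"key": None, "cols": ["Country", "Org."],
                    "rows": [["B", "135"], ["NL", "135"]]},
}

# (referencing table, column) -> referenced table, matched on that table's key column.
FOREIGN_KEYS = {
    ("Countries", "Capital"): "Cities",
    ("Organizations", "Head Q."): "Cities",
    ("Memberships", "Country"): "Countries",
    ("Memberships", "Org."): "Organizations",
}


def build_data_graph(tables, foreign_keys):
    """Return directed edges: tuple -> referenced tuple, and tuple -> keyword."""
    # Index each tuple node by (table, key value) so foreign keys can be resolved.
    by_key = {}
    for name, table in tables.items():
        if table["key"] is not None:
            k = table["cols"].index(table["key"])
            for row in table["rows"]:
                by_key[(name, row[k])] = (name, tuple(row))

    edges = set()
    for name, table in tables.items():
        for row in table["rows"]:
            node = (name, tuple(row))
            for col, value in zip(table["cols"], row):
                for word in value.split():
                    edges.add((node, word))  # keyword nodes are plain strings
                target = foreign_keys.get((name, col))
                if target is not None and (target, value) in by_key:
                    edges.add((node, by_key[(target, value)]))  # foreign-key edge
    return edges


edges = build_data_graph(TABLES, FOREIGN_KEYS)
```

Because every value is treated as a keyword here, shared values such as 73 also become common keyword nodes; a real system would probably restrict keyword indexing to selected attributes.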
![Page 29: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/29.jpg)
Creating Data Graphs from XML
Nodes are XML elements
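[Figure: the DBLP fragment as an XML tree: a dblp root with two article elements; the first has author "Mihalis Yannakakis" and title "On the Approximation of Maximum Satisfiability", the second has authors "Takao Asano" and "David P. Williamson", title "Improved Approximation Algorithms for MAX SAT" and a cite element]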
![Page 30: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/30.jpg)
Creating Data Graphs from XML
[Figure: the same DBLP fragment]
Nodes are XML elements
Edges represent nesting of elements …
![Page 31: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/31.jpg)
Creating Data Graphs from XML
[Figure: the same DBLP fragment]
Nodes are XML elements
Edges represent nesting of elements …
… and ID references
![Page 32: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/32.jpg)
Creating Data Graphs from XML
Keywords appear in PCDATA
[Figure: the same DBLP fragment]
Nodes are XML elements
Edges represent nesting of elements …
… and ID references
![Page 33: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/33.jpg)
All Occurrences of a Keyword are Represented by One Node
[Figure: the same DBLP fragment, where the two occurrences of Approximation are merged into a single keyword node]
A keyword is represented by a single node
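The XML construction from the last few slides can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the XML fragment is a hypothetical stand-in for the DBLP example, ID references are expressed with attributes named `id` and `idref` (a real DTD declares its own ID/IDREF attributes), keywords are the word tokens of each element's leading text, and each distinct word gets one shared node.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the DBLP example; "id"/"idref" attribute names are assumptions.
DOC = """<dblp>
  <article id="a1">
    <author>Mihalis Yannakakis</author>
    <title>On the Approximation of Maximum Satisfiability</title>
  </article>
  <article id="a2">
    <author>Takao Asano</author>
    <author>David P. Williamson</author>
    <title>Improved Approximation Algorithms for MAX SAT</title>
    <cite idref="a1"/>
  </article>
</dblp>"""


def xml_to_data_graph(text):
    """Return (edges, labels); element nodes are ints, keyword nodes are strings."""
    root = ET.fromstring(text)
    edges, labels, by_id, refs = set(), {}, {}, []

    def visit(elem):
        node = len(labels)  # number elements in document order
        labels[node] = elem.tag
        if "id" in elem.attrib:
            by_id[elem.attrib["id"]] = node
        if "idref" in elem.attrib:
            refs.append((node, elem.attrib["idref"]))
        for word in re.findall(r"\w+", elem.text or ""):
            edges.add((node, word))  # all occurrences of a word share one node
        for child in elem:
            edges.add((node, visit(child)))  # nesting edge
        return node

    visit(root)
    for node, target in refs:
        edges.add((node, by_id[target]))  # ID-reference edge
    return edges, labels


edges, labels = xml_to_data_graph(DOC)
```

Here both title elements (nodes 3 and 7) have an edge to the single keyword node `Approximation`, and the cite element (node 8) has an ID-reference edge to the first article (node 1).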
![Page 34: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/34.jpg)
Creating Data Graphs from Web Sites
Nodes are Web pages …
Keywords appear in these pages …
Edges are hyperlinks/XLinks
http://www.goisrael.com/
A keyword is represented by a single node
![Page 35: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/35.jpg)
Ranking and Enumeration Order
![Page 36: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/36.jpg)
Ranking Results
[Figure: three results for the query {Yannakakis, Approximation}, labeled 2, 1 and 3]
Ranking of results is determined by size
![Page 37: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/37.jpg)
Edges Have Weights
[Figure: the same three results on a weighted data graph, with edge weights 2, 1.5 and 1; the results are labeled 2, 1 and 3]
Edges incident to dblp have a large weight
Edges from cite to article have a medium weight
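Ranking by weight can be sketched as summing the edge weights of each result. The node names, weights and three results below are hypothetical, loosely modeled on the figure: edges incident to dblp get weight 2, the cite-to-article edge 1.5, and all other edges 1 (keyword edges are omitted). With unit weights, the same function reduces to ranking by size.

```python
# Hypothetical edge weights following the slide: dblp edges heavy, cite -> article medium.
W = {
    ("dblp", "article1"): 2, ("dblp", "article2"): 2,
    ("cite", "article1"): 1.5,
    ("article1", "author1"): 1, ("article1", "title1"): 1,
    ("article2", "title2"): 1, ("article2", "cite"): 1,
}

# Three hypothetical results for {Yannakakis, Approximation}, each a list of edges.
SAME_ARTICLE = [("article1", "author1"), ("article1", "title1")]
VIA_CITE = [("article2", "title2"), ("article2", "cite"),
            ("cite", "article1"), ("article1", "author1")]
VIA_DBLP = [("dblp", "article1"), ("article1", "author1"),
            ("dblp", "article2"), ("article2", "title2")]


def tree_weight(tree, weight):
    """Weight of a result tree: the sum of the weights of its edges."""
    return sum(weight[e] for e in tree)


def rank(results, weight):
    """Sort results by increasing weight (lighter results rank higher)."""
    return sorted(results, key=lambda tree: tree_weight(tree, weight))
```

Under plain size ranking, `VIA_CITE` and `VIA_DBLP` tie at four edges each; the weights break the tie and push the result that goes through dblp to the end.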
![Page 38: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/38.jpg)
Order of Results
Arbitrary Order
Exact Order
$\forall i < j:\ w(R_i) \le w(R_j)$, where $w(R)$ is the weight of result $R$
![Page 39: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/39.jpg)
Order of Results (cont’d)
Heuristic Order
C-Approximate Order
$\forall i < j:\ w(R_i) \le C \cdot w(R_j)$, where $w(R)$ is the weight of result $R$
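Both order conditions can be checked directly on the weights of an enumerated sequence. A minimal sketch (the function name is illustrative): comparing each weight with the minimum of all later weights is enough, and setting C = 1 checks the exact order.

```python
import math


def is_c_approximate(weights, c=1.0):
    """True if w(R_i) <= c * w(R_j) for all i < j; c = 1 checks the exact order."""
    later_min = math.inf
    # Scan from the end, tracking the smallest weight seen after position i.
    for w in reversed(weights):
        if w > c * later_min:
            return False
        later_min = min(later_min, w)
    return True
```

For example, the sequence of weights 4.5, 2, 6 is not in exact order, but it is a 3-approximate order because 4.5 is at most 3 times every later weight.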
![Page 40: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/40.jpg)
Measuring the Efficiency of Enumerations
![Page 41: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/41.jpg)
Polynomial Runtime is not Appropriate for KPS
• In the theory of CS, the usual notion of efficiency is polynomial running time
– That is, the algorithm terminates in time that is polynomial in the size of the input
• However, in KPS the number of results can be exponential in the size of the input, even for two keywords
– Algorithms cannot be expected to terminate in polynomial time
• Therefore, other notions are required
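A standard construction (not from the slides) shows the blow-up for two keywords: chain n "diamonds", each offering two disjoint two-edge routes. Every simple path between the two ends is a reduced subtree for those two keywords, and there are $2^n$ such paths in a graph with only 3n + 1 nodes. The Python sketch below builds the graph and enumerates the paths; the function names are illustrative.

```python
def diamond_chain(n):
    """Undirected graph x0 - {up_i, down_i} - x_i for i = 1..n, as an adjacency dict."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for i in range(1, n + 1):
        for side in ("up", "down"):
            add_edge(("x", i - 1), (side, i))
            add_edge((side, i), ("x", i))
    return adj


def simple_paths(adj, s, t):
    """Yield every simple path from s to t using depth-first search."""
    path, on_path = [s], {s}

    def extend():
        last = path[-1]
        if last == t:
            yield list(path)
            return
        for nxt in adj.get(last, ()):
            if nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                yield from extend()
                path.pop()
                on_path.remove(nxt)

    yield from extend()
```

Each additional diamond adds only three nodes but doubles the number of results, so any algorithm that must print all of them needs exponential time in the worst case.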
![Page 42: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/42.jpg)
Time Efficiency
Polynomial Total Time
Polynomial runtime in the combined size of the input and the output
Polynomial Delay
The runtime between two successive results is polynomial in the size of the input
![Page 43: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/43.jpg)
About Polynomial Delay
• With polynomial delay you can:
– Generate the first few results quickly
– Efficiently return results in pages
• In most cases of keyword search, this is the suitable notion of efficiency
• Goal: develop algorithms that enumerate KPS results with polynomial delay
![Page 44: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/44.jpg)
Space Efficiency
Polynomial Space
Linearly-Incremental Space
Generating i results requires i times polynomial space in the size of the input
![Page 45: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/45.jpg)
Data and Query-and-Data Complexity
• Under query-and-data complexity, we assume that both the query and the data are of unbounded size
– Many problems in database theory, e.g., computing joins of relational tables, are intractable under this measure
• In practice, however, queries are very small compared to the data
• Under data complexity, the size of the query is assumed to be fixed
![Page 46: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/46.jpg)
Enumerating Results of KS with Polynomial Delay
![Page 47: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/47.jpg)
Keyword Search with Polynomial Delay
• The following algorithm enumerates reduced subtrees (i.e., results of keyword search) with polynomial delay
– Results are not ranked
• There is a different version of the algorithm for each of the three variants: rooted, undirected and strong
![Page 48: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/48.jpg)
Importance of the Algorithm
• An upper bound for ranked keyword search: results can be enumerated in ranked order in polynomial total time
– Generate all the results and then sort them
• In some cases, ranking is not required
• A basis for developing efficient heuristics that enumerate in an “almost” ranked order (discussed later)
![Page 49: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/49.jpg)
The Algorithm for Enumerating Rooted Reduced Subtrees
![Page 50: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/50.jpg)
Overview
• The algorithm uses two reductions
• Each reduction alone either does not solve the problem or runs in exponential total time
• However, the two reductions can be combined together to enumerate reduced subtrees with polynomial delay
![Page 51: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/51.jpg)
Data Reduction
1. Choose an arbitrary node v in K
2. For each parent p of v do:
I. In K: replace v with p
II. In G: remove v
III. Generate all results for the new input
IV. Add p→v to each result of the new input
[Figure: data reduction on a graph G and a keyword set K containing A, B and v, where p is a parent of v]
![Page 52: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/52.jpg)
Example Showing Failure
[Figure: a small graph over the nodes A, B and C, with the keyword set K]
Four results! Two with one root and two with another root
![Page 53: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/53.jpg)
Failure Example
[Figure: the same graph over A, B and C, with the keyword set K]
![Page 54: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/54.jpg)
Failure Example
[Figure: after a data-reduction step, B and C remain]
![Page 55: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/55.jpg)
Failure Example
[Figure: after the next step, only C remains]
![Page 56: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/56.jpg)
Failure Example
[Figure: after the last step]
![Page 57: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/57.jpg)
Failure Example
[Figure: the single result found, a tree over C, B and A]
Only one result!
Three others are missing!
![Page 58: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/58.jpg)
Why Data Reduction Fails
• We assumed that v is a leaf in every result
• It does not hold for structural nodes in recursive steps!
• Therefore, some results are not found!
• Solution(?): Repeat data reduction for every v in K
– Exponential total time in the worst case!
![Page 59: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/59.jpg)
Query Reduction
1. Remove one keyword from the query
2. Find all results for the smaller query
3. Extend each result to include the missing keyword, in every possible way
[Figure: query reduction for K = {A, B, C}: results for a smaller query are extended in every possible way to include the missing keyword]
![Page 60: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/60.jpg)
Extending Partial Results
• In query reduction, we need to extend a result T of the query K\{k} to all results of the query K
• This is done as follows, for all nodes v of T:
– Remove from G all nodes of T, except for v
– Find all simple directed paths P from v to k and print the concatenation of T and P
– If v is the root of T, we also need to concatenate T with all subtrees that are reduced w.r.t. v and k
• More details can be found in the paper
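The path case of this extension step can be sketched as follows. It covers only the directed-path extensions (the extra subtree extensions needed when v is the root are left out), assumes the data graph is a dict mapping each node to its successors and that k does not already occur in T, and uses hypothetical names throughout.

```python
def simple_paths_avoiding(graph, src, dst, blocked):
    """Yield simple directed paths src -> dst that never enter a blocked node."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == dst:
            yield path
            continue
        for nxt in graph.get(last, ()):
            if nxt not in blocked and nxt not in path:
                stack.append(path + [nxt])


def extend_by_directed_paths(graph, tree_nodes, tree_edges, k):
    """Yield the edge set of T + P for every node v of T and simple path P from v to k."""
    for v in tree_nodes:
        blocked = set(tree_nodes) - {v}  # remove from G all nodes of T except v
        for path in simple_paths_avoiding(graph, v, k, blocked):
            yield set(tree_edges) | set(zip(path, path[1:]))


# Hypothetical example: T = r -> a, r -> b is a result for {a, b}; extend it to k = "c".
G = {"r": {"a", "b"}, "a": {"c"}, "b": {"x"}, "x": {"c"}}
exts = list(extend_by_directed_paths(G, {"r", "a", "b"}, {("r", "a"), ("r", "b")}, "c"))
```

In this example, the root r yields nothing because both of its successors belong to T, while a and b each yield one extension.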
![Page 61: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/61.jpg)
Extensions by Directed Paths
![Page 62: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/62.jpg)
Extensions by Directed Subtrees
![Page 63: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/63.jpg)
Query Reduction is not Efficient!
• Query reduction completely solves the problem, but it is inefficient
• Problem: A subset of the query may have many more results than the query itself
Exponential total time!
[Figure: a family of graphs over A, B and C, parameterized by n]
$2^n$ results for {A, B}
1 result for {A, B, C}
![Page 64: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/64.jpg)
Combining the Reductions
• In order to enumerate in polynomial total time, combine query and data reductions:
– If some node v of K is reachable, in the data graph, from another node u of K, use query reduction (remove v from K)
– Otherwise, use data reduction
• By combining the two reductions, results can be enumerated in polynomial total time
[Figure: a keyword node v reachable from another keyword node u]
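The decision rule can be sketched in a few lines. This covers only the selection step, not the full recursive algorithm; the graph is assumed to be a dict of successor sets, ties are broken by sorting (an arbitrary choice), and the function names are hypothetical.

```python
def reachable(graph, src):
    """All nodes reachable from src along directed edges (src included)."""
    seen, stack = {src}, [src]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def choose_reduction(graph, keywords):
    """('query', v) if keyword v is reachable from another keyword u, else ('data', None)."""
    keywords = set(keywords)
    for u in sorted(keywords):
        others = (reachable(graph, u) & keywords) - {u}
        if others:
            return ("query", min(others))  # query reduction: drop v from K
    return ("data", None)  # no such pair: use data reduction
```

For a graph u -> p -> v with K = {u, v}, the rule picks query reduction and drops v; for p -> a, p -> b with K = {a, b}, neither keyword reaches the other, so data reduction applies.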
![Page 65: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/65.jpg)
Achieving Polynomial Delay
• To achieve polynomial delay, we cannot wait until a recursive subroutine terminates
• Use coroutines instead of subroutines!
• That is, each recursive execution of the algorithm:
– stops after generating each result
– resumes when the next result is required
![Page 66: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/66.jpg)
[Figure: nested calls routine 1, routine 2, routine 3, down to the base case]
Subroutines: Polynomial Total Time
![Page 67: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/67.jpg)
[Figure: the same nested routines, run as coroutines]
Coroutines: Polynomial Delay
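Python generators behave like the coroutines described here, so the difference can be illustrated on a toy recursion. This enumerates binary strings, not KPS results, and the function names are illustrative. The subroutine version must finish its recursive call before returning anything, while the generator version suspends every level after each result, so the first results of a $2^{40}$-element enumeration arrive immediately.

```python
from itertools import islice


def all_results_subroutine(n):
    """Subroutine style: the recursive call must return all 2**(n-1) strings first."""
    if n == 0:
        return [""]
    return [s + b for s in all_results_subroutine(n - 1) for b in "01"]


def all_results_coroutine(n):
    """Coroutine style: every recursive level suspends after each result."""
    if n == 0:
        yield ""
        return
    for s in all_results_coroutine(n - 1):  # resume the inner coroutine on demand
        for b in "01":
            yield s + b


# Only three results are produced; calling the subroutine version with n = 40 would not finish.
first_three = list(islice(all_results_coroutine(40), 3))
```

Producing the next result resumes at most n suspended generators, so the delay between results stays polynomial even though the total output is exponential.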
![Page 68: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006](https://reader030.vdocuments.site/reader030/viewer/2022032800/56649d3e5503460f94a17b22/html5/thumbnails/68.jpg)
For papers and projects related to this topic, see the home page of Benny Kimelfeld