Interactive Query Formulation over Web Service-Accessed Sources
SIGMOD 2006 Best Paper Runner-Up. Michalis Petropoulos, Alin Deutsch, Yannis Papakonstantinou. CSE 636 Data Integration, March 2008.
![Page 1: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/1.jpg)
Interactive Query Formulation over Web Service-Accessed Sources
Michalis Petropoulos, Alin Deutsch, Yannis Papakonstantinou
CSE 636 Data Integration, March 2008
SIGMOD 2006 Best Paper Runner-Up
![Page 2: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/2.jpg)
Large-Scale Data Integration Systems
[Figure: layered architecture, from data sources up to end users]
• Source Domain (Source Owner): Data Sources with Source Schemas, e.g., Dell Computers, Cisco Routers, HP Printers
• Web Services over the sources, e.g., Dell Computers by CPU, Cisco Routers by Rate, HP Printers by Speed
• Integration Domain (Integration Engineer): Mediator with an Integrated Schema, e.g., Compatible Combinations of Computers, Routers and Printers
• Application Domain (Developer): Applications, e.g., CNET Computer, PCWorld Portals
• Web Domain (End User): Web Forms & Reports, e.g., CNET’s Top Combinations, CNET’s Search Desktops, PCWorld’s Product Finder
![Page 3: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/3.jpg)
Large-Scale Data Integration Systems
What queries can the mediator answer for me?
[Figure: the same layered architecture (Source, Integration, Application and Web Domains), with CLIDE added]
![Page 4: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/4.jpg)
Running Example

Dell
Schema
Computers(cid, cpu, ram, price)
NetCards(cid, rate, standard, interface)
Views
V1 ComByCpu(cpu) → (Computer)*
Computers for a given cpu
SELECT DISTINCT Com1.*
FROM Computers Com1
WHERE Com1.cpu=cpu
V2 ComNetByCpuRate(cpu, rate) → (Computer, NetCard)*
Computers & NetCards for a given cpu & rate
SELECT DISTINCT Com1.*, Net1.*
FROM Computers Com1, NetCards Net1
WHERE Com1.cid=Net1.cid
AND Com1.cpu=cpu
AND Net1.rate=rate

Cisco
Schema
Routers(rate, standard, price, type)
Views
V3 RouWired() → (Router)*
Wired Routers
SELECT DISTINCT Rou1.*
FROM Routers Rou1
WHERE Rou1.type='Wired'
V4 RouWireless() → (Router)*
Wireless Routers
SELECT DISTINCT Rou1.*
FROM Routers Rou1
WHERE Rou1.type='Wireless'

Parameterized Views
Conjunctive Queries CQ
• Equality & Comparison Conditions
• Parameters
![Page 5: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/5.jpg)
Running Example
• Integrated schema puts together the Dell and Cisco schemas
Attribute Associations
• (Computers.cid, NetCards.cid)
• (NetCards.rate, Routers.rate)
• (NetCards.standard, Routers.standard)
[Figure: Developer, Application, Mediator with Integrated Schema, and views V1, V2 over Dell and V3, V4 over Cisco]
![Page 6: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/6.jpg)
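Sophisticated Mediators Make Feasible Queries Hard to Predict
Feasible Queries FQ
• Equivalent CQ query rewritings using the views
• Might involve more than one view
• Order might matter

Feasible
Query: Get all ‘P4’ Computers, together with their NetCards and their compatible ‘Wireless’ Routers
[Figure: the Mediator calls V4 and V2; labels A to E mark the calls and their results]
V4 RouWireless() returns Routers.*:
| rate | standard | price | type |
|---|---|---|---|
| 10 | .11b | 50 | Wireless |
| 54 | .11g | 120 | Wireless |
V2 ComNetByCpuRate(‘P4’, ‘10’) and ComNetByCpuRate(‘P4’, ‘54’) return Computers.* and NetCards.*:
| cid | cpu | ram | price | cid | rate | standard | interface |
|---|---|---|---|---|---|---|---|
| A123 | P4 | 512 | 400 | A123 | 10 | .11b | USB |
| B123 | P4 | 1024 | 550 | B123 | 54 | .11g | USB |
Result, Computers.*, NetCards.* and Routers.*:
| cid | cpu | ram | price | cid | rate | standard | interface | rate | standard | price | type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A123 | P4 | 512 | 400 | A123 | 10 | .11b | USB | 10 | .11b | 50 | Wireless |
| B123 | P4 | 1024 | 550 | B123 | 54 | .11g | USB | 54 | .11g | 120 | Wireless |

Infeasible
Query: Get all Computers
[Figure: the Mediator with V1; V1 needs a cpu value and the query supplies none]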
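The feasible plan above can be sketched in Python. This is a minimal illustration, not the mediator's actual implementation: the rows come from the slide, while the list-of-dicts encoding, the function names, and the reading of "compatible" as matching rate and standard (taken from the Attribute Associations) are assumptions made for the example.

```python
# Toy rows copied from the slide; the in-memory encoding is an assumption.
COMPUTERS = [
    {"cid": "A123", "cpu": "P4", "ram": 512, "price": 400},
    {"cid": "B123", "cpu": "P4", "ram": 1024, "price": 550},
]
NETCARDS = [
    {"cid": "A123", "rate": "10", "standard": ".11b", "interface": "USB"},
    {"cid": "B123", "rate": "54", "standard": ".11g", "interface": "USB"},
]
ROUTERS = [
    {"rate": "10", "standard": ".11b", "price": 50, "type": "Wireless"},
    {"rate": "54", "standard": ".11g", "price": 120, "type": "Wireless"},
]


def rou_wireless():
    """Stand-in for V4 RouWireless(): takes no parameters."""
    return [r for r in ROUTERS if r["type"] == "Wireless"]


def com_net_by_cpu_rate(cpu, rate):
    """Stand-in for V2 ComNetByCpuRate(cpu, rate): both parameters are required."""
    return [
        (c, n)
        for c in COMPUTERS
        for n in NETCARDS
        if c["cid"] == n["cid"] and c["cpu"] == cpu and n["rate"] == rate
    ]


def p4_with_wireless_routers():
    """Call V4 first, feed each returned rate into V2, then join the results."""
    result = []
    for router in rou_wireless():
        for com, net in com_net_by_cpu_rate("P4", router["rate"]):
            # (NetCards.standard, Routers.standard) is the remaining association.
            if net["standard"] == router["standard"]:
                result.append((com, net, router))
    return result
```

The order matters here: V2 cannot be called until V4 has supplied the rate values, which is one reason feasibility is hard to predict by inspection.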
![Page 7: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/7.jpg)
Problem
1. Large number of sources
2. Large number of views (web services)
3. Mediator capabilities
Developer formulates an application query
• Is an application query feasible?
• If not, how do I know which ones are feasible?
Previous options:
– The developer had to browse the view definitions and somehow formulate a feasible query
– Or formulate queries until a feasible one is found (trial-and-error)
No system-provided guidance
![Page 8: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/8.jpg)
The CLIDE Solution
A query formulation interface, which interactively guides the developer toward feasible queries by employing a coloring scheme
[Figure: the running-example architecture (Developer, Application, Mediator with Integrated Schema, views V1 to V4 over Dell and Cisco) with CLIDE added]
![Page 9: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/9.jpg)
QBE-Like Interfaces
Microsoft SQL-Server
![Page 10: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/10.jpg)
CLIDE Interface
• Table, selection, projection and join actions
• Feasibility Flag
• Color-based suggestions
[Screenshot labels: Projection Box, Table Boxes, Selection Boxes, Table Alias, Feasibility Flag, Last/Next Step]
![Page 11: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/11.jpg)
Example Interaction
Snapshot 1
Yellow: required action
– All feasible queries require this action
White: optional action
– Feasible queries can be formulated with or without these actions
![Page 12: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/12.jpg)
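Example Interaction
Snapshot 2
Blue: required choice of action
– At least one feasible query cannot be formulated unless this action is performed
[Figure: the Mediator calls V1; labels A to C mark the call and its results]
V1 ComByCpu(‘P4’) returns:
| cid | cpu | ram | price |
|---|---|---|---|
| A123 | P4 | 512 | 400 |
| B123 | P4 | 1024 | 550 |
Projected result:
| ram | price |
|---|---|
| 512 | 400 |
| 1024 | 550 |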
![Page 13: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/13.jpg)
Example Interaction
Snapshot 3
Join Lines:
• Only yellow and blue are displayed
• Must appear in Attribute Associations
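The rule that join lines must appear in the Attribute Associations can be illustrated with a small check. This is a sketch under assumptions: the association pairs are the ones listed in the running example, while the alias table and the function name `join_allowed` are hypothetical and not part of CLIDE.

```python
# Attribute Associations from the running example, stored as unordered pairs.
ASSOCIATIONS = {
    frozenset(pair)
    for pair in [
        ("Computers.cid", "NetCards.cid"),
        ("NetCards.rate", "Routers.rate"),
        ("NetCards.standard", "Routers.standard"),
    ]
}

# Hypothetical mapping from table aliases (as used in the interface) to tables.
ALIASES = {"Com1": "Computers", "Net1": "NetCards", "Rou1": "Routers"}


def join_allowed(left, right, aliases=ALIASES):
    """Return True if joining the two aliased columns matches an association."""

    def resolve(column):
        alias, attribute = column.split(".", 1)
        return f"{aliases[alias]}.{attribute}"

    return frozenset((resolve(left), resolve(right))) in ASSOCIATIONS
```

For example, `join_allowed("Com1.cid", "Net1.cid")` holds in either argument order, while a join between `Com1.price` and `Rou1.price` is rejected because no association lists it.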
![Page 14: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/14.jpg)
Example Interaction
Snapshot 4
• *: any other constant
• Red: prohibited action
– Does not appear in any feasible query
– Leads to a “Dead End” state
![Page 15: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/15.jpg)
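Example Interaction
Snapshot 5
[Figure: Mediator plan over V4 and V2; labels A to F mark the calls and their results]
V4 RouWireless() returns Routers.*:
| rate | standard | price | type |
|---|---|---|---|
| 10 | .11b | 50 | Wireless |
| 54 | .11g | 120 | Wireless |
V2 ComNetByCpuRate(‘P4’, rate) returns Computers.* and NetCards.*:
| cid | cpu | ram | price | cid | rate | standard | interface |
|---|---|---|---|---|---|---|---|
| A123 | P4 | 512 | 400 | A123 | 10 | .11b | USB |
| B123 | P4 | 1024 | 550 | B123 | 54 | .11g | USB |
Result:
| ram | price | rate | interface | price |
|---|---|---|---|---|
| 512 | 400 | 10 | USB | 50 |
| 1024 | 550 | 54 | USB | 120 |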
![Page 16: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/16.jpg)
CLIDE Properties
• Completeness of Suggestions
– Every feasible query can be formulated by performing yellow and blue actions at every step
• Summarization of Suggestions
– At every step, only a minimal number of actions is suggested, i.e., the ones that are needed to preserve completeness
• Rapid Convergence By Following Suggestions
– The shortest sequence of actions from a query to any feasible query consists of suggested actions
![Page 17: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/17.jpg)
Interaction Graph
• Nodes are queries: one for each q ∈ CQ
• Edges are actions: table, selection, projection and join actions
• Green nodes are feasible queries
• Infinitely big structure
– All CQ queries
– All possible combinations of actions formulating them
[Figure: graph fragment with Table Action, Selection Action and Join Action edges, labeled Com1, Net1, Rou1, Com1.cpu=‘P4’, Com1.ram, Com1.price and Com1.cid=Net1.cid]
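Because the interaction graph is infinite, it can only be explored lazily. The sketch below is one way to model that, assuming a node is represented as the set of actions performed so far; that encoding, the `(kind, label)` action tuples and the `successors` helper are illustrative choices, not taken from CLIDE.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple

Action = Tuple[str, str]      # (kind, label), e.g. ("table", "Com1")
Query = FrozenSet[Action]     # a node: the set of actions performed so far


def successors(node: Query, candidates: Iterable[Action]) -> Iterator[Tuple[Action, Query]]:
    """Yield (action, next_node) for each candidate action not yet in the query."""
    for action in candidates:
        if action not in node:
            yield action, node | {action}


# Hypothetical candidate actions, using labels that appear in the figure.
CANDIDATES = [
    ("table", "Com1"),
    ("selection", "Com1.cpu='P4'"),
    ("projection", "Com1.ram"),
]
```

Starting from the empty query, `dict(successors(frozenset(), CANDIDATES))` maps each action to the node it leads to. Since a node is a set, different orderings of the same actions arrive at the same node, which mirrors the slide's point that many action combinations formulate the same query.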
![Page 18: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/18.jpg)
Interaction Graph: Colorable Actions
• Colorable actions AC label outgoing edges of the current node
[Figure: outgoing edges of the Current Node, labeled Net1, Rou1, Com2, Com1.cid, Com1.cpu, Com1.cid=*, Com1.cpu=*, Com1.ram=*, Com1.price=*]
![Page 19: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/19.jpg)
Interaction Graph: Colors
• Yellow action: every path from the current node n to a feasible node contains the action
• Blue action: at least one feasible query cannot be formulated unless this action is performed (summarization)
• Red action: no path to a feasible node contains the action
[Figure: interaction graph around the Current Node. Outgoing edges include Net1, Rou1, Com2, Com1.cid, Com1.cpu, Com1.cid=*, Com1.cpu=*, Com1.ram=*, Com1.price=*; deeper edges include Com1.cid=Net1.cid, Com2.cid=Net1.cid, Com2.cpu=‘P4’, Net1.rate=Rou1.rate and Net1.rate=‘54Mbps’]
![Page 20: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/20.jpg)
CLIDE Architecture
• Back-End invoked every time the user performs an action
– i.e., the user arrives at a new node in the interaction graph
[Figure: architecture diagram]
• Components: User, Front-End, Back-End, Closest Feasible Queries Algorithm, Aliases Collapse Rule, Maximally-Contained Rewriter, Color Algorithm, Parameters Algorithm
• Inputs: Views, Schemas, Column Associations
• Data labels: Actions, Current Query, Seed Queries SQ, Minimal Feasible Extension Queries, Closest Feasible Queries FQC, Colored Actions + Feasibility Flag
![Page 21: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/21.jpg)
Color Determined By a Finite Set of Feasible Queries
Challenge: Infinitely Many Feasible Queries
Solution: Closest Feasible Queries FQC
• FQC is sufficient to color actions in AC
• Theorem: Set of Closest Feasible Queries is Finite
Challenge: How far can the Closest Feasible Queries FQC be?
Solution: Based on Maximally Contained Queries FQMC
[Figure: current node n surrounded by the Closest Feasible Queries FQC, with the radius marked “Radius?”]
![Page 22: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/22.jpg)
Maximally Contained Queries FQMC
• Assuming fixed SELECT clause (projection list)
• Covered extensively in literature
– MiniCon, Bucket, InverseRules Algorithms
• FQMC is finite
Query Q1: Get all Computers
Query Q2: Get all Computers with a given cpu
Query Q3: Get all Computers with a given cpu & ram
Query Q4: Get all Computers with a given ram
[Figure: Q2 is labeled the Maximally Contained Query for Q1 and Q3 is labeled Not Maximally Contained; a second Maximally Contained Query label appears next to Q4]
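A toy model can make the Q1 to Q4 example concrete. It is only a sketch under strong assumptions: each query is reduced to the set of Computers attributes it binds, a query counts as feasible when it binds cpu (as V1 ComByCpu requires), and containment is plain superset testing. Real rewriters such as MiniCon work on full conjunctive queries; none of the names below come from CLIDE.

```python
Q1 = frozenset()                  # Get all Computers
Q2 = frozenset({"cpu"})           # ... with a given cpu
Q3 = frozenset({"cpu", "ram"})    # ... with a given cpu & ram
Q4 = frozenset({"ram"})           # ... with a given ram
CANDIDATES = [Q1, Q2, Q3, Q4]


def contained_in(a, b):
    """Toy containment: a's answers are a subset of b's if a binds at least b's attributes."""
    return a >= b


def feasible(query):
    """Toy feasibility: the only view used here, ComByCpu, needs a cpu value."""
    return "cpu" in query


def maximally_contained(query, candidates=CANDIDATES):
    """Feasible candidates contained in `query` that no other such candidate strictly contains."""
    inside = [c for c in candidates if feasible(c) and contained_in(c, query)]
    return [
        c for c in inside
        if not any(c != other and contained_in(c, other) for other in inside)
    ]
```

In this model `maximally_contained(Q1)` yields only Q2, matching the slide's labels: Q3 is feasible and contained in Q1, but Q2 contains it. For Q4 the model yields Q3, which is one plausible reading of the second label in the figure.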
![Page 23: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/23.jpg)
Closest Feasible Queries FQC Algorithm
Challenge: How far can the Closest Feasible Queries FQC be?
Solution: Maximally Contained Queries FQMC
• Compute maximally contained queries FQMC
• Theorem: All FQC queries are reachable via a path of length p ≤ pL
• The radius pL is the longest path to a maximally contained query
[Figure: current node n, the Closest Feasible Queries FQC and the Maximally Contained Queries FQMC, with the radius pL marked]
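The radius bound is what turns an infinite search into a finite one. The sketch below shows a radius-bounded breadth-first search over a toy graph. It is not the CLIDE algorithm: "closest" is taken here to mean a feasible node reached without passing through another feasible node, which is an assumption since the slides do not spell out the definition, and the toy actions and feasibility test are hypothetical.

```python
from collections import deque


def closest_feasible(start, successors, is_feasible, radius):
    """Breadth-first search that stops at the first feasible node on each path."""
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node != start and is_feasible(node):
            found.append(node)   # do not expand past a feasible node
            continue
        if depth == radius:
            continue             # the radius bound keeps the search finite
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return found


# Hypothetical toy graph: a node is the set of actions performed so far.
ACTIONS = ("Com1", "Com1.cpu='P4'", "Com1.ram")


def toy_successors(node):
    return [node | {action} for action in ACTIONS if action not in node]


def toy_feasible(node):
    # Assumed rule for the toy: the table plus a cpu selection makes V1 usable.
    return {"Com1", "Com1.cpu='P4'"} <= node
```

With `radius=2`, the search from the empty query finds the single node containing the table action and the cpu selection. A smaller radius finds nothing, which is why the bound pL has to be large enough to reach every maximally contained query.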
![Page 24: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/24.jpg)
Closest Feasible Queries FQC Algorithm
Challenge: Find the Closest Feasible Queries
• Theorem: All queries in FQMC are in FQC
• But not all queries in FQC are in FQMC
[Figure: current node n with the Maximally Contained Feasible Queries FQMC inside the Closest Feasible Queries FQC; the remaining FQC nodes are labeled “More feasible nodes”]
![Page 25: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/25.jpg)
Closest Feasible Queries FQC Algorithm
Solution: Collapse Aliases
• Collapse Aliases to compute FQC \ FQMC
• Check satisfiability
[Figure: current node n with the Maximally Contained Feasible Queries FQMC and the Closest Feasible Queries FQC]
![Page 26: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/26.jpg)
Color Algorithm
Yellow and Blue
• An action is colored based on which closest feasible queries it appears in
• Yellow, if it appears in all queries in FQC
• Blue, if it appears in at least one (but not all) query in FQC
White and Red
• Attach Maximum Projection Lists to Closest Feasible Queries
– Projections that can be added to a feasible query, without compromising feasibility
• Projection is white if in the maximum projection list
• Color selections based on projections
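The yellow and blue rules above can be written down directly. The sketch below assumes each closest feasible query is given as the set of actions that lead to it from the current node; that encoding and both function names are hypothetical. Coloring a projection red when it is in no maximum projection list is also an assumption, based on the earlier definition of red as appearing in no feasible query; the slide itself only states the white rule.

```python
def color_actions(colorable, closest_feasible):
    """Yellow if the action is in every closest feasible query, blue if in some."""
    colors = {}
    total = len(closest_feasible)
    for action in colorable:
        hits = sum(1 for query in closest_feasible if action in query)
        if total > 0 and hits == total:
            colors[action] = "yellow"
        elif hits > 0:
            colors[action] = "blue"
        else:
            colors[action] = None   # left to the white/red rules
    return colors


def color_projections(projections, max_projection_lists):
    """White if the projection is in some maximum projection list, else red (assumed)."""
    allowed = set().union(*max_projection_lists)
    return {p: ("white" if p in allowed else "red") for p in projections}


# Hypothetical closest feasible queries, using labels from the figures.
FQC = [
    frozenset({"Net1", "Com1.cid=Net1.cid"}),
    frozenset({"Net1", "Com1.cid=Net1.cid", "Rou1", "Net1.rate=Rou1.rate"}),
]
COLORS = color_actions(["Net1", "Rou1", "Com2"], FQC)
```

Here `Net1` is in both queries and comes out yellow, `Rou1` is in only one and comes out blue, and `Com2` is in neither, so it is left for the white and red rules.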
![Page 27: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/27.jpg)
CLIDE Implementation & Optimizations
• View expansion introduces redundancy
– Affects CLIDE’s rapid convergence and summarization
• Efficient containment test crucial to redundancy removal
[Figure: Maximally-Contained Rewriter pipeline]
• Stages: MiniCon, Containment Mappings Logging, Views Expansion, Redundant Actions Removal, Redundant Queries Removal
• Results: Maximally-Contained Feasible Queries over Views + Containment Mappings; Maximally-Contained Feasible Extension Queries + Maximum Projection Lists; Feasible Extension Queries + Maximum Projection Lists; Minimal Feasible Extension Queries FQME + Maximum Projection Lists
[Figure: the architecture diagram from the CLIDE Architecture slide, repeated]
![Page 28: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/28.jpg)
CLIDE Performance
Chains of Stars – No Parameters
• Queries: A-span = 7, B-span = 3, Selections = 4, 6, 8, 10
• Schema: [Figure: chain-of-stars diagram with nodes A, B1 … BK, C1 … CL, Bi, Ci]
• Views: [Figure: diagram with nodes A, B1 … BK, C1 … CL, Bi1 … BiM, Ci1 … CiM]
![Page 29: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/29.jpg)
CLIDE Performance
Chains of Stars – No Parameters
• Queries: A-span = 7, B-span = 3, Selections = 4, 6, 8, 10
• Schema: [Figure: chain-of-stars diagram with nodes A, B1 … BK, C1 … CL, Bi, Ci]
• Views: [Figure: diagram with nodes A, B1 … BK, C1 … CL, Bi1 … BiM, Ci1 … CiM]
![Page 30: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/30.jpg)
CLIDE Performance
Chains of Stars – With Parameters
• Queries: A-span = 7, B-span = 3, Selections = 4, 6, 8, 10
• Schema: [Figure: chain-of-stars diagram with nodes A, B1 … BK, C1 … CL, Bi, Ci]
• Views: [Figure: diagram with nodes A, B1 … BK, C1 … CL, Bi1 … BiM, Ci1 … CiM]
![Page 31: Interactive Query Formulation over Web Service-Accessed Sources](https://reader034.vdocuments.site/reader034/viewer/2022051419/56815a91550346895dc805db/html5/thumbnails/31.jpg)
CLIDE Summary
First interactive query formulation interface based on source and mediator capabilities
Applicability
• Service-Oriented Architectures
• Privacy-Preserving Services
Contributions
• Interaction Guarantees: Rapid Convergence, Completeness, Summarization of Suggestions
• Interaction Graph
• Back-End Algorithms
– Closest Feasible Queries, Colors, Parameters
• Modular, Customizable Architecture
http://www.clide.info