![Page 1: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/1.jpg)
Zhen Zhang, Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang
Presented at the ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006
Presented by: Pavan Kumar M.K. (1000618890), Aditya Mangipudi (1000649172)
![Page 2: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/2.jpg)
• Introduction
• Motivation
• A* Search Algorithm
• A*-Driven State Space Construction
• Optimization-Driven Configuration
• OPT* Search Algorithm
• Experiments
• Conclusion
![Page 3: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/3.jpg)
The widespread use of databases for managing structured data, compounded by the expanded reach of the Internet, has brought interesting new data retrieval and analysis scenarios to RDBMSs.
Only the top-k results are of interest to the user.
![Page 4: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/4.jpg)
QUERY: Select the top 5 second-year students in CSE with the highest GPA.
• Boolean query (qualifying constraint) B: dept = CSE and year = 2
• Ranking query (quantifying function) O: GPA, i.e., top 5 ranked by GPA
• Boolean query + ranking query: find the top answers
![Page 5: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/5.jpg)
Query Q = (G, k)
• G: goal function, G = B · O
• k: retrieval size
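As a point of reference, the query model Q = (G, k) can be sketched as a brute-force scan. This is only a baseline for illustrating the definition, not the method the slides propose; the `students` records and the helper name `top_k` are made up for the example.

```python
import heapq

def top_k(tuples, boolean, ranking, k):
    """Brute-force answer to Q = (G, k), with G = B * O.

    Tuples that fail the Boolean condition get the lowest score, 0.
    """
    def goal(t):
        return ranking(t) if boolean(t) else 0.0
    return heapq.nlargest(k, tuples, key=goal)

# Hypothetical data for the student example.
students = [
    {"name": "Ann", "dept": "CSE", "year": 2, "gpa": 3.9},
    {"name": "Bob", "dept": "CSE", "year": 2, "gpa": 3.5},
    {"name": "Cid", "dept": "EE", "year": 2, "gpa": 4.0},
    {"name": "Dee", "dept": "CSE", "year": 3, "gpa": 3.8},
    {"name": "Eve", "dept": "CSE", "year": 2, "gpa": 3.7},
]

best = top_k(
    students,
    boolean=lambda s: s["dept"] == "CSE" and s["year"] == 2,  # B
    ranking=lambda s: s["gpa"],                               # O
    k=2,
)
best_names = [s["name"] for s in best]
```

The scan touches every tuple, which is exactly the cost that an index-driven search tries to avoid.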
![Page 6: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/6.jpg)
Ranking query + Boolean query
How to answer?
![Page 7: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/7.jpg)
• If evaluated as separate operators: D → Boolean query B → … → Ranking query R. Current techniques optimize only condition by condition.
• If searched by an overall goal function G as a ranking function: D → goal function G, which combines R and B.
![Page 8: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/8.jpg)
[Figure labels: Att 1, Att 2]
![Page 9: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/9.jpg)
The Threshold Algorithm (TA) relies on the rigid assumption that the goal function G is monotonic.
Monotonicity requires that G does not increase when all of its parameters decrease.
![Page 10: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/10.jpg)
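Consider the following example query, which finds houses in a certain price range with a good price/sqft ratio.
The function G here is non-monotonic: the score rises as the price approaches 300k and falls as it moves away, and the Boolean condition forces it to 0 between 200k and 400k.
SELECT h.address FROM House h
WHERE h.price ≤ 200k ∨ h.price ≥ 400k
ORDER BY h.size / |h.price − 300k|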
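To make the non-monotonicity concrete, here is a small check that evaluates the goal function of the house query at increasing prices with the size held fixed. Prices are expressed in thousands; the function name `goal` and the sample prices are choices made for this sketch.

```python
def goal(price_k, size):
    """G = B * R for the house query; price_k is the price in thousands."""
    qualifies = price_k <= 200 or price_k >= 400        # Boolean part B
    return size / abs(price_k - 300) if qualifies else 0.0  # ranking part R

# Hold size fixed and let the price grow.
prices = (100, 200, 300, 400, 500)
scores = [goal(p, 1000) for p in prices]

pairs = list(zip(scores, scores[1:]))
non_decreasing = all(a <= b for a, b in pairs)
non_increasing = all(a >= b for a, b in pairs)
```

The scores go up, drop to zero, jump back up, and fall again, so G is neither non-decreasing nor non-increasing in price.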
![Page 11: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/11.jpg)
[Figure labels: Att 1, Att 2]
![Page 12: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/12.jpg)
Existing algorithms build on problem-specific assumptions about the goal functions or index traversals.
For example, the Threshold Algorithm assumes that G is monotonic and that sorted accesses (interleaf navigation) are used, and its search is implicitly hardwired around these assumptions.
In a Boolean query such as B = price > 100K, the search is straightforward because the constraint expression B explicitly suggests how to carry out a focused search, e.g., by visiting only the nodes whose locality can potentially satisfy B.
![Page 13: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/13.jpg)
In contrast, for a general k-constrained optimization query, potentially involving arbitrary ranking combined with Boolean conditions and joins over multiple relations (e.g., Q maximizing the size/price ratio), it is no longer clear how to focus the search.
By encoding the query as a generic search with no assumptions on G, the search generalizes to arbitrary G over potentially multiple indices and to a combination of hierarchical and interleaf traversals.
![Page 14: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/14.jpg)
A* is a well-known search algorithm that finds the shortest path from an initial state to a designated goal state.
• Widely used in the field of artificial intelligence.
• Uses best-first search traversal.
• Uses heuristic information to carry out the search in a guided manner.
• Given suitable heuristics, A* is guaranteed to find the correct answer (correctness) while visiting the fewest states (optimality).
• Examples: GPS navigation, Google Maps, many puzzles and games.
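For readers who have not seen A* before, the following is a minimal textbook-style sketch of best-first search guided by a heuristic. The graph, edge costs, and heuristic values are invented for the example; the heuristic never overestimates the remaining distance, which is what makes the returned path shortest.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of a shortest path from start to goal, or None.

    graph: node -> list of (neighbor, edge_cost)
    h:     node -> optimistic estimate of the remaining cost to goal
    """
    frontier = [(h[start], 0, start, [start])]   # (f = g + h, g, node, path)
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue                              # stale queue entry
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier, (new_cost + h[nxt], new_cost, nxt, path + [nxt])
                )
    return None

# Hypothetical example graph and admissible heuristic.
graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
}
h = {"S": 4, "A": 3, "B": 2, "G": 0}

result = a_star(graph, h, "S", "G")
```

The priority queue always expands the state with the smallest estimated total cost, which is the same control structure the later slides reuse for OPT*.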
![Page 15: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/15.jpg)
For a tuple t with m attribute values, the goal function G(t) maps the tuple to a non-negative numeric score:

$$G(t) = B(t) \cdot R(t) = \begin{cases} R(t) & \text{if } B(t) \text{ is true} \\ 0 & \text{if } B(t) \text{ is false (i.e., the lowest score)} \end{cases}$$
![Page 16: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/16.jpg)
SELECT h.address FROM House h
WHERE h.price ≤ 200k ∨ h.price ≥ 400k
ORDER BY h.size / |h.price − 300k|

| # | Addr | Price | Size | Score |
|---|------|-------|------|-------|
| 1 | Oak park, Chicago | 600K | 4500 | 15 |
| 2 | Mattis, Champaign | 350K | 2000 | 0 |
| 3 | … | 150K | 1000 | 6.67 |
| 4 | … | 250K | 2000 | 0 |
| 5 | … | 300K | 3500 | 0 |
| 6 | … | 80K | 500 | 2.27 |
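The score column can be reproduced directly from the definition G(t) = B(t) · R(t). The sketch below recomputes it for the six houses, with prices in thousands; the variable and function names are chosen for the example.

```python
# (price in thousands, size) for houses 1..6, taken from the slide.
houses = {
    1: (600, 4500),
    2: (350, 2000),
    3: (150, 1000),
    4: (250, 2000),
    5: (300, 3500),
    6: (80, 500),
}

def goal(price_k, size):
    """G(t) = R(t) if B(t) holds, otherwise 0."""
    if price_k <= 200 or price_k >= 400:      # B: price <= 200k or price >= 400k
        return size / abs(price_k - 300)      # R: size / |price - 300k|
    return 0.0

scores = {hid: round(goal(*attrs), 2) for hid, attrs in houses.items()}

# Rank houses by score, highest first, and keep the top 2.
top2 = sorted(scores, key=scores.get, reverse=True)[:2]
```

Houses 2, 4, and 5 fail the Boolean condition and score 0, so the top-2 answer is house 1 followed by house 3.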
![Page 17: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/17.jpg)
![Page 18: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/18.jpg)
To realize k-constrained optimization over databases, the paper develops the OPT* framework.
Objective: optimize G using indices as access methods over the tuples in D.
Discrete state search: from the viewpoint of using indices, we search for the maximizing tuples, treating index nodes as “discrete states”.
Continuous function optimization: from the viewpoint of maximizing goal functions, we optimize G.
![Page 19: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/19.jpg)
OPT*: optimize G over D = function optimization of G + discrete state search over D
![Page 20: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/20.jpg)
Indices Value Space
![Page 21: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/21.jpg)
States: states in a search graph represent “localities” of values at different granularities, from coarse to fine, and eventually reach tuples in the database.
• Region state
• Tuple state
Transitions: while states give “locations” on the map, transitions capture the possible paths that can be followed to reach our destination, the query answers.
Example: for two states u and v, there is a transition (u, v) if v ∈ Next(u).
![Page 22: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/22.jpg)
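[Figure: tuples 1 to 6 plotted in the size × price (k) plane, with one hierarchical index per attribute (nodes a1, a2, a3, a6, a7 and b1, b2, b3, b6, b7). Price ranges: 0-250 (split into 0-100 and 100-250) and 250-600 (split into 250-350 and 350-600). Size ranges: 0-3000 (split into 0-1500 and 1500-3000) and 3000-4500 (split into 3000-4000 and 4000-6000).]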
![Page 23: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/23.jpg)
[Figure: the size and price (k) indices with joint states Mij = (ai, bj) formed from pairs of index nodes: M11; then M22, M32, M23, M33; then M66, M77, M67, M76, M55, M56, M75, …; leading to tuples 1, 5, 4, 2.]
![Page 24: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/24.jpg)
Mij = (ai, bj): conceptually, the joint states form a combined space.
[Figure: the joint states M11; M22, M32, M23, M33; M66, M77, M67, M76, M55, M56, M75, … over the size and price (k) indices, leading to tuples 1, 5, 4, 2.]
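One way to picture the combined space is to pair up nodes from the two indices and let a joint state expand into the pairs of their children. The sketch below assumes each index is a 7-node binary tree in which node i has children 2i and 2i+1; that numbering is inferred from the labels in the figure and is not stated in the slides. It models only parent-to-child (hierarchical) transitions, not links between sibling states.

```python
from itertools import product

def index_children(i, first_leaf=4):
    """Children of index node i in a 7-node binary tree (assumed numbering)."""
    return [] if i >= first_leaf else [2 * i, 2 * i + 1]

def next_states(state):
    """Hierarchical successors of the joint state M_ij = (a_i, b_j)."""
    i, j = state
    ci, cj = index_children(i), index_children(j)
    if not ci and not cj:
        return []   # leaf-level joint state: its successors are tuple states
    # If only one side can still be refined, keep the other side fixed.
    return list(product(ci or [i], cj or [j]))

root = (1, 1)                      # M11
level_two = next_states(root)      # M22, M23, M32, M33
under_m33 = next_states((3, 3))    # M66, M67, M76, M77
```

Under this assumption M11 expands into M22, M23, M32, and M33, and M33 expands into M66, M67, M76, and M77, which matches the state names listed on the slide.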
![Page 25: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/25.jpg)
Challenge 1: What is the search mechanism?
![Page 26: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/26.jpg)
• A* gives the shortest path to a testable goal.
• Our goal is to find the optimal tuple states with maximal G-score.
• K-constrained optimization: find a tuple with maximal score.
• A* shortest path: find a path with minimal distance.
![Page 27: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/27.jpg)
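• How to encode a tuple as a path?
  ◦ Add a virtual target t* that is reachable only through tuples.
• How to encode a maximal tuple as a minimal path?
  ◦ The quality of a path depends solely on the tuple it passes through.
  ◦ For a tuple state t: D(t, t*) = −G(t)
  ◦ For any other transition between two states r and u: D(r, u) = 0
[Figure: search graph from M11 through M22, M32, M23, M33 and M66, M77, M67, M76, M75, M56, M55 to tuples 1, 5, 4, 2 and the virtual target t*. All edges have distance 0 except the edges from tuples to t*, labeled −G(1), −G(4), …]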
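The encoding can be checked on a toy graph: if every edge costs 0 except the final edge from a tuple t to t*, which costs −G(t), then the minimal-distance path to t* is the one that passes through the highest-scoring tuple. The graph shape below is hypothetical, and the scores are taken from the house table. Because the graph is acyclic, the sketch simply evaluates every path by memoized recursion; it is meant to verify the encoding, not to implement A*.

```python
from functools import lru_cache

TARGET = "t*"
SCORES = {"t1": 15.0, "t3": 6.67, "t6": 2.27}   # G-scores of three tuples
EDGES = {                                        # hypothetical region graph
    "M11": ["M22", "M33"],
    "M22": ["M55"],
    "M33": ["M77"],
    "M55": ["t3", "t6"],
    "M77": ["t1"],
}

def successors(state):
    """Outgoing edges with distances: D(t, t*) = -G(t), otherwise 0."""
    if state in SCORES:
        return [(TARGET, -SCORES[state])]
    return [(nxt, 0.0) for nxt in EDGES.get(state, [])]

@lru_cache(maxsize=None)
def shortest(state):
    """Return (distance, path) of a minimal-distance path from state to t*."""
    if state == TARGET:
        return 0.0, (TARGET,)
    best = (float("inf"), ())
    for nxt, dist in successors(state):
        rest, path = shortest(nxt)
        best = min(best, (dist + rest, (state,) + path))
    return best

distance, path = shortest("M11")
```

The minimal distance is the negated maximal score, and the tuple just before t* on the path is the top-1 answer.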
![Page 28: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/28.jpg)
Challenge 2: How to guide the search?
![Page 29: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/29.jpg)
Function optimization measures the quality of states. It serves two purposes:
• Defines proper heuristics.
• Identifies a set of initial states from which to start the search.
![Page 30: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/30.jpg)
Input: G(x1, …, xm) and a domain of values dom, with $x_i \in [x_i^1, x_i^2]$
Output: ⟨O, U⟩ = OPT(G, dom), where O is the set of local optima and U is the upper-bound score
• OPTPOINT gives the O component of OPT
• OPTMAX gives the U component of OPT
Approaches:
• Analytical method
• Search based (e.g., hill climbing)
• Template based
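As an illustration of the search-based approach, here is a simple coordinate hill climber restricted to a box-shaped domain. It is a generic sketch rather than the optimizer used in the paper, and it only finds a local optimum, so on its own it does not certify an upper bound. The test function is the negated nearest-neighbor distance −((x−3)² + (y−4)²), negated here so that the nearest point becomes the maximum; the function and parameter names are choices made for the example.

```python
def hill_climb(g, start, bounds, step=1.0, tol=1e-6):
    """Climb g from start inside bounds; return (local optimum, score).

    Tries +/- step along each coordinate, moves on improvement,
    and halves the step when no move improves the score.
    """
    point = list(start)
    best = g(*point)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                cand = list(point)
                cand[i] = min(max(cand[i] + delta, lo), hi)  # stay in domain
                score = g(*cand)
                if score > best:
                    point, best, improved = cand, score, True
        if not improved:
            step /= 2
    return tuple(point), best

def nearest_to_3_4(x, y):
    """Higher is better: negated squared distance to the point (3, 4)."""
    return -((x - 3) ** 2 + (y - 4) ** 2)

optimum, score = hill_climb(nearest_to_3_4, start=(0.0, 0.0),
                            bounds=[(0.0, 10.0), (0.0, 10.0)])
```

For a function with several local optima, the climber would have to be restarted from multiple points to collect the full set O.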
![Page 31: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/31.jpg)
The figure illustrates that different states hold different promise.
The search should favor M77 over M67 because it is more promising.
[Figure legend: High, Medium, Low]
![Page 32: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/32.jpg)
• To guarantee completeness
  ◦ A* requires admissible heuristics, i.e., heuristics that estimate optimistically.
• To ensure admissible heuristics
  ◦ Function optimization gives the tightest upper bound, via analytical approaches or a numeric analysis package.
H(region) = OPTMAX(G, region), i.e., the maximal value of G in the region
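For the house query, the upper bound of a rectangular region can be worked out analytically: the score grows with size, so the best size is the region's upper edge, and the best price is the allowed price closest to 300k. The sketch below implements that bound and checks that it is optimistic for the tuples in the table. Prices are in thousands, and the region boundaries and names are hypothetical choices for the example.

```python
HOUSES = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}   # id -> (price k, size)

def goal(price, size):
    qualifies = price <= 200 or price >= 400
    return size / abs(price - 300) if qualifies else 0.0

def optmax(region):
    """Maximal value of goal over region = ((p_lo, p_hi), (s_lo, s_hi))."""
    (p_lo, p_hi), (_, s_hi) = region
    feasible = []                       # allowed prices closest to 300
    if p_lo <= 200:
        feasible.append(min(p_hi, 200))
    if p_hi >= 400:
        feasible.append(max(p_lo, 400))
    if not feasible:
        return 0.0                      # no price in the region satisfies B
    return max(s_hi / abs(p - 300) for p in feasible)

def tuples_in(region):
    (p_lo, p_hi), (s_lo, s_hi) = region
    return [t for t, (p, s) in HOUSES.items()
            if p_lo <= p <= p_hi and s_lo <= s <= s_hi]

def is_admissible(region):
    """The bound must never fall below the score of a tuple in the region."""
    bound = optmax(region)
    return all(goal(*HOUSES[t]) <= bound for t in tuples_in(region))

whole = ((0, 600), (0, 6000))
upper_right = ((350, 600), (4000, 6000))   # high price, large size
middle_band = ((250, 350), (0, 6000))      # prices excluded by B
```

A region whose prices all fall strictly between 200k and 400k gets a bound of 0, which is why such a region looks unpromising even when a neighboring region holds a high-scoring tuple.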
![Page 33: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/33.jpg)
h(M67) gives U = 0. However, if we follow the link from M67 to M77, we can reach tuple 1 with score 15.
[Figure: tuples 1 to 6 in the size × price (k) plane, with regions M67 and M77 marked.]
![Page 34: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/34.jpg)
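• To guarantee optimality
  ◦ A* requires descending heuristics.
• To ensure descending heuristics
  ◦ Remove uphill links, i.e., links from a state to a state with a higher heuristic value, such as M67 → M77.
[Figure: search graph with M11; M22, M32, M23, M33; M66, M77, M67, M76, M55, M75, M56; and tuples 1, 5, 4, 2, …]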
![Page 35: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/35.jpg)
• To guarantee correctness
  ◦ Every tuple state must be reachable from the start states.
  ◦ Taking only downhill links requires starting from the high points.
• To ensure reachability
  ◦ The initial states should contain all local optima.
[Figure: search graph with M11; M22, M32, M23, M33; M66, M77, M67, M76, M55, M75, M56; and tuples 1, 5, 4, 2, …]
![Page 36: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/36.jpg)
Search is implemented as a priority-queue-driven traversal, shown here top-down.
[Figure: search graph with M11; M22, M32, M23, M33; M66, M77, M67, M76, M55, M75, M56, M57, …; and tuples 1, 5, 4, 2.]
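A priority-queue-driven, top-down traversal can be sketched as follows for the house query. This is an illustration under simplifying assumptions, not the paper's OPT* implementation: regions are produced by repeatedly splitting a bounding box into four quadrants instead of following the two indices from the slides, and the names `opt_star`, `optmax`, and `ROOT` are chosen for the example. Each region is queued with its upper bound, each tuple with its exact score, and the search stops once k tuples have been popped.

```python
import heapq
from itertools import count

HOUSES = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}   # id -> (price k, size)
ROOT = ((0, 700), (0, 6000))    # assumed bounding box, half-open on the right

def goal(price, size):
    ok = price <= 200 or price >= 400
    return size / abs(price - 300) if ok else 0.0

def optmax(box):
    """Upper bound of goal over the (closed) box."""
    (p_lo, p_hi), (_, s_hi) = box
    feasible = []
    if p_lo <= 200:
        feasible.append(min(p_hi, 200))
    if p_hi >= 400:
        feasible.append(max(p_lo, 400))
    return max((s_hi / abs(p - 300) for p in feasible), default=0.0)

def quadrants(box):
    (p_lo, p_hi), (s_lo, s_hi) = box
    pm, sm = (p_lo + p_hi) / 2, (s_lo + s_hi) / 2
    return [((a, b), (c, d)) for a, b in ((p_lo, pm), (pm, p_hi))
                             for c, d in ((s_lo, sm), (sm, s_hi))]

def inside(box, point):
    return all(lo <= v < hi for (lo, hi), v in zip(box, point))

def opt_star(k):
    """Return the top-k (tuple id, score) pairs and the number of states popped."""
    tie = count()                                   # tie-breaker for equal scores
    heap = [(-optmax(ROOT), next(tie), ROOT, list(HOUSES))]
    results, popped = [], 0
    while heap and len(results) < k:
        neg, _, box, ids = heapq.heappop(heap)
        popped += 1
        if box is None:                             # tuple state: exact score
            results.append((ids[0], -neg))
        elif len(ids) == 1:                         # region holding one tuple
            t = ids[0]
            heapq.heappush(heap, (-goal(*HOUSES[t]), next(tie), None, [t]))
        else:                                       # refine the region
            for q in quadrants(box):
                members = [t for t in ids if inside(q, HOUSES[t])]
                if members:
                    heapq.heappush(heap, (-optmax(q), next(tie), q, members))
    return results, popped

top3, states_popped = opt_star(3)
```

Because a region's bound is never lower than the score of anything beneath it, a tuple that reaches the front of the queue cannot be beaten by any state still waiting, which is the argument behind stopping after k tuples.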
![Page 37: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/37.jpg)
Example: given a set of states constructed from the set of index graphs I, the search should, in principle, follow the transitions to look for the tuple states that maximize the goal function. The search may follow one of these paths:
• Top-down search: M11 → M33 → M77 → 1
• Bottom-up search: M57 → M77 → 1
![Page 38: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/38.jpg)
[Figure: search graph with M11; M22, M32, M23, M33; M66, M77, M67, M76, M55, M75, M56; and tuples 1, 4, 2, 5.]
![Page 39: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/39.jpg)
OPT* may incur different costs depending on the initial states from which it starts.
Top-down → more hops | Bottom-up → fewer hops
Bottom-up is generally preferable, but consider the goal function $G = 1/((X-Y)^2 + 1)$: every point satisfying X = Y maximizes it, so the optima alone do not single out a small set of starting states.
![Page 40: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/40.jpg)
• Comparison vs.
  ◦ Boolean then ranking
  ◦ Ranking then Boolean
• Metric: nodes accessed = Nl + Nt
• Settings:
  ◦ Benchmark queries over a real dataset
  ◦ Controlled queries over synthetic datasets
![Page 41: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/41.jpg)
• Dataset:
  ◦ 19,706 real estate listings crawled online
• Queries:
  ◦ Q1: size * bedrms / |price − 450k| : [40k ≤ price ≤ 50k]
  ◦ Q2: size * e^bedrms / |price − 350k| : [price < 400k ∧ size > 4000]
  ◦ Q3: size / price : [bedrms = 3 ∨ bedrms = 4]
[Figure: comparison of BR_unclustered, BR_clustered, and OPT* on Q1, Q2, and Q3.]
![Page 42: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/42.jpg)
• Datasets:
  ◦ Three randomly generated datasets of 100k points: uniform, Gaussian, lognormal
• Queries:
  ◦ Linear average queries (e.g., 0.4*a + 0.6*b)
  ◦ Nearest neighbor queries (e.g., (x-3)^2 + (y-4)^2)
  ◦ Join queries (0.4*R.a + 0.6*S.b : R.c=R.d)
![Page 43: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/43.jpg)
• Problem
  ◦ Study K-constrained optimization queries as Boolean + ranking
• Abstraction
  ◦ Encode K-constrained optimization as a shortest-path problem
• Framework
  ◦ Develop OPT* to process K-constrained optimization
![Page 44: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/44.jpg)
References
• Boolean + Ranking: Querying a Database by K-Constrained Optimization. Z. Zhang, S. Hwang, K. C.-C. Chang, M. Wang, C. Lang, and Y. Chang. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), pages 359-370, Chicago, June 2006
• www.wikipedia.org
![Page 45: Boolean + Ranking: Querying a Database by K-Constrained Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813b8f550346895da4bf94/html5/thumbnails/45.jpg)
Questions?