VLDB 2006, Seoul
Indexing For Function Approximation
Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew
Cornell University
Motivation
• Simulations are important in science
• Large simulations computationally infeasible
  – Driven by complex mathematical models
  – Require solution to complex differential equations
• Approximation techniques speed up simulations
  – Bounded error in the simulation
  – Approximate simulation steps using information from previous steps
Outline
• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
• Indexing problem
Combustion Simulation
[Figure: combustion simulation. Labels: High Dimensional Composition Vector; Inflow (Air, Methane); Mixing & Reaction (Air + Methane); Outflow.]
Properties Of Simulation
• Composition dimensionality
  – 9 for simple hydrogen simulations
  – >50 for complex methane simulations
• Cost of reaction function evaluation: 30 ms
• Number of function evaluations: $10^{8}$ to $10^{10}$
• Total simulation time
  – $10^{8}$ function evaluations ≈ 35 days
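The 35-day figure can be checked from the two numbers on the slide (30 ms per evaluation, between 10^8 and 10^10 evaluations). The short sketch below is only a back-of-the-envelope check; the function name `simulation_days` is made up for illustration, and it assumes every evaluation is computed directly with no approximation.

```python
# Back-of-the-envelope check of the slide's estimate.
EVAL_COST_S = 0.030              # 30 ms per reaction-function evaluation (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60


def simulation_days(num_evaluations: float) -> float:
    """Wall-clock days if every evaluation is computed directly (no approximation)."""
    return num_evaluations * EVAL_COST_S / SECONDS_PER_DAY


days_low = simulation_days(1e8)    # lower end of the slide's range, about 34.7 days
days_high = simulation_days(1e10)  # upper end of the range, 100 times longer
```

At the upper end of the range the same arithmetic gives several years of compute, which is why the talk turns to approximation.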
Function Approximation
• Approximate the reaction function
• Approach
  – Use previous function evaluations to approximate future function evaluations
  – ISAT (In Situ Adaptive Tabulation) [Pope ’97]
• Definition: ε-approximation of f(x)
  – Let $f: \mathbb{R}^m \to \mathbb{R}^n$ be a function, let $x \in \mathbb{R}^m$ and $\varepsilon \in \mathbb{R}$. $f^*(x)$ is an ε-approximation of $f(x)$ if $\|f^*(x) - f(x)\| < \varepsilon$
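The definition translates directly into a one-line test. The sketch below assumes the Euclidean norm, which the slide does not specify, and the name `is_eps_approximation` is hypothetical.

```python
import math
from typing import Sequence


def is_eps_approximation(f_star_x: Sequence[float],
                         f_x: Sequence[float],
                         eps: float) -> bool:
    """Return True if ||f*(x) - f(x)|| < eps.

    Assumption: the norm is Euclidean; the slide leaves the norm unspecified.
    """
    if len(f_star_x) != len(f_x):
        raise ValueError("f*(x) and f(x) must both have dimension n")
    return math.dist(f_star_x, f_x) < eps
```

Note that the inequality is strict, so an approximation whose error equals ε exactly does not qualify.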
Example
[Figure: plot of f; axis label: Cost.]
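Example
[Figure: plot of f with the evaluated point (x, f(x)) and query points x1 and x2; labels: ε, Original Cost, Cost.]
$f^*(x_2) = f(x) + s \cdot (x_2 - x)$
An ε-Local Region: $R_{f,f^*}(x, \varepsilon) \subseteq \mathbb{R}^m$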
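A one-dimensional sketch may make the linear approximation and the ε-Local Region more concrete. Everything below is illustrative: the function `f(x) = x²`, the choice of `s` as the derivative at `x0`, and the names `linear_approx` and `in_local_region` are assumptions, not taken from the slides. The membership test evaluates `f` at the query point, which a real system would avoid, since that evaluation is exactly the expensive step.

```python
def f(x: float) -> float:
    """Stand-in for the expensive function (hypothetical choice)."""
    return x * x


def linear_approx(x0: float, f_x0: float, slope: float, q: float) -> float:
    """f*(q) = f(x0) + s * (q - x0), the form shown on the slide."""
    return f_x0 + slope * (q - x0)


X0 = 1.0           # point where f was actually evaluated
EPS = 0.01         # error tolerance
SLOPE = 2.0 * X0   # assumption: s is the derivative of f at X0


def in_local_region(q: float) -> bool:
    """True if the linear approximation built at X0 is within EPS of f at q."""
    return abs(linear_approx(X0, f(X0), SLOPE, q) - f(q)) < EPS
```

For this particular `f` the approximation error is `(q - X0)²`, so the ε-Local Region is the interval of points closer than `sqrt(EPS) = 0.1` to `X0`.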
Example
[Figure: f approximated by f1*, f2* and f3* over query points x1 to x6; labels: Original Cost, Cost.]
Example
[Figure: f approximated by f1*, f2* and f3* over query points x1 to x6.]
When should a local region be added?
Example
Each query point can be covered by several Local Regions
[Figure: f approximated by f1*, f2*, f3* and f4* over query points x1 to x8.]
Challenges
• Finding good f*s and corresponding Local Regions
• Computing a set of Local Regions
• Data management: storing Local Regions for future use
• Problem: Minimize total simulation time by computing and storing a set of Local Regions
Finding The Optimal Set Of Local Regions
• Simplified cost model
  – Both the function value and Local Region at a point can be obtained at some constant cost equal across all regions
  – Approximations have zero cost
• Offline Problem
  – Given a set $X = \{x_1, x_2, \dots, x_n\}$ of query points, find the smallest set $L = \{l_1, l_2, \dots, l_k\}$ of Local Regions, such that for each $x_i \in X$ there is an $l_j \in L$ which contains $x_i$
  – NP-Complete: Reduction from Geometric Covering By Discs
• Online Problem
  – No online algorithm is competitive
Algorithm Illustration
[Figure: f approximated by f1*, f2*, f3* and f4* over query points x1 to x8.]
Algorithm
[Flowchart: the Simulation issues query points x to the approximation module]
• Initialize S
• Lookup x in S
• Local Region Found?
  – Y (Retrieve): Return Approximation
  – N (Add): Evaluate function at x, then Add new region containing x to S
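The retrieve-or-add loop in the flowchart can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions, not the ISAT implementation: regions are fixed-radius intervals, S is a plain list searched linearly, and the names `LocalRegion`, `ApproxTable`, `slope_of` and `radius` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LocalRegion:
    center: float   # point x where f was evaluated
    value: float    # f(center)
    slope: float    # s in f*(q) = f(x) + s * (q - x)
    radius: float   # assumption: region is the interval |q - center| <= radius

    def contains(self, q: float) -> bool:
        return abs(q - self.center) <= self.radius

    def approximate(self, q: float) -> float:
        return self.value + self.slope * (q - self.center)


class ApproxTable:
    """Stores the set S of local regions and answers queries."""

    def __init__(self, f: Callable[[float], float],
                 slope_of: Callable[[float], float], radius: float) -> None:
        self.f = f
        self.slope_of = slope_of
        self.radius = radius
        self.regions: List[LocalRegion] = []   # Initialize S
        self.evaluations = 0                   # number of expensive calls to f

    def query(self, x: float) -> float:
        for region in self.regions:            # Lookup x in S (linear scan)
            if region.contains(x):
                return region.approximate(x)   # Retrieve: return approximation
        self.evaluations += 1
        fx = self.f(x)                         # Evaluate function at x
        self.regions.append(                   # Add new region containing x to S
            LocalRegion(x, fx, self.slope_of(x), self.radius))
        return fx
```

With `f(x) = x²`, slope `2x` and radius 0.1, the queries 1.0, 1.05, 2.0, 1.95 trigger only two evaluations of `f`; the other two are answered from stored regions.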
Possible Instantiation Of Local Regions
• Local Regions can be approximated using high dimensional ellipsoids [Pope ’97]
  – Based on Taylor Expansion of function
• Two step approach
  – Initial conservative approximation
  – Grow
[Figure: points x and x1.]
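As an illustration of ellipsoidal regions, the sketch below assumes an ellipsoid is stored as a center `c` and a symmetric positive-definite matrix `A`, with membership defined by `(q - c)ᵀ A (q - c) <= 1`. That representation, the function names, and the growth rule (uniform scaling until the new point lies on the boundary) are all assumptions chosen for brevity; the slides do not specify how ellipsoids are stored or grown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def quad_form(center: Sequence[float], shape: Matrix, q: Sequence[float]) -> float:
    """Compute (q - c)^T A (q - c) for center c and shape matrix A."""
    d = [qi - ci for qi, ci in zip(q, center)]
    n = len(d)
    return sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))


def contains(center: Sequence[float], shape: Matrix, q: Sequence[float]) -> bool:
    """Membership test: q is inside the ellipsoid if the quadratic form is <= 1."""
    return quad_form(center, shape, q) <= 1.0


def grow_to_include(center: Sequence[float], shape: Matrix,
                    q: Sequence[float]) -> Matrix:
    """Return a shape matrix for an ellipsoid that also contains q.

    Assumption: growth is a uniform scaling about the center, which keeps
    every previously covered point covered but is not volume-minimal.
    """
    v = quad_form(center, shape, q)
    if v <= 1.0:
        return [row[:] for row in shape]          # q is already inside
    return [[a / v for a in row] for row in shape]
```

Dividing `A` by the value of the quadratic form at `q` puts `q` exactly on the new boundary, and because the scaling only enlarges the ellipsoid, points inside the old one stay inside.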
Example
[Figure: points x, x1 and x2; label: ε’ < ε.]
Example
[Figure: points x, x’1 and x’2; label: ε’ < ε.]
Example
[Figure: points x, x’1 and x’2; labels: ε, ε’ < ε.]
Updating Existing Regions
[Flowchart: handling a query for which no Local Region was found (N branch)]
• Evaluate function at x
• Can existing region contain x?
  – Y (Grow): Update existing regions to contain x
  – N: Add new region containing x to S
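The grow-or-add decision on a miss might look like the following one-dimensional sketch. It assumes that "can an existing region contain x" means the region's linear approximation is within ε of the freshly evaluated f(x), and that growing simply widens an interval; the names `Region`, `handle_miss` and `initial_radius` are hypothetical, and only the first growable region is grown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    center: float
    value: float    # f(center)
    slope: float    # s in the linear approximation
    radius: float   # assumption: region is the interval |q - center| <= radius

    def approximate(self, q: float) -> float:
        return self.value + self.slope * (q - self.center)


def handle_miss(regions: List[Region], x: float,
                f: Callable[[float], float],
                slope_of: Callable[[float], float],
                eps: float, initial_radius: float) -> Tuple[float, str]:
    """Handle a query point x that no stored region covers."""
    fx = f(x)                                     # Evaluate function at x
    for region in regions:
        # Can an existing region contain x? Assumed test: its linear
        # approximation is within eps of the true value at x.
        if abs(region.approximate(x) - fx) < eps:
            region.radius = max(region.radius, abs(x - region.center))  # Grow
            return fx, "grow"
    regions.append(Region(x, fx, slope_of(x), initial_radius))          # Add
    return fx, "add"
```

A grow is only possible after the function has been evaluated, which is why the evaluation cost is paid on every miss regardless of the outcome.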
Outline
• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
• Indexing problem
Indexing Problem
• Workload
  – Retrieve: Find ellipsoid containing query point
  – Grow
    • Find ellipsoids to be grown
    • Update grown ellipsoids
  – Add: Insert a new ellipsoid
New Indexing Problem
• Shape of regions
• Updates and queries interleaved
• Additional costs: ellipsoid maintenance costs
• Overall aim: Reduce total simulation time
• Retrieve/grow/add are all optional
  – Tuning parameters at each step

| Operation | Cost |
|---|---|
| Evaluation | 2000 |
| Addition | 1200 |
| Grow | 10 |
| Approximation | 1 |
| Search | 1 |
Outline
• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
• Indexing problem
  – Cost structure, tuning parameters and effects
  – Index structures and experiments
Grow Effects
$C_{\text{miss}} = t_f + t_{\text{growsearch}} + I_{\text{grow}} \cdot C_{\text{grow}} + (1 - I_{\text{grow}}) \cdot C_{\text{add}}$
• Tuning Parameter: $Ell_g$
  – Limit on number of ellipsoids examined for growing
  – No pruning criteria
  – Affects
    • $t_{\text{growsearch}}$
    • Chance of finding a growable ellipsoid
• Tuning Parameter: $N_{\text{grown}}$
  – Number of ellipsoids grown per step
  – Affects
    • $C_{\text{grow}}$
    • Structure of the index (overlapping ellipsoids)
Retrieve Effects
$C_{\text{tot}} = t_{\text{search}} + I_{\text{ret}} \cdot t_{\text{la}} + (1 - I_{\text{ret}}) \cdot C_{\text{miss}}$
• Tuning Parameter: $Ell_r$
  – Limit on number of ellipsoids examined during retrieve
  – Limits how much of the index is searched
  – Affects
    • $t_{\text{search}}$
    • Chances of success of the current retrieve and also of future retrieves
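The two cost formulas, C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add and C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss, can be evaluated per query as in the sketch below. It assumes I_grow and I_ret are 0/1 indicators for a successful grow and a successful retrieve. The numbers in the usage note reuse the relative operation costs from the earlier cost table, and mapping them onto t_f, C_add, C_grow, t_la and the two search terms is an interpretation rather than something the slides state.

```python
def miss_cost(t_f: float, t_growsearch: float, grew: bool,
              c_grow: float, c_add: float) -> float:
    """C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add."""
    i_grow = 1 if grew else 0
    return t_f + t_growsearch + i_grow * c_grow + (1 - i_grow) * c_add


def total_cost(t_search: float, retrieved: bool,
               t_la: float, c_miss: float) -> float:
    """C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss."""
    i_ret = 1 if retrieved else 0
    return t_search + i_ret * t_la + (1 - i_ret) * c_miss
```

Under the assumed mapping (search 1, approximation 1, evaluation 2000, grow 10, addition 1200), a successful retrieve costs 2, a miss resolved by a grow costs 2012, and a miss resolved by an add costs 3202, so a single miss outweighs roughly a thousand hits.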
Add Effects
$C_{\text{miss}} = t_f + t_{\text{growsearch}} + I_{\text{grow}} \cdot C_{\text{grow}} + (1 - I_{\text{grow}}) \cdot C_{\text{add}}$
• Tuning parameter: Indirectly controlled by retrieves and grows
  – Affects
    • Should query point be covered by an add or grow?
      (-) Computing new ellipsoids is expensive
      (-) New ellipsoids cover smaller part of the domain
      (+) May lead to better ellipsoid distribution
Candidate Index Structures
• Bounding Box Rtree
• Point Rtree
• Ellipsoid Rtree
• Random Projection Rtree
• Binary Tree
• MRU List + Rtree
Binary Tree
Primary Retrieve
[Figure: binary tree with nodes 1 and 2 over ellipsoids A, B and C; query point q.]
Binary Tree
Secondary Retrieve
[Figure: binary tree with nodes 1 and 2 over ellipsoids A, B and C; query point q.]
Binary Tree
[Figure: binary tree with nodes 1 and 2 over ellipsoids A, B and C.]
Binary Tree
Secondary Retrieve now Primary Retrieve
[Figure: binary tree with nodes 1, 2 and 3 over ellipsoids A, B, C and D.]
Effects In Action: Binary Tree
• 32 dimensional Methane simulation
• $6 \times 10^{6}$ queries
• Windows XP machine (2.4 GHz, 2 GB)
MRU List + Rtree
• MRU List for retrieving
  – High locality
• Rtree for searching growable ellipsoids
[Figure: MRU List and Rtree.]
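The retrieve half of this structure can be illustrated with a small most-recently-used list: items that answered a recent query move to the front, so workloads with high locality find their region after a few probes. The sketch below is generic and hypothetical (the class `MRUList` and its `max_probe` limit, loosely analogous to a limit on ellipsoids examined, are not from the slides), and it leaves out the Rtree used for finding growable ellipsoids.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class MRUList(Generic[T]):
    """Most-recently-used list: hits are moved to the front."""

    def __init__(self, max_probe: Optional[int] = None) -> None:
        self.items: List[T] = []
        self.max_probe = max_probe   # hypothetical cap on items examined per lookup

    def add(self, item: T) -> None:
        """New items start at the front, as the most recently used."""
        self.items.insert(0, item)

    def find(self, matches: Callable[[T], bool]) -> Optional[T]:
        """Return the first matching item among those probed, moving it to the front."""
        limit = len(self.items)
        if self.max_probe is not None:
            limit = min(limit, self.max_probe)
        for i in range(limit):
            if matches(self.items[i]):
                item = self.items.pop(i)
                self.items.insert(0, item)
                return item
        return None
```

A lookup that stops at the probe limit reports a miss even if a matching item sits further back, which trades retrieve accuracy for bounded search time.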
Effects In Action: MRU List + Rtree
• Effects very different from Binary Tree
Total Simulation Times

| Index Type | Error Tolerance 0.005 | Error Tolerance 0.00005 | Error Tolerance 0.00004 |
|---|---|---|---|
| Binary Tree (tuned) | 1073 | 10181 | 13100 |
| MRU List + Rtree | 1125 | 14000 | 19920 |
| Bbox Rtree | 1201 | 14700 | 20850 |
| Random Projection Rtree | 1378 | 15800 | 22051 |
| Binary Tree (default) | 1344 | 29186 | 31200 |
| FIFO List + Rtree | 2154 | 33770 | 42900 |
| Point Rtree | 10431 | >44000 | - |
| Ellipsoidal Rtree | 14328 | >44000 | - |
Conclusion & Future Work
• Formulated the function approximation problem
• New class of applications for high dimensional indexing
• Understand index selection for function approximation
• Future work
  – Dynamic parameter settings
  – New benchmark for index structures
  – Evaluation of other index structures
  – Comparison with other function approximation techniques
Questions?