Adaptive Execution of Variable-Accuracy Functions
VLDB Conference, Seoul
September 2006
Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley
Matt Denny, Mike Franklin, UC Berkeley EECS
Introduction
• Many applications apply expensive functions to streams of data
  • Finance: real-time market monitoring with securities models
  • Power Management: overload prediction using current weather conditions
  • Supply Chain Management: inventory models using RFID data to find shortages in real time
Continuous Queries w/ UDFs
Example: Bond Pricing
BondData: table of bond data (maturity, coupon, etc.)
IntRate: stream of interest rate data
model(): C++/Java routine that takes bond data and an interest rate, and returns a price
Filtering:
SELECT BD.BondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100
Aggregation:
SELECT MAX(model(BD, IR.rate))
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0
The Problem
• Analytical functions can be expensive!
  • Minutes or hours per data point.
• Query processor has no control over execution of individual function calls.
  • UDF API is a black box.
• Earlier work aims to avoid UDF calls:
  • predicate reordering ([HS93], [KMPS94], [CS96])
  • memoization and caching ([HN96], [DF05])
• Remaining calls can still be a showstopper.
The Intuition
1. Many functions have accuracy/cost tradeoffs, e.g., iterative solvers.
2. UDFs often appear in predicates and aggregates where exact answers are not required.
SELECT BD.*
FROM BondData BD, IntRate IR [Rows 1]
WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100
Our Solution
VAOs (Variable Accuracy Operators)
New query operators that:
• Expose function cost/accuracy tradeoffs using a new UDF API.
• Exploit this tradeoff to avoid excess work while correctly answering the query.
VAOs - Basic Idea
• Initially run function to obtain a coarse answer.
  • This needs to be cheaper than running to a more accurate answer.
• If more accuracy is needed - iterate!
Traditional Execution - Select
SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > $100;
[Figure: the InterestRate stream (10.1%, ...) and the BondData table feed "execute model(IR.Rate, BD)". The Select operator tests "> 100 ?" on the price returned for BD 1 ($105.01) and emits BD 1 as the result.]
VAO Execution: Select
SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > $100;
[Figure: the same plan with a Select-VAO operator. "execute model(IR.Rate, BD)" returns a result object for BD 1 with bounds L = $98 and H = $110, which straddle the $100 predicate value.]
VAO Execution: Select (continued)
SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > $100;
[Figure: the Select-VAO calls Iterate() on the result object. The bounds for BD 1 tighten to L = $101 and H = $108; both now exceed $100, and BD 1 is emitted.]
VAO API
• Use iterative interface
  • Traditional: <number> = f(<args>)
  • VAO: <result object> = f(<args>)
    1. fields for (conservative) error bounds
    2. iterate() method: refines bounds with more work
    3. for some VAOs: also need estimates for CPU cost and error reduction of next iteration
• Useful for:
  • Any sort of iterative function (e.g. root finders, numerical integration)
  • Any technique with iterative step refinement (e.g. PDEs)
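A minimal Python sketch of what such a result object could look like. This is an illustration, not the API from the talk: the names SqrtResult, vao_sqrt, est_cost and est_error_reduction are hypothetical, and bisection for a square root stands in for an expensive model.

```python
from dataclasses import dataclass


@dataclass
class SqrtResult:
    """Hypothetical result object: conservative bounds on sqrt(x), x >= 0."""
    x: float
    low: float
    high: float

    def iterate(self) -> None:
        """One bisection step: halves the width of [low, high]."""
        mid = (self.low + self.high) / 2.0
        if mid * mid > self.x:
            self.high = mid   # sqrt(x) lies below mid
        else:
            self.low = mid    # sqrt(x) lies at or above mid

    def est_cost(self) -> float:
        """Estimated CPU cost of the next iteration (arbitrary units)."""
        return 1.0

    def est_error_reduction(self) -> float:
        """Estimated shrinkage of the bound width after the next iteration."""
        return (self.high - self.low) / 2.0


def vao_sqrt(x: float) -> SqrtResult:
    """VAO-style call: returns a result object with coarse bounds, not a number."""
    return SqrtResult(x=x, low=0.0, high=max(1.0, x))
```

A caller holds the result object and calls iterate() only as long as the bounds are too wide for its purpose.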
Iteration Strategy
• Selection iterates over an object until predicate value is known.
• Aggregate operators are more difficult
  • Answer depends on sets of result objects
  • Need to decide how to iterate over multiple result objects
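The selection case can be sketched in a few lines of Python. Everything here is an assumption for illustration: BoundedResult is a toy result object whose bounds halve toward a hidden value, and select_vao and decided are hypothetical names, not code from the talk.

```python
class BoundedResult:
    """Hypothetical result object: [low, high] tightens toward a hidden value."""

    def __init__(self, value, width):
        self._value = value
        self.low = value - width / 2
        self.high = value + width / 2
        self.iterations = 0

    def iterate(self):
        # Stand-in for real work: move each bound halfway to the true value.
        self.low = (self.low + self._value) / 2
        self.high = (self.high + self._value) / 2
        self.iterations += 1


def decided(result, threshold):
    """True/False if the bounds settle 'value > threshold', else None."""
    if result.low > threshold:
        return True
    if result.high <= threshold:
        return False
    return None


def select_vao(result, threshold, max_iters=50):
    """Iterate only until the predicate value is known or the budget runs out."""
    verdict = decided(result, threshold)
    steps = 0
    while verdict is None and steps < max_iters:
        result.iterate()
        steps += 1
        verdict = decided(result, threshold)
    return verdict
```

An object whose bounds start far from the threshold is decided without any iteration; only objects near the threshold pay for refinement.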
Example: MAX(f(x1), f(x2))
[Figure: four plots of the f(x) bounds for x1 and x2: the initial bounds, and the bounds after iterating over f(x1) only, over f(x2) only, and over both.]
Need an iteration strategy that attempts to minimize cost
Solution: Greedy Strategy
• Iterate over the object that has the best ratio of benefit to CPU cost among the current choices.
• Good strategy if functions converge
  • Later iterations likely to have less benefit/unit cost
• Operator-dependent
Example Revisited
MAX(f(x1),f(x2))
Greedy Strategy: choose best overlap reduction per CPU cost.
Use error reduction estimates to estimate overlap reduction.
Cost estimation depends on function.
Goal State: no overlap between f(x1) and f(x2)
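One way the greedy choice for MAX could be sketched in Python. The benefit formula below (estimated bound movement, capped at the overlap still to be removed) is an assumption made for this illustration, as are the names Bounded, est_shift and greedy_max; the talk does not give the exact estimator.

```python
class Bounded:
    """Hypothetical result object whose bounds halve toward a hidden value."""

    def __init__(self, name, value, width, cost=1.0):
        self.name, self._value, self.cost = name, value, cost
        self.low, self.high = value - width / 2, value + width / 2

    def iterate(self):
        self.low = (self.low + self._value) / 2
        self.high = (self.high + self._value) / 2

    def est_shift(self):
        """Estimated movement of each bound in the next iteration."""
        return (self.high - self.low) / 4


def greedy_max(objs, max_steps=1000):
    """Iterate until one object's lower bound clears every other upper bound."""
    work = 0.0
    for _ in range(max_steps):
        guess = max(objs, key=lambda o: o.high)  # educated guess for the max
        rivals = [o for o in objs if o is not guess and o.high > guess.low]
        if not rivals:
            return guess, work  # goal state: no overlap with the guess
        # Refining the guess raises its lower bound under every rival.
        gain = sum(min(guess.est_shift(), r.high - guess.low) for r in rivals)
        options = [(gain / guess.cost, guess)]
        # Refining a rival lowers its upper bound toward the guess's lower bound.
        for r in rivals:
            options.append((min(r.est_shift(), r.high - guess.low) / r.cost, r))
        _, chosen = max(options, key=lambda t: t[0])
        chosen.iterate()
        work += chosen.cost
    return None, work
```

Objects whose bounds never overlap the current guess are never refined, which is where the savings over running every function to full accuracy would come from.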
Example Revisited
• Determine if f(x1) > f(x2)
Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.04                4 sec.
f(x2)      $.04                4 sec.
[Figure: f(x) bounds for x1 and x2.]
Example Revisited
• Determine if f(x1) > f(x2)
Function   Overlap Red. Est.   CPU Cost Est.   (previous estimates)
f(x1)      $.01                8 sec.          ($.04, 4 sec.)
f(x2)      $.02                4 sec.          ($.04, 4 sec.)
[Figure: f(x) bounds for x1 and x2.]
Example Revisited
• Determine if f(x1) > f(x2)
Function   Overlap Red. Est.   CPU Cost Est.   (previous estimates)
f(x1)      $0                  8 sec.          ($.01, 8 sec.)
f(x2)      $0                  8 sec.          ($.02, 4 sec.)
[Figure: f(x) bounds for x1 and x2.]
Aggregates
Operator: min/max (general)
  Goal state: no overlap between the minimum (maximum) value and the other function error bounds
  Greedy heuristic: make an educated guess for the max (min), then choose the iteration that reduces the most overlap between the guess and the other error bounds per cycle
Operator: avg/sum
  Goal state: avg/sum of error bounds has width less than a user-defined tolerance
  Greedy heuristic: choose the iteration that reduces the avg/sum of bounds the most per cycle
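The avg/sum row can be sketched as follows for a weighted average. This is a sketch under stated assumptions: BoundedResult is a toy result object whose width halves per iteration, and greedy_weighted_avg, weight and cost are hypothetical names chosen for the example.

```python
class BoundedResult:
    """Hypothetical result object with a weight in the average and a unit cost."""

    def __init__(self, value, width, weight=1.0, cost=1.0):
        self._value = value
        self.low = value - width / 2
        self.high = value + width / 2
        self.weight, self.cost, self.iterations = weight, cost, 0

    def iterate(self):
        self.low = (self.low + self._value) / 2
        self.high = (self.high + self._value) / 2
        self.iterations += 1

    def est_error_reduction(self):
        """Estimated reduction in bound width from the next iteration."""
        return (self.high - self.low) / 2


def greedy_weighted_avg(objs, tolerance, max_steps=1000):
    """Refine until the bounds on the weighted average are narrower than tolerance."""
    total = sum(o.weight for o in objs)
    for _ in range(max_steps):
        low = sum(o.weight * o.low for o in objs) / total
        high = sum(o.weight * o.high for o in objs) / total
        if high - low < tolerance:
            return low, high
        # Pick the iteration that shrinks the average's bounds most per unit cost.
        best = max(objs, key=lambda o: o.weight * o.est_error_reduction() / o.cost)
        best.iterate()
    return None
```

Heavily weighted objects dominate the width of the average, so the greedy rule spends most iterations on them and leaves lightly weighted objects coarse.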
Performance Setup
• Standalone implementation of VAO framework in C++
• Used numeric bond model and bond data from [DF05]
• Real Bond Data - 500 Mortgage-backed Securities.
• Synthetic Bond Data - to stress test VAOs
• Single Interest Rate.
VAO Implementation
• Numeric bond model [S95] implemented with both the traditional and the VAO interface
  • Based on PDE solver
  • VAO iterate(): double size of PDE grid
  • Bounds and error reduction estimates derived by using current and previous iteration results and Richardson's Extrapolation [BF01]
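The grid-doubling idea can be illustrated in Python with a simpler stand-in: the trapezoid rule replaces the PDE solver, since it is also a second-order method refined by doubling a grid. The class TrapezoidResult and its method names are hypothetical, and the Richardson figure below is an error estimate, not the conservative bound derived in the paper.

```python
import math


class TrapezoidResult:
    """Hypothetical result object for the integral of f over [a, b]."""

    def __init__(self, f, a, b, n=4):
        self.f, self.a, self.b, self.n = f, a, b, n
        self.prev = self._trapezoid(n)
        self.n *= 2
        self.curr = self._trapezoid(self.n)

    def _trapezoid(self, n):
        h = (self.b - self.a) / n
        inner = sum(self.f(self.a + i * h) for i in range(1, n))
        return h * (0.5 * self.f(self.a) + inner + 0.5 * self.f(self.b))

    @property
    def error(self):
        # Richardson: for an O(h^2) method, error(h/2) ~ |T(h/2) - T(h)| / 3.
        return abs(self.curr - self.prev) / 3.0

    @property
    def low(self):
        return self.curr - self.error

    @property
    def high(self):
        return self.curr + self.error

    def iterate(self):
        """Double the grid and keep the previous result for the next estimate."""
        self.prev = self.curr
        self.n *= 2
        self.curr = self._trapezoid(self.n)

    def est_cost(self):
        """Function evaluations needed for the next (doubled) grid."""
        return 2 * self.n + 1

    def est_error_reduction(self):
        """Halving h should cut an O(h^2) error to about a quarter."""
        return 0.75 * self.error


# Example: integral of sin on [0, pi], whose exact value is 2.
result = TrapezoidResult(math.sin, 0.0, math.pi)
```

Each call to iterate() costs roughly twice as much as the previous one, which is why the cost estimate matters when an operator chooses between objects.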
Selection Performance
500 bonds, 1 interest rate
[Figure: Selection Performance. Runtime (sec., log scale from 1 to 10000) versus selectivity (0 to 1), for Trad and VAO.]
Runtime depends on number of bonds close to predicate.
Stress Test
• Generate bonds with accurate values near the predicate
  • Gaussian, mean = predicate value, vary std. dev.
Std. dev. of real bonds: $7.78
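A minimal Python sketch of how such synthetic values could be drawn. The function names and the fixed seed are assumptions for illustration; the talk does not describe the generator beyond the distribution.

```python
import random


def synthetic_values(predicate_value, std_dev, count, seed=0):
    """Draw 'accurate' function values from a Gaussian centered on the predicate."""
    rng = random.Random(seed)
    return [rng.gauss(predicate_value, std_dev) for _ in range(count)]


def fraction_within(values, predicate_value, band):
    """Share of values within +/- band of the predicate value."""
    return sum(abs(v - predicate_value) <= band for v in values) / len(values)
```

Shrinking std_dev pushes more values close to the predicate, which forces more iterations per object and is what makes the test a stress test.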
In the Paper
• Other Results
  • Max
    • Real bonds: 111 sec. vs. 6953 sec.
    • Synthetic bonds: VAOs better than traditional above $.05 std. dev.
  • Average
    • Up to 5x improvement if a small number of bonds are weighted heavily in average.
• Details on Error and Cost estimates for PDE-based bond model.
  • Other types of models covered in Matt’s thesis.
Conclusion
• Many emerging CQ applications require the repeated execution of expensive functions.
• VAOs are new operators that change how these functions execute
  • Use new iterative API that exposes work-accuracy tradeoff in functions
  • Do only enough work to answer the query using greedy strategy to choose iterations
• With real bond data and models, VAOs show 1-2 orders of magnitude improvement.
• For more detailed information: [email protected]
The Advisor’s Dodge
[Figure: Relative Contribution to Research. Percent contribution (0 to 100) versus time in program (0 to 5 years), for Student and Advisor, with "This Work" marked.]
Courtesy of Jennifer Widom
Bibliography
• [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993.
• [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996.
• [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.