adaptive query processing in the looking glass
Post on 31-Dec-2015
Introduction AQP Families Comparison New Ideas Conclusions
Adaptive Query Processing in the Looking Glass
Shivnath Babu (Stanford Univ.)
Pedro Bizarro (Univ. of Wisconsin, Madison)
Adaptive Query Processing (AQP) Systems: Publication Timeline
[Figure: timeline of AQP system publications, 1976 to 2004. Systems shown: Ingres, Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM]

Motivation
• Plenty of recent work on Adaptive Query Processing (AQP) in different contexts
  – Conventional DBMS query processing, data integration, continuous queries in stream systems
• No exhaustive, in-depth categorization and comparison of AQP systems to date
• Difficult to answer questions like:
  – Will techniques from one system work on another?
  – What are the shortcomings of each system?
  – Which system is best for a new application domain?

Our Contributions
• Detailed study of current AQP systems
• Classification of AQP systems into 3 families
• Comparison across families in terms of AQP tasks
• Identification of shortcomings & new approaches to address them

Roadmap
• Introduction to AQP
• The three AQP system families
• Comparison across families in terms of AQP tasks
• Summary of what we learned

Primer on Traditional Query Processing
[Diagram: components of a traditional query processor]
• Optimizer: chooses the best plan for the Query; uses stats to cost plans
• Catalog: table sizes, histograms
• Executor: runs the chosen plan
• Statistics Tracker: creates/updates stats (Runstats)

Need for Adaptive Query Processing
• Correlated & skewed data distributions → errors in stats estimates, optimizer mistakes → detect plan suboptimality, re-optimize
• Continuous queries, long-running queries → stats & system conditions may change while query is running → monitor for changes, re-optimize
• AQP is integral to the current CS-wide push towards autonomic computing

Our Focus: AQP for a Single Query
• AQP System:
  – A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query

AQP System Families
• Plan-based AQP systems
  – AQP for traditional plan-based DBMSs
• Continuous-Query-based (CQ-based) AQP systems
  – AQP for long-running continuous queries over data streams
• Routing-based AQP systems
  – AQP for DBMSs and continuous queries based on adaptive tuple routing

AQP in Plan-based Systems
[Diagram: the traditional architecture extended for adaptivity]
• Optimizer: chooses the best plan for the Query; uses stats to cost plans
• Executor: runs the chosen plan + extra operators
• Catalog: original + observed stats
• Statistics Tracker: creates/updates stats (Runstats)
• Collected stats from the running plan are added to the catalog and can lead to a Re-optimize step

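The feedback loop in this diagram can be sketched in a few lines. This is only an illustration under stated assumptions, not the mechanism of any particular system on the timeline: the `Checkpoint` class, its `threshold` tolerance factor, and the `run_with_checkpoint` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical extra operator: counts tuples and compares with the estimate."""
    estimated_rows: int
    threshold: float = 2.0   # assumed tolerance factor before re-optimizing
    observed_rows: int = 0

    def observe(self, n: int = 1) -> None:
        self.observed_rows += n

    def needs_reoptimization(self) -> bool:
        # Re-optimize only if the observation falls outside
        # [estimate / threshold, estimate * threshold].
        low = self.estimated_rows / self.threshold
        high = self.estimated_rows * self.threshold
        return not (low <= self.observed_rows <= high)


def run_with_checkpoint(rows, predicate, estimated_rows):
    """Run one filter, collect the observed cardinality, and report
    whether the optimizer's estimate was far enough off to re-optimize."""
    checkpoint = Checkpoint(estimated_rows)
    survivors = [r for r in rows if predicate(r)]
    checkpoint.observe(len(survivors))
    observed_stats = {"observed_rows": checkpoint.observed_rows}  # goes to the catalog
    return survivors, observed_stats, checkpoint.needs_reoptimization()
```

With an estimate of 10 rows and 50 rows observed, the sketch reports that re-optimization is needed; with an estimate of 40 it does not.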
Example Plan-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004) from the Publication Timeline slide]

Primer on Continuous Query Processing
• Continuous Queries (CQs) are long-running queries usually over data streams
  – Example CQ: Filtering packet streams
• Stream properties or system conditions may change while query is running → best plan may change
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets]

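To see why the best plan may change, consider the filter pipeline above with a simple cost model. The sketch below assumes independent filters, a fixed per-tuple cost for each filter, and made-up selectivities; the function names are hypothetical. It enumerates all orderings, which is only practical for a handful of filters.

```python
from itertools import permutations


def expected_cost(order, cost, sel):
    """Expected per-tuple cost of applying filters in the given order,
    assuming independent filters: c1 + s1*c2 + s1*s2*c3 + ..."""
    total, surviving_fraction = 0.0, 1.0
    for name in order:
        total += surviving_fraction * cost[name]
        surviving_fraction *= sel[name]
    return total


def best_order(cost, sel):
    """Exhaustively pick the cheapest filter ordering."""
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, sel))
```

With unit costs, selectivities of 0.9, 0.5, 0.1 for σ1, σ2, σ3 make the order σ3, σ2, σ1 cheapest; if the stream changes so the selectivities become 0.1, 0.5, 0.9, the cheapest order flips to σ1, σ2, σ3.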
AQP in CQ-based Systems
[Diagram: the traditional architecture adapted for continuous queries]
• Optimizer: ensures that plan is best for current stats for the Continuous Query; uses stats to cost plans
• Executor: runs the chosen plan
• Catalog: stream rates, data distr.
• Statistics Tracker: monitors stream stats and system conditions
• Labels on the diagram: Stats to track; Re-optimize; Combined in part for efficiency

Example CQ-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004) from the Publication Timeline slide]

Primer on Routing-based Processing
• Non-plan-based architecture where tuples are routed individually through operators
• No optimizer
• Exemplified by Eddies [AH00]
[Diagram, using a plan: Packets → σ1 → σ2 → σ3 → Chosen packets]
[Diagram, using tuple routing: Packets → Tuple Router ↔ {σ1, σ2, σ3} → Chosen packets]

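The contrast between a fixed plan and per-tuple routing can be sketched as follows. This is a simplified illustration, not the Eddies algorithm from [AH00]: the `TupleRouter` class and its policy (send each tuple to the remaining operator with the lowest observed pass rate, assuming equal operator costs) are assumptions made for this example.

```python
class TupleRouter:
    """Hypothetical router: sends each tuple to the remaining operator
    with the lowest observed pass rate, so likely-to-fail tests run first."""

    def __init__(self, operators):
        self.operators = dict(operators)            # name -> predicate
        self.seen = {name: 0 for name in self.operators}
        self.passed = {name: 0 for name in self.operators}

    def pass_rate(self, name):
        # Laplace-smoothed estimate, so unseen operators start at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup):
        """Return True if the tuple passes every operator."""
        remaining = set(self.operators)
        while remaining:
            # sorted() makes tie-breaking deterministic.
            name = min(sorted(remaining), key=self.pass_rate)
            remaining.remove(name)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return False                        # tuple dropped
            self.passed[name] += 1
        return True
```

A usage example: `chosen = [p for p in packets if router.route(p)]`. There is no separate optimizer step; the routing decision is made per tuple from the router's in-memory counters.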
AQP in Routing-based Systems
[Diagram: the routing-based architecture]
• Tuple Router: integrated Optimizer & Stats Tracker for the Query or Continuous Query; uses stats to choose efficient routes
• Executor: pool of operators, with selective routing of tuples
• In-memory catalog: operator costs, selectivities, etc.

Example Routing-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004) from the Publication Timeline slide]

Comparison Across AQP System Families
• Goal: To bring out AQP algorithms and features, not performance numbers
• Models, assumptions, and approach
• Techniques for tracking statistics
• Re-optimization subtasks
  – When and how to re-optimize
  – Switching between plans
• Pros & cons of using a conventional optimizer
• Performance issues
  – Quality of re-optimization
  – Run-time overhead & thrashing
  – Scalability

Techniques for Tracking Statistics
• Observation
  – Mostly in Plan-based systems
• Competition
  – Mostly in Plan-based systems
• Profiling
  – Mostly in CQ-based systems
• Exploration
  – In Routing-based systems

Tracking Statistics: Observation [KD98]
• Collect statistics on operator behavior or intermediate subexpressions in a plan
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets; the selectivity of σ1 on the input stream can be observed at σ1]

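Observation can be sketched as counters wrapped around each operator in the running plan. The `CountingFilter` class below is a hypothetical helper for illustration. Note what it can and cannot see: only the first filter's counters give a selectivity on the raw input stream; downstream filters only see tuples that survived the earlier ones.

```python
class CountingFilter:
    """Hypothetical filter operator that counts its input and output tuples."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.n_in = 0
        self.n_out = 0

    def __call__(self, stream):
        for tup in stream:
            self.n_in += 1
            if self.predicate(tup):
                self.n_out += 1
                yield tup

    @property
    def selectivity(self):
        # Observed fraction of this operator's *own* input that passed.
        return self.n_out / self.n_in if self.n_in else None


def run_plan(stream, filters):
    """Chain the filters in plan order and return the chosen tuples."""
    for f in filters:
        stream = f(stream)
    return list(stream)
```

For a plan σ1 → σ2 → σ3, `filters[1].selectivity` is the selectivity of σ2 on the output of σ1, not on the input stream, which is why coverage is limited by the plan.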
Tracking Statistics: Competition [A93]
• Extra processing to collect statistics
[Diagram: Packets feed the plan σ1, σ2, σ3 → Chosen packets; extra processing measures the selectivity of σ2 on the input stream and the selectivity of σ3 on the input stream]

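One simplified reading of this diagram, shown below, is to evaluate every filter on every input tuple, purely to measure each filter's selectivity on the raw input stream. This is an assumption-laden sketch and not the competition technique of [A93] itself; `run_with_competition` is a hypothetical name.

```python
def run_with_competition(packets, filters):
    """filters: ordered list of (name, predicate) pairs.

    Every predicate is evaluated on every packet (no short-circuiting),
    which is the extra work that buys selectivities on the raw input."""
    counts = {name: 0 for name, _ in filters}
    chosen, total = [], 0
    for packet in packets:
        total += 1
        results = [(name, bool(pred(packet))) for name, pred in filters]
        for name, ok in results:
            counts[name] += ok
        if all(ok for _, ok in results):
            chosen.append(packet)
    selectivity = {name: c / total for name, c in counts.items()} if total else {}
    return chosen, selectivity
```

Compared with plain observation, the answer is the same but each tuple pays for all the filters, which matches the "lots of extra work" ranking in the overhead comparison.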
Tracking Statistics: Profiling [BMM+04]
• Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics
• Builds a “statistical profile” that can be used to estimate many individual statistics
[Diagram: profiled tuples are evaluated against σ1, σ2, and σ3]

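A statistical profile of this kind can be sketched as one row of pass/fail bits per sampled tuple. The code below is a minimal illustration, not the data structure of [BMM+04]; the function names, the `fraction` parameter, and the fixed `seed` are assumptions made here.

```python
import random


def build_profile(tuples, predicates, fraction, seed=0):
    """Evaluate all predicates on a random sample of the input.

    Returns one row of booleans per profiled tuple."""
    rng = random.Random(seed)
    profile = []
    for tup in tuples:
        if rng.random() < fraction:        # profile only a fraction of tuples
            profile.append(tuple(bool(pred(tup)) for pred in predicates))
    return profile


def conditional_selectivity(profile, target, given=()):
    """Estimate P(predicate `target` passes | predicates in `given` pass)."""
    rows = [row for row in profile if all(row[g] for g in given)]
    if not rows:
        return None
    return sum(row[target] for row in rows) / len(rows)
```

A single profile answers many questions: the selectivity of σ2 on the input stream is `conditional_selectivity(profile, 1)`, and its selectivity after σ1 is `conditional_selectivity(profile, 1, given=(0,))`. The accuracy of each estimate depends on the sampling fraction.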
Tracking Statistics: Exploration [AH00]
• A fraction of tuples are routed along routes different from the current best route to track statistics along those routes
• No redundant processing
[Diagram: Packets → Tuple Router ↔ {σ1, σ2, σ3} → Chosen packets]

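Exploration can be sketched as a router that usually follows the current best route but sends a fraction of tuples along a different one. The code below is an illustrative sketch under assumptions, not the routing policy of [AH00]: `explore_fraction`, the random alternative route, and the function names are hypothetical.

```python
import random


def choose_route(best_route, explore_fraction, rng):
    """Mostly follow the best route; sometimes try a random alternative."""
    if rng.random() < explore_fraction:
        route = list(best_route)
        rng.shuffle(route)
        return route
    return list(best_route)


def process(packets, predicates, best_route, explore_fraction, seed=0):
    """Route each packet once through the operators and track statistics."""
    rng = random.Random(seed)
    seen = {name: 0 for name in predicates}
    passed = {name: 0 for name in predicates}
    chosen = []
    for packet in packets:
        for name in choose_route(best_route, explore_fraction, rng):
            seen[name] += 1
            if not predicates[name](packet):
                break                      # dropped: no further operators run
            passed[name] += 1
        else:
            chosen.append(packet)          # passed every operator
    return chosen, seen, passed
```

Each tuple is still processed only once per operator, so no work is redundant; the cost of exploring is that some tuples take a less efficient route. With `explore_fraction=0`, operators late in the best route only ever see tuples that survived the earlier ones, which is one way a routing bias can enter the statistics.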
Comparing Statistics-Tracking Techniques: Extra Overhead Introduced
Increasing overhead
• Observation
• Exploration (inefficient routes for some tuples)
• Profiling (extra processing on some tuples)
• Competition (lots of extra work)

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics
Increasing coverage
• Observation & Competition (limited by plan)
• Exploration (limited by large number of routes)
• Profiling (highest since it builds statistics profile)

Comparing Statistics-Tracking Techniques: Accuracy of Estimation
Increasing accuracy
• Observation & Competition
• Exploration (but, susceptible to routing bias)
• Profiling (depends on sampling fraction)

What have we learned? (1)
• Many similarities in internals of different AQP families
• Can re-use many current (and new) AQP techniques across families
• Ex: Profiling from CQ-based systems
  – Enables, e.g., faster detection of plan suboptimality in Plan-based systems
  – Generates more accurate statistics at lower cost in Routing-based systems
Example Query: σ_{p1 and p2}(R) ⋈ S
[Diagram: plan for R ⋈ S using an index nested-loop join (INLJ) and an unclustered index]

What have we learned? (2)
• Current AQP systems are reactive
  – E.g., do not consider sensitivity to errors/changes in stats
• Proactive Re-optimization
Example Query: σ_{p1 and p2}(R) ⋈ S
[Plot: Cost versus |σ(R)| for the Hash Join plan and the INLJ plan]
[Diagrams: a Hash Join plan for R ⋈ S, and an INLJ plan for R ⋈ S using an unclustered index]

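The sensitivity shown in the cost plot can be made concrete with two toy cost models. Everything numeric below is a made-up assumption for illustration: INLJ is modelled as a per-outer-tuple probe cost, Hash Join as a fixed build cost plus a small per-tuple cost, and the function names are hypothetical.

```python
def inlj_cost(n_outer, probe_cost=4.0):
    """Toy INLJ cost: one index probe per outer tuple (assumed constant)."""
    return probe_cost * n_outer


def hash_join_cost(n_outer, build_cost=1000.0, per_tuple_cost=0.5):
    """Toy Hash Join cost: fixed build cost plus a cheap per-tuple probe."""
    return build_cost + per_tuple_cost * n_outer


def cheapest_plan(n_outer):
    return "INLJ" if inlj_cost(n_outer) <= hash_join_cost(n_outer) else "Hash Join"


def crossover_cardinality(probe_cost=4.0, build_cost=1000.0, per_tuple_cost=0.5):
    """Value of |σ(R)| at which the two cost lines intersect."""
    return build_cost / (probe_cost - per_tuple_cost)


def choice_is_sensitive(low, high):
    """True if the best plan differs across the uncertainty range [low, high]."""
    return cheapest_plan(low) != cheapest_plan(high)
```

Under these assumed constants the lines cross near |σ(R)| ≈ 286. If the estimate of |σ(R)| could lie anywhere between 100 and 1000, the plan choice is sensitive to the error; a reactive system discovers that only after it has committed to one plan.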
What have we learned? (3)
• Challenging meta problems in AQP for continuous queries need to be addressed
  1. Larger and more complex plan spaces → higher costs for statistics tracking and re-optimization
  2. Tracking “Return-of-Investment” on AQP
  3. Avoiding thrashing, e.g., on bursty changes in statistics
Proposal: Plan Logging for Continuous Queries

Plan Logging for Continuous Queries
• Log the statistics and re-optimization history
  – Query is long-running
  – Example view over log for R ⋈ S ⋈ T:

Rate(R)  …  σ(R,S)  Plan  Cost
1024     …  0.75    P1    12762
5642     …  0.72    P2    72332
934      …  0.76    P1    12003

[Figure: plans P1 and P2 lying in a high-dimensional space of statistics; labels: Rate(R), σ(R,S), time]

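One way such a log might be used is sketched below. The slide does not say how the log is queried, so the nearest-neighbour lookup is purely an assumption, as are the names `LogEntry`, `PLAN_LOG`, and `nearest_entry`. The rows assume each line of the example view reads as (Rate(R), selectivity, plan, cost); the elided columns are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    rate_r: float    # Rate(R) column
    sel_rs: float    # selectivity column for R and S
    plan: str
    cost: float


# Rows from the slide's example view (elided columns omitted).
PLAN_LOG = [
    LogEntry(1024, 0.75, "P1", 12762),
    LogEntry(5642, 0.72, "P2", 72332),
    LogEntry(934, 0.76, "P1", 12003),
]


def nearest_entry(log, rate_r, sel_rs):
    """Return the logged entry whose statistics are closest to the current ones.

    Each dimension is scaled by its range in the log so that the large
    rate values do not swamp the small selectivity values."""
    rate_span = (max(e.rate_r for e in log) - min(e.rate_r for e in log)) or 1.0
    sel_span = (max(e.sel_rs for e in log) - min(e.sel_rs for e in log)) or 1.0

    def distance(entry):
        return (((entry.rate_r - rate_r) / rate_span) ** 2
                + ((entry.sel_rs - sel_rs) / sel_span) ** 2)

    return min(log, key=distance)
```

For example, `nearest_entry(PLAN_LOG, 1000, 0.75).plan` gives `"P1"`: statistics close to a previously logged point suggest a plan that was chosen there before, which could save a full re-optimization or help spot thrashing between P1 and P2.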
Summary
• AQP is becoming important:
  – New data and application trends
  – CS-wide push towards Autonomic Computing
  – Significant amount of work on AQP in recent years
• Our contributions:
  – In-depth categorization and comparison of AQP systems and techniques
  – Identified current shortcomings and new approaches to AQP