querying large databases rukmini kaushik. purpose research for efficient algorithms and software...
TRANSCRIPT
Querying Large Databases
Rukmini Kaushik
Purpose
• Research for efficient algorithms and software architectures of query engines.
Query Execution Engine Architecture
• Query processing algorithms – physical algebra
• Data Model – logical algebra
Sorting & Hashing
• Both are memory intensive.
• Memory Concerns
- Merge Efficiency & memory
management.
- Hash table overflow
Aggregation and Duplicate Removal
• Aggregation Concept
Describes a set of objects with one value.• Algorithms
Three Types
- Nested Loops
- Sorting
- Hashing
Aggregation & Duplicate Removal
• Nested Loops - Easiest of the three - Doesn’t work well for large inputs• Sorting - Sort for common elements which results
in a simple duplicate removal. - Should remove duplicates as early as
possible.
Aggregation & Duplicate Removal
• Hashing
- Hash on group attributes.
- Can perform duplicate removal when creating hash table.
• Algorithm Analysis
Sorting and hashing functions are logarithmic with input size
Complex Query Execution Plan
• Purpose
- To schedule a query with several operations optimally
• Ideas
- Right-deep plans
- Left-deep plans
Complex Query Execution Plan
• Prediction
- Use a decision tree of sub-plans
- Done by using choose-plan operators
• Major Concern
- Optimal resource allocation
Parallel Query Execution Mechanism
• Goal
Obtain speed-up & scale-up
• Speed-up
- Uses extra hardware for constant size problem
- Linear speed-up is optimal
- Can be expressed as parallel efficiency
Parallel Query Execution Mechanism
• Scale-up
- Uses same resources with altered problem size
- Can be expressed as parallel efficiency.
Parallel Query Execution Mechanism
• Parallel Vs Distributed Systems
• Distributed
- Locally Autonomous
- Also uses Parallelism
Parallel Query Execution Mechanism
• Parallel
- One center of control
- Three types
Shared memory
Shared Disk
Distributed Memory
Parallel Query Execution Mechanism
• Three forms of parallelism
- Inter Query: Servicing multiple requests at the same time
- Inter Operator: Pipelining
- Intra Operator: Execute a single operator in multiple processors
Parallel Query Execution Mechanism
• Implementation
Bracket Models
Operator Models
• Bracket Model
Goal: Generic process template that receives and sends data and performs one operation at a time
Parallel Query Execution Mechanism
Number of inputs is limited to two
Can be run in parallel by having many templates in the system running simultaneously.
• Operator Model
Goal: Insert parallel operators in an ordered plan
Parallel Query Execution Mechanism
• Uses the exchange operator
• Exchange operator
- Does not manipulate data
- Provides capabilities for parallel query processing
- Changes a complex query into a single process
Parallel Algorithms
• Idea: More focus on algorithms and parallel execution
• Parallel selections and updates
- Disk input and output should be made parallel
- Selection: Maintain indices near stored data
- Updates: Use keys for partitioning attributes
Parallel Algorithms
• Parallel Sorting:
-classified by
- number of parallel inputs
- number of parallel outputs
- Algorithms consists of local sort and a data exchange step
Parallel Algorithms
- Major Concern
- Deadlock can be avoided by using range partitioning
- having a sufficient size data exchange buffer
- using a modified sort algorithm
Query Optimization
• Uses the differences between logical and physical aspects
• Must keep track of the properties of the inputs
• Cost models focus on throughput measures
Tuning query performance
• Focus
- Guidelines for improving query performance
• Guidelines for three points of view
- implementor and vendor
- database administrator
- application programmer
Tuning Query Performance
• Implementor
System should support indexing and clustering
Query optimizer should be reliable and accurate
• Administrator
Ensure usage of system facilities
Tuning Query Performance
carefully choose physical database design
provide available and efficient processing resources
• Application Programmer
Provide high level queries