
Motivation for Parallel Databases (The Database Problem)

☞ I/O bottleneck (or memory access bottleneck):

• Speed(disk) < speed(RAM) < speed(microprocessor)

☞ Predictions

• (Micro-) processor speed growth : 50% per year

• DRAM capacity growth : 4× every three years

• Disk throughput : 2× in the last ten years

☞ Conclusion : the I/O bottleneck worsens


Parallel Databases

☞ Introduction, General Idea, Motivation

☞ Parallel Architectures

☞ Parallel DBMS Techniques

Motivation for Parallel Databases (The Solution)

☞ Increase the I/O bandwidth

• Data partitioning

• Parallel data access

☞ Origins (1980’s): database machines

• Hardware-oriented (failure due to bad cost-performance)

☞ 1990’s: same solution but using standard hardware components integrated in a multiprocessor

• Software-oriented

• Standard essential to exploit continuing technology improvements


The Target Architecture

Shared Memory (SM)

Shared Disk (SD)

Shared Nothing (SN)


Parallel Data Processing (Data Based Parallelism)

☞ Databases achieve parallelism through inter-operation and intra-operation parallelism

☞ Inter-operation: one query is broken into a number of operations that are run in parallel

☞ Intra-operation: the same operation is run on different portions of the data in parallel
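As a rough illustration, a minimal Python sketch of the two forms (the fragments, the predicate, and the two operators are invented for the example, not taken from the slides):

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical relation, already split into fragments across "nodes".
    fragments = [[3, 7, 1], [9, 2, 8], [5, 4, 6]]

    def select_fragment(fragment):
        # One instance of the same selection operator working on one fragment.
        return [x for x in fragment if x > 4]

    with ThreadPoolExecutor() as pool:
        # Intra-operation parallelism: the same operator runs on every fragment at once.
        selected = [x for part in pool.map(select_fragment, fragments) for x in part]

        # Inter-operation parallelism: two independent operators of the same query
        # (an aggregation and a sort over the selection result) run concurrently.
        total = pool.submit(sum, selected)
        ordered = pool.submit(sorted, selected)
        print(total.result(), ordered.result())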


Motivation for Parallel Databases (Multiprocessor Objectives)

☞ High-performance with better cost-performance than mainframe or vector supercomputer

☞ Use many nodes, each with good cost-performance, communicating through a network

• Good cost via high-volume components

• Good performance via bandwidth

☞ Trends

• Microprocessor and memory (DRAM): off-the-shelf

• Network (multiprocessor edge): custom

☞ The real challenge is to parallelize applications to run with good load balancing


Parallel Data Processing (Objectives of Parallel DBMS 1/3)

☞ High-performance through parallelism

☞ High availability and reliability by exploiting data replication

☞ Extensibility with the ideal goals

• Linear speed-up

• Linear scale-up


Parallel Data Processing (General Idea)

☞ Three ways of exploiting high-performance multiprocessor systems:

• Automatically detect parallelism in sequential programs (e.g., Fortran)

• Augment an existing language with parallel constructs (e.g., C, Fortran90)

• Offer a new language in which parallelism can be expressed or automatically inferred

☞ Critique

• Hard to develop parallelizing compilers, limited resulting speed-up

• Enables the programmer to express parallel computations but too low-level

• Can combine the advantages of both (1) and (2)


Parallel Data Processing (Barriers to Parallelism)

☞ Startup

• The time needed to start a parallel operation may dominate the actual computation time (especially for small queries)

☞ Interference

• When accessing shared resources, each new process slows down the others (two processes access the same data item)

☞ Skew

• The response time of a set of parallel processes is the time of the slowest one (one process has lots of work, while the others are idle)


Parallel Data Processing (Objectives of Parallel DBMS 2/3: Linear Speed-Up)

☞ Every system component (processor, memory, disk) added increases the performance by a constant (the performance is proportional to the number of components)


Parallel Architectures (Functional Architecture of Parallel Database System 1/2)


Parallel Data Processing (Objectives of Parallel DBMS 3/3: Linear Scale-Up)

☞ The performance stays the same as the size of the database and the number of components of the system increase proportionally
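To make the two goals concrete, a small sketch of how they are usually measured (the timings below are invented):

    def speed_up(time_on_one_node, time_on_n_nodes):
        # Linear speed-up: the same problem with n times more resources
        # should ideally run n times faster.
        return time_on_one_node / time_on_n_nodes

    def scale_up(time_small_system, time_big_system):
        # Linear scale-up: an n times bigger problem on n times more resources
        # should ideally take the same elapsed time (scale-up of 1.0).
        return time_small_system / time_big_system

    print(speed_up(400.0, 100.0))   # 4.0 on 4 nodes -> linear speed-up
    print(scale_up(100.0, 100.0))   # 1.0 -> linear scale-up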


Parallel Architectures (Shared-Memory Architecture 1/2)


Parallel Architectures (Functional Architecture of Parallel Database System 2/2)

☞ Session manager

• Mediates between applications and the DBMS; initiates and closes transactions

☞ Request manager

• Compilation and optimization

• Data directory management

• Semantic data control

• Execution control

☞ Data manager

• Execution of DB operations

• Transaction management support

• Data management
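A rough sketch of how the three layers could fit together (the class and method names are invented for illustration; no real system's API is implied):

    class DataManager:
        # Executes DB operations on one node's local data (transaction support omitted).
        def __init__(self, local_data):
            self.local_data = local_data

        def run_operator(self, plan):
            return len(self.local_data)          # stand-in for a real operator

    class RequestManager:
        # Compiles/optimizes the query and controls its execution on all nodes.
        def __init__(self, data_managers):
            self.data_managers = data_managers

        def execute(self, query):
            plan = query.lower()                 # stand-in for compilation/optimization
            return [dm.run_operator(plan) for dm in self.data_managers]

    class SessionManager:
        # Mediates between applications and the DBMS; initiates and closes transactions.
        def __init__(self, request_manager):
            self.request_manager = request_manager

        def run(self, query):
            # begin transaction ... execute ... commit
            return self.request_manager.execute(query)

    print(SessionManager(RequestManager([DataManager([1, 2]), DataManager([3])])).run("SELECT ..."))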


Parallel Architectures (Shared-Memory Architecture 2/2)

☞ Advantages: simplicity and load balancing, fast communication

• Both the directory (meta information) and the lock tables (control information) are in main memory and are shared by all CPUs

• Inter-query parallelism comes for free

• Excellent load balancing, since it can be achieved at run time

☞ Disadvantages: high cost, limited extensibility, low availability (the central memory is a single point of failure)


Parallel Architectures (Architectural Alternatives)

☞ Shared memory (shared everything)

☞ Shared disk

☞ Shared nothing (message-passing)

☞ Hierarchical (cluster)

☞ Non-Uniform Memory Access (NUMA)


Parallel Architectures (Shared-Nothing 1/2)


Parallel Architectures (Shared-Disk Architecture 1/2)


Parallel Architectures (Shared-Nothing 2/2)

☞ Advantages: cost, extensibility, availability

• Smooth, incremental growth

• Replication of data across nodes increases availability

☞ Disadvantages: complexity, difficult load balancing

• Good fragmentation and replication are required to achieve high performance

• The distribution of the data, not only the load of each CPU, plays an important role in load balancing

• Adding disks might require redistributing the data to maintain high performance


Parallel Architectures (Shared-Disk Architecture 2/2)

☞ Advantages: network cost, extensibility, easy migration from uniprocessor

☞ Disadvantages: higher complexity and potential performance problems

• Distributed database protocols are required: two-phase commit, distributed locking

• Shared disk might be a bottleneck

• High communication between nodes


Parallel Architectures (NUMA Architectures 1/2)


Parallel Architectures (Hierarchical/Cluster/Grid Architecture 1/2)


Parallel Architectures (NUMA Architectures 2/2)

☞ The memories of the individual nodes form a global shared address space

☞ The shared-memory address space is supported at the hardware level; remote accesses are only a few times (about 4×) slower than local accesses

☞ Advantages: application software need not be rewritten

☞ Disadvantages: supports only relatively small DBMSs


Parallel Architectures (Hierarchical/Cluster/Grid Architecture 2/2)

☞ Combines good load balancing of shared-memory with extensibility of shared-nothing

☞ Alternatives

• Limited number of large nodes, e.g., 4×16 processor nodes

• A high number of small nodes, e.g., 16 × 4-processor nodes, which has much better cost-performance (can be a cluster of workstations)

☞ Advantages: better load balancing

☞ Disadvantages: More complicated design


Data Placement (Main Implementations 2/2)

☞ Each relation is divided into n partitions (sub-relations), where n is a function of the relation size and access frequency

☞ Data partitioning supports good load balancing

☞ Three implementations are common:

• Round-robin

– Maps the i-th element to node i mod n
– Simple, but only exact-match queries
– Updates should be handled carefully

• B-tree index

– Supports range queries but large index

• Hash function

– Only exact-match queries but small index
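A minimal sketch of the three placement functions, reading the B-tree option as range partitioning (the node count, range bounds, and keys are arbitrary):

    import bisect

    N_NODES = 4
    RANGE_BOUNDS = [100, 200, 300]       # upper bounds of nodes 0..2; node 3 takes the rest

    def round_robin_node(i):
        # Round-robin: the i-th inserted tuple goes to node i mod n.
        return i % N_NODES

    def hash_node(key):
        # Hash partitioning: small "index" (just the function), exact-match lookups only.
        return hash(key) % N_NODES

    def range_node(key):
        # Range partitioning (B-tree style): also supports range queries.
        return bisect.bisect_left(RANGE_BOUNDS, key)

    print(round_robin_node(7), hash_node("abc"), range_node(250))   # node ids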


Parallel DBMS Techniques

Parallel DBMS techniques rely on four inter-dependent concepts:

☞ Data placement

• Physical placement and replication of the DB onto multiple nodes

☞ Parallel data processing

• The main issue is processing joins; selections are easy

☞ Parallel query optimization

• Automatic parallelization of the queries and load balancing are main issues

• Choice of the best parallel execution plans

☞ Transaction management

• Similar to distributed transaction management


Data Placement (Issues in Data Partitioning)

☞ Small relations should be partitioned differently from the large ones (it does not make sense to spread a few blocks over a thousand places)

☞ Large relations also should not be fragmented over all nodes of a large parallel system. Consider a fully fragmented binary join on a 1024-CPU architecture: the number of messages between nodes would be 1024² (about one million)

☞ The data should be periodically repartitioned to support balanced loads and changes in data distributions (due to updates)


Data Placement (Main Implementations 1/2)


Data Placement (Data Replication, Interleaved Partitioning)

☞ Fragment the data of each node and mirror each fragment on a different node

☞ The system is unrecoverable if two nodes fail


Data Placement (Data Replication)

☞ High-availability requires data replication

• Mirror disks

• Interleaved partitioning

• Chained partitioning


Data Placement (Data Replication, Chained Partitioning)

☞ Each fragment is stored on two adjacent nodes

☞ The probability that two adjacent nodes fail is much lower than the probability that any two nodes fail (?)

☞ Simple strategy to mirror nodes
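A sketch of the chained placement rule, assuming the nodes form a ring (node numbering and fragment ids are hypothetical):

    N_NODES = 4

    def chained_placement(fragment_id):
        # Chained partitioning: the primary copy lives on one node,
        # the backup copy on the next (adjacent) node in the chain.
        primary = fragment_id % N_NODES
        backup = (primary + 1) % N_NODES
        return primary, backup

    for fragment_id in range(N_NODES):
        print(fragment_id, chained_placement(fragment_id))   # survives any single-node failure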


Data Placement (Data Replication, Mirroring)

☞ Keep an entire copy of one node's data on another node

☞ Simple strategy

☞ Hurts load balancing if one node fails


Join Processing (Parallel Associative Join)

☞ Assumptions: equi-join; S is already partitioned on the join attribute according to the hash function h

☞ Send Ri = R|h(R.A)=i to the node holding Si

☞ Join Ri with Si

☞ Return ∪ i=1..k (Ri ⋈ Si)


Join Processing

☞ Three basic algorithms for intra-operator parallelism

• Parallel nested loop join: no special assumption

• Parallel associative join: one relation is de-clustered on join attribute and equi-join

• Parallel hash join: equi-join

☞ They also apply to other complex operators such as duplicate elimination, union, intersection, etc. with minor adaptation

☞ Let

• R1, R2, . . . , Rm be fragments of R

• S1, S2, . . . , Sk be fragments of S

• JP be the join predicate


Join Processing (Parallel Hash Join)

☞ Generalizes associative join, does not require any specific partitioning

☞ Partition both R and S into mutually exclusive sets: Ri = R|h(R.A)=i, Si = S|h(S.A)=i

☞ Send Si and Ri to the same node

☞ Return ∪ i=1..p (Ri ⋈ Si)
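A single-process Python sketch of the partitioning idea (nodes are simulated by bucket lists; the relations and attribute names are invented). Skipping the repartitioning of S, because it is already hash-partitioned on the join attribute, gives the associative variant:

    from collections import defaultdict

    P = 3   # number of join nodes

    def hash_partition(relation, key, p=P):
        # Split a relation into p mutually exclusive buckets on the join attribute.
        buckets = defaultdict(list)
        for row in relation:
            buckets[hash(row[key]) % p].append(row)
        return buckets

    def local_join(r_rows, s_rows, r_key, s_key):
        # Local equi-join of two co-located buckets (build on R, probe with S).
        built = defaultdict(list)
        for r in r_rows:
            built[r[r_key]].append(r)
        return [(r, s) for s in s_rows for r in built[s[s_key]]]

    R = [{"a": 1}, {"a": 2}, {"a": 3}]
    S = [{"a": 2}, {"a": 3}, {"a": 3}]

    Rp, Sp = hash_partition(R, "a"), hash_partition(S, "a")
    # Each node i joins only R_i with S_i; the final result is the union over i.
    print([pair for i in range(P) for pair in local_join(Rp[i], Sp[i], "a", "a")])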


Join Processing (Parallel Nested Loop)

☞ Send the whole R to all sites with Si

☞ Join R with Si at all sites

☞ Return ∪ i=1..k (R ⋈ Si)
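A corresponding sketch of the nested-loop variant, which broadcasts the whole of R instead of repartitioning it (same simulated-node setup as above, with invented data):

    R = [{"a": 2, "x": "r1"}, {"a": 3, "x": "r2"}]
    S_FRAGMENTS = [                                   # S_i, one list per node
        [{"a": 1, "y": "s1"}],
        [{"a": 2, "y": "s2"}, {"a": 3, "y": "s3"}],
    ]

    def node_join(r_all, s_fragment):
        # Every node receives the complete R (broadcast) and joins it with its own S_i.
        return [(r, s) for r in r_all for s in s_fragment if r["a"] == s["a"]]

    # Final result: the union of the per-node joins R ⋈ S_i, i = 1..k.
    print([pair for s_i in S_FRAGMENTS for pair in node_join(R, s_i)])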


Parallel Query Optimization (Search Space)

☞ Two consecutive operations can be executed as a pipeline

☞ One operand of the pipeline usually must be stored (the hash table in the case of a join)

☞ Build and probe are the typical phases of such a pipeline (first the hash table is built, then it is probed for joining)

Two Hash-Join Trees with Different Scheduling


Join Processing (Complexity)

☞ The cost of the algorithms is

Cost = CostCommunication + CostProcessing

☞ Let R be partitioned into fragments R1, R2, ..., Rm and S into fragments S1, S2, ..., Sn, and let CostLocal(x, y) be the processing cost of a local join of fragments with x and y tuples

☞ Nested loop join (without a broadcast primitive):

• CostCommunication = m · n · msg(Card(R)/m)

• CostProcessing = n · CostLocal(Card(R), Card(S)/n)

☞ Associative join:

• CostCommunication = m · n · msg(Card(R)/(m · n))

• CostProcessing = n · CostLocal(Card(R)/n, Card(S)/n)

☞ Hash join (repartitioning both relations over p nodes):

• CostCommunication = m · p · msg(Card(R)/(m · p)) + n · p · msg(Card(S)/(n · p))

• CostProcessing = p · CostLocal(Card(R)/p, Card(S)/p)

☞ The associative join minimizes the communication cost; the hash join trades communication cost against processing cost
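A small sketch comparing the three cost formulas (msg and CostLocal below are placeholder linear cost functions, not defined in the slides; the cardinalities and node counts are arbitrary):

    def msg(tuples):
        return tuples                       # placeholder: message cost ~ number of tuples sent

    def cost_local(r, s):
        return r + s                        # placeholder: local join cost ~ input sizes

    def nested_loop(card_r, card_s, m, n):
        return (m * n * msg(card_r / m),
                n * cost_local(card_r, card_s / n))

    def associative(card_r, card_s, m, n):
        return (m * n * msg(card_r / (m * n)),
                n * cost_local(card_r / n, card_s / n))

    def hash_join(card_r, card_s, m, n, p):
        return (m * p * msg(card_r / (m * p)) + n * p * msg(card_s / (n * p)),
                p * cost_local(card_r / p, card_s / p))

    # (communication, processing) for |R| = |S| = 1e6 tuples, m = n = p = 8
    print(nested_loop(1e6, 1e6, 8, 8), associative(1e6, 1e6, 8, 8), hash_join(1e6, 1e6, 8, 8, 8))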


Parallel Query Optimization (Search Strategy)

☞ Conceptually, the search strategy is no different from that in centralized or distributed environments

☞ The search space is substantially larger

• Left-deep trees express full materialization

• Right-deep trees express full pipelining

• Right-deep trees consume more memory, though they are more efficient

• Zig-zag trees trade off pipelining against memory

• Bushy trees trade off parallelism against memory


Parallel Query Optimization

☞ The objective is to select the "best" parallel execution plan for a query using the following components

☞ Search space

• Models alternative execution plans as operator trees

• Left-deep vs. Right-deep vs. Bushy trees

☞ Search strategy

• Dynamic programming for small search space

• Randomized for large search space

☞ Cost model

• Physical schema info. (partitioning, indexes, etc.)

• Statistics and cost functions


Parallel Execution Problems (Initialization)

☞ Serial time is required for the parallel setup (e.g., initialization of threads)

☞ This initialization time might exceed the processing time of small queries

☞ Let N be the number of tuples, n the number of processors, tp the average processing time per tuple, and ti the initialization time per processor. Then

Response Time = ti · n + (tp · N) / n

☞ The optimal number of processors nO and the maximal speed-up Sm are then

nO = √(tp · N / ti)        Sm = nO / 2
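In code, under the model above (the tuple count and the per-tuple/per-processor times are invented):

    from math import sqrt

    def response_time(n, N, t_p, t_i):
        # Serial start-up cost grows with n, while the parallel work shrinks with n.
        return t_i * n + t_p * N / n

    N, t_p, t_i = 1_000_000, 1e-5, 0.1        # tuples, processing time/tuple, start-up/processor

    n_opt = sqrt(t_p * N / t_i)               # optimal number of processors (here 10)
    max_speedup = n_opt / 2                   # maximal speed-up (here 5)
    print(n_opt, max_speedup, response_time(round(n_opt), N, t_p, t_i))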


Parallel Query Optimization (Cost Model)

☞ Three parameters are typically important: total work (TW), response time (RT), and memory consumption (MC)

☞ Total work and response time capture the tradeoff between throughput and response time

☞ The cost is typically modeled in the following way:

Cost(αRT, αTW) = αRT · RT + αTW · TW   if MC fits in RAM
Cost(αRT, αTW) = ∞                     otherwise

A good choice of the constants αRT and αTW is a challenge.

☞ Response time:

RT(phases) = Σ_{p ∈ phases} max_{Op ∈ p} ( RT(Op) + Pipe_Delay(Op) + Store_Delay(Op) )
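The cost function transcribes directly into code (the weights and the RAM limit below are arbitrary):

    import math

    def cost(rt, tw, mc, ram, alpha_rt=0.5, alpha_tw=0.5):
        # Weighted sum of response time and total work; infinite if the plan's
        # memory consumption does not fit in RAM.
        return alpha_rt * rt + alpha_tw * tw if mc <= ram else math.inf

    print(cost(rt=2.0, tw=10.0, mc=512, ram=1024))    # 6.0
    print(cost(rt=2.0, tw=10.0, mc=2048, ram=1024))   # inf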


Parallel Execution Problems (Interference)

☞ Interference occurs when two CPUs simultaneously access the same resource

☞ Solution: duplicate resources, or protect shared variables so that only one CPU accesses the data at a time


Parallel Execution Problems

☞ Initialization

☞ Interference

☞ Load Balancing


Parallel Hierarchical Architecture (2/2)


Parallel Execution Problems (Load Balancing)

☞ Problems arise for intra-operator parallelism with skewed data distributions

• attribute value skew (AVS)

• tuple placement skew (TPS)

• selectivity skew (SS)

• redistribution skew (RS)

• join product skew (JPS)

☞ Solutions

• sophisticated parallel algorithms that predict and deal with skew

• dynamic processor allocation (at execution time)
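A toy illustration of how skew defeats intra-operator parallelism (the partition sizes are made up):

    def parallel_time(partition_sizes, per_tuple_cost=1.0):
        # The response time of a set of parallel processes is that of the slowest one.
        return max(size * per_tuple_cost for size in partition_sizes)

    balanced = [250, 250, 250, 250]     # tuples per node
    skewed = [700, 100, 100, 100]       # e.g. redistribution skew piles work on one node

    print(parallel_time(balanced), parallel_time(skewed))   # 250.0 vs 700.0 for the same data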


Parallel Hierarchical Architecture (1/2)

☞ The parallel hierarchical architecture is based on the following concepts

• Divide the query tree into activations and assign each activation to a shared-memory (SM) node. An activation is the smallest unit of sequential processing that cannot be further partitioned

• Make sure that there are more fragments than CPUs, so that each SM node can still exploit parallelism

• Run more threads in one SM node than there are CPUs, so that the CPUs stay busy even when some threads are blocked waiting for unavailable resources

• Maximize parallelism within one SM node, minimize parallelism across different SM nodes

• Allow load sharing between SM nodes, so that busy nodes give work to idle ones. Compute the benefit of transferring the work; as a rule of thumb, send builds to idle SM nodes rather than the data
