gamma dbms part 1: physical database design

Gamma DBMS
Part 1: Physical Database Design

Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

Outline

Alternative architectures: Shared-disk versus Shared-Nothing

Declustering techniques.

Shared-Disk Architecture

Emerged in 1980s: Many clients share storage and data; data remains available when a client fails.

[Figure: clients reach shared Data through a Network.]


Advantages:
Many clients share storage and data.
Redundancy is implemented in one place, protecting all clients from disk failure.
Centralized backup: The administrator does not care/know how many clients are on the network sharing storage.

[Figure: a shared-disk network labeled High Availability, Data Backup, Data Sharing.]

Network failures

What about network failures?
Two host bus adapters per server,
Each server connected to a different switch.


Storage Area Network (SAN):
Block level access,
Write to storage is immediate,
Specialized hardware including switches, host bus adapters, disk chassis, battery backed caches, etc.
Expensive,
Supports transaction processing systems.

Network Attached Storage (NAS):
File level access,
Write to storage might be delayed,
Generic hardware,
Inexpensive,
Not appropriate for transaction processing systems.

Concepts and Terminology

Virtualization:
Available storage is represented as one HUGE disk drive, e.g., a SAN with a thousand 1.5 TB disks provides on the order of 1 Petabyte of storage (1.5 PB raw),
Available storage is partitioned into Logical Unit Numbers (LUNs),
A LUN is presented to one or more servers,
A LUN appears as a disk drive to a server.

SAN places blocks across physical disks intelligently to balance load.

What to do when a PC fails?

Shared-Nothing

Each node (blade) consisted of one processor, memory, and a disk drive.

[Figure: nodes CPU 1 … CPU N connected by a Network.]

Each node (blade) may consist of one or several processors, memory, and one or several disk drives.

[Figure: Node 1 … Node M connected by a Network; each node holds CPU 1 … CPU n and DRAM 1 … DRAM D.]

Partition resources to construct logical nodes. With an 8 CPU PC, construct eight logical nodes each with a CPU, fraction of memory, and one disk drive.

[Figure: logical nodes CPU 1 … CPU nM connected by a Network.]

Data Declustering

Data is partitioned across the nodes (why?):
Random/round-robin,
Hash partitioning,
Range partitioning.

Each piece of a table is termed a fragment.
Single attribute declustering strategies
Two multi-attribute declustering strategies:
1. Multi-Attribute GrId deClustering (MAGIC)
2. Bubba’s Extended Range Declustering (BERD)

Horizontal Declustering

Logical View: Emp

name     age   salary
Bob      20    10K
Shideh   18    35K
Ted      50    60K
Kevin    62    120K
Angela   55    140K
Mike     45    90K

Physical View

[Figure: Physical View, the same rows spread across the nodes.]

Horizontal Declustering

No partitioning attribute: Random and Round-robin.

Single attribute declustering strategies:
Hash,
Range.

Note: the database administrator must choose one attribute as the partitioning attribute.
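
As an illustration of declustering without a partitioning attribute, here is a minimal round-robin sketch over the Emp rows from the logical view. The function name `round_robin_decluster` and the choice of three nodes are assumptions made for this example; they do not come from Gamma.

```python
# Emp rows as (name, age, salary), taken from the logical view of the table.
EMP = [
    ("Bob", 20, 10_000),
    ("Shideh", 18, 35_000),
    ("Ted", 50, 60_000),
    ("Kevin", 62, 120_000),
    ("Angela", 55, 140_000),
    ("Mike", 45, 90_000),
]

def round_robin_decluster(rows, num_nodes):
    """Assign the i-th row to node i % num_nodes; no attribute value is consulted."""
    fragments = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        fragments[i % num_nodes].append(row)
    return fragments

fragments = round_robin_decluster(EMP, 3)  # 3 nodes is an assumption
for node, fragment in enumerate(fragments):
    print(node, [name for name, _, _ in fragment])
```

Because placement ignores attribute values, every selection has to be sent to all nodes, which is the cost the hash and range strategies try to avoid.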

Hash Declustering

[Figure: Emp(name, age, salary) hash-declustered across three nodes using salary % 3; e.g., Ted (50, 60K) and Bob (20, 10K) are stored in different fragments.]

salary is the partitioning attribute.

Hash Declustering

Selections with equality predicates referencing the partitioning attribute are directed to a single node:
Retrieve Emp where salary = 60K
SELECT * FROM Emp WHERE salary = 60K

Equality predicates referencing a non-partitioning attribute and range predicates are directed to all nodes:
Retrieve Emp where age = 20
Retrieve Emp where salary < 20K
SELECT * FROM Emp WHERE salary < 20K
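
The routing rule for hash declustering can be sketched as follows. This is an illustration only: `hash_node` uses the salary % 3 function from the figure, while `route` and its argument layout are hypothetical names introduced here.

```python
NUM_NODES = 3  # matches the salary % 3 example

def hash_node(salary, num_nodes=NUM_NODES):
    """Hash function from the example: salary modulo the number of nodes."""
    return salary % num_nodes

def route(attribute, op, value, num_nodes=NUM_NODES):
    """Return the nodes that must process the predicate `attribute op value`."""
    if attribute == "salary" and op == "=":
        # Equality on the partitioning attribute: exactly one node can hold matches.
        return [hash_node(value, num_nodes)]
    # Range predicates and predicates on other attributes: hashing gives no help.
    return list(range(num_nodes))

print(route("salary", "=", 60_000))  # single node
print(route("age", "=", 20))         # all nodes
print(route("salary", "<", 20_000))  # all nodes
```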

Range Declustering

[Figure: Emp(name, age, salary) range-declustered on salary into three fragments: 0-50K, 51K-100K, 101K-∞; e.g., Bob (20, 10K) is stored in the first fragment and Ted (50, 60K) in the second.]

salary is the partitioning attribute.

Range Declustering

Equality and range predicates referencing the partitioning attribute are directed to a subset of nodes:
Retrieve Emp where salary = 60K
Retrieve Emp where salary < 20K
In our example, both queries are directed to one node.

Predicates referencing a non-partitioning attribute are directed to all nodes.
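
A small sketch of range routing, assuming the three salary ranges of the example (0-50K, 51K-100K, 101K-∞) and whole-dollar salaries. The names `node_for` and `route_range` are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first two ranges; the last node holds everything above 100K.
UPPER_BOUNDS = [50_000, 100_000]

def node_for(salary):
    """Index of the fragment whose range contains `salary`."""
    return bisect_left(UPPER_BOUNDS, salary)

def route_range(low, high):
    """Nodes whose ranges overlap the closed salary interval [low, high]."""
    return list(range(node_for(low), node_for(high) + 1))

print(route_range(60_000, 60_000))   # salary = 60K
print(route_range(0, 19_999))        # salary < 20K
print(route_range(40_000, 120_000))  # a wide predicate spans several nodes
```

Both example queries land on a single node, while the wide predicate touches all three.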

An iPSC/2 Intel Hypercube

Year is 1988!
32 Processor Hypercube
Each node consists of:
80386 processor (12 MHz)
2 MB DRAM
333 MB disk
A hypercube interconnect supporting parallel transmission of messages among nodes.

Software Architecture

Each node stores its fragment on its local disk drive.
Each node may build a B+-tree (clustered/non-clustered) and hash index on its fragment of a relation.
Each node has its own concurrency control and crash recovery mechanism.

[Figures: Software Architecture diagrams.]

Software Architecture

Processes executing on one node shared memory, identical to today’s threads!
At initialization time, a node would start a fixed number of threads (processes).
All threads listen on a well defined socket, waiting for the Scheduler to dispatch work to them.
A message contains the identity that the operator should assume:
A “switch” statement would enable a thread to become a select, project, hash-join build, hash-join probe, etc.
The message specifies the role of the thread.
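
The dispatch idea can be sketched in a few lines. This is not Gamma's code: an in-process `queue.Queue` stands in for the socket, a dictionary lookup stands in for the "switch" statement, and the message layout `(role, rows, argument)` is an assumption made for illustration.

```python
import queue
import threading

def select_op(rows, predicate):
    return [r for r in rows if predicate(r)]

def project_op(rows, columns):
    return [tuple(r[c] for c in columns) for r in rows]

# Plays the part of the "switch" statement: role name -> operator code.
OPERATORS = {"select": select_op, "project": project_op}

def worker(inbox, outbox):
    """Wait for the scheduler's message, then assume the role it names."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown signal (hypothetical convention)
            break
        role, rows, argument = message
        outbox.put(OPERATORS[role](rows, argument))

inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

rows = [("Bob", 20, 10_000), ("Ted", 50, 60_000)]
inbox.put(("select", rows, lambda r: r[2] == 60_000))
selected = outbox.get()
inbox.put(("project", selected, [0]))
projected = outbox.get()
inbox.put(None)
thread.join()
print(selected, projected)
```

The same thread first acts as a select and then as a project, which is the point of letting the message carry the role.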

A Comparison of Range & Hash

Closed simulation model:
A client generates a range selection predicate: X < age < Y.
The age attribute value is unique with values ranging from 0 to 999,999 (1 million rows).
A client does not generate a new request until its pending request is processed by Gamma and returned.
The system is multi-programmed by increasing the number of clients in the system.
A multi-programming level of 8 means there are 8 clients generating requests to the system (independent of one another).

[Figure: clients issuing requests to a 32 Node Gamma.]

A 0.01% selection predicate retrieves 100 rows.
With a clustered B+-tree index, the 100 rows are grouped together in a few disk pages.
With range partitioning, the predicate is processed by one node.
With hash partitioning, the predicate is processed by all 32 nodes, with the scheduler coordinating the execution of the predicate on each node and gathering the results from every node.

[Figure: age ranges assigned to the 32 nodes: 0 – 31,249; 31,250 – 62,499; …; 968,750 – 999,999.]

Declustering Techniques: Tradeoffs

[Figure: Throughput (Queries/Second) versus Multiprogramming Level for a range selection predicate using a clustered B+-tree, 0.01% selectivity (10 records); curves: Range and Hash/Random/Round-robin.]




A 1% selection predicate retrieves 10,000 rows.
With a clustered B+-tree index, the 10,000 rows are grouped together.
With Range partitioning, the predicate is processed using one or two nodes.
With Hash partitioning, the predicate is processed by all the nodes with the scheduler coordinating the execution of the predicate.

[Figure: age ranges assigned to the 32 nodes: 0 – 31,249; 31,250 – 62,499; …; 968,750 – 999,999.]
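
To see why a range-partitioned predicate touches so few nodes, the following sketch counts the fragments overlapped by a run of consecutive age values. It assumes 1 million unique age values spread evenly over 32 nodes (31,250 per node); `nodes_touched` is a hypothetical helper name.

```python
ROWS = 1_000_000
NODES = 32
ROWS_PER_NODE = ROWS // NODES  # 31,250 consecutive age values per node

def nodes_touched(first, count):
    """Number of range fragments overlapped by `count` consecutive age values
    starting at `first`."""
    last = first + count - 1
    return last // ROWS_PER_NODE - first // ROWS_PER_NODE + 1

print(nodes_touched(0, 100))          # 0.01% predicate inside one range
print(nodes_touched(31_200, 100))     # 0.01% predicate straddling a boundary
print(nodes_touched(25_000, 10_000))  # 1% predicate straddling a boundary
print(nodes_touched(0, 10_000))       # 1% predicate inside one range
```

Under hash partitioning the corresponding count is always 32, because consecutive age values are scattered over every node.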

Tradeoffs (Cont…)

[Figure: range selection predicate using a clustered B+-tree, 1% selectivity (1000 records); curves: Range and Hash/Random/Round-robin.]



Why Does Range Perform Poorly?

Note: Range performed poorly because the query (1% selection) imposed a high workload onto a node!
For a query with minimal (0.01% selection) workload requirement, Range is ideal!

Two reasons:
Random generation of selection predicates does NOT mean uniform distribution of workload across nodes.
The number of ranges is the same as the number of nodes, causing the tail-end servers to observe a lower load.

27 ways to assign 3 requests to the 3 nodes! Only 6 result in a uniform distribution of requests.

[Figure: the assignments of requests R1, R2, R3 to three nodes: 6 ideal cases with one request per node (e.g., R1 R2 R3), and 21 others: 3 with {R1, R2, R3} on a single node and 18 with a pair such as {R1, R3} on one node and R2 on another.]
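
The count of 27 assignments, 6 of them uniform, can be checked by brute-force enumeration. The sketch below treats the three requests as distinguishable and groups the assignments by the load of the busiest node.

```python
from collections import Counter
from itertools import product

NODES = 3
REQUESTS = 3

# Each assignment is a tuple: element i is the node that receives request i.
assignments = list(product(range(NODES), repeat=REQUESTS))

# Group assignments by the number of requests on the busiest node.
by_max_load = Counter(max(Counter(a).values()) for a in assignments)

print(len(assignments))          # total number of assignments
print(sorted(by_max_load.items()))
```

A maximum load of 1 corresponds to the ideal cases; a maximum load of 2 or 3 leaves at least one node idle.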

Tradeoffs (Cont…)

Simple range partitioning may lead to load imbalance for queries with high selectivity:
Low performance: increased response time and low system throughput.

Consider a table that maintains the grade of students for different exams, range partitioned on the grade.

Node 1   Node 2   Node 3   Node 4   Node 5
0-19     20-39    40-59    60-79    80-100

Assume a range predicate overlaps 3 partitions, e.g.,
0 < grade < 45
45 < grade < 90

Higher response time because 2 nodes sit idle while 3 nodes process the query (assuming overhead of parallelism is negligible).

Lower throughput because node 3 becomes a bottleneck.
Assuming even distribution of access to ranges, when node 3 is utilized 100%, nodes 2 and 4 have a 66% utilization, while nodes 1 and 5 are utilized 33%.

Hybrid Range Partitioning [VLDB’90]

To minimize the impact of load imbalance, construct more ranges than nodes, e.g., 10 ranges for a 5 node system.

Node 1   Node 2   Node 3   Node 4   Node 5
0-10     11-20    21-30    31-40    41-50
51-60    61-70    71-80    81-90    91-100

Predicates such as “0 < grade < 45” are now directed to all nodes.

Assuming even distribution of access to ranges where workload consists of predicates utilizing 3 sequential ranges, when node 3 becomes 100% utilized, nodes 2 and 4 are now utilized 83%, while nodes 1 and 5 are utilized 66%.
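
The utilization figures for simple range partitioning (100%, 66%, 33%) and for hybrid range partitioning (100%, 83%, 66%) can be reproduced with a short calculation. The sketch assumes that every run of 3 consecutive ranges is equally likely and that range r is stored on node r % 5 (nodes numbered from 0 here, so slide node 3 is index 2); `relative_utilization` is a hypothetical name.

```python
from collections import Counter

def relative_utilization(num_ranges, num_nodes, window=3):
    """Load on each node relative to the busiest node, as a whole percentage
    (truncated, matching the figures quoted in the slides)."""
    hits = Counter()
    for start in range(num_ranges - window + 1):
        for r in range(start, start + window):
            hits[r % num_nodes] += 1  # range r lives on node r % num_nodes
    busiest = max(hits.values())
    return [100 * hits[n] // busiest for n in range(num_nodes)]

simple = relative_utilization(num_ranges=5, num_nodes=5)
hybrid = relative_utilization(num_ranges=10, num_nodes=5)
print(simple)
print(hybrid)
```

Doubling the number of ranges narrows the gap between the busiest and the idlest node from 100/33 to 100/66.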

Multi-Attribute Declustering [SIGMOD’92]

Queries with minimal resource requirements should be directed to a few processors. Why?
Overhead of parallelism:
1. Impacts query response time adversely,
2. Wastes system resources, reducing throughput.

OLTP has come a long way:
Heaviest transaction in TPC-C reads approximately 400 records.
Assuming no disk accesses, a low-end PC processes this transaction in < 1 ms.
Transactions should be single sited!

[Figure legend: Range, Round-robin.]

Multi-Attribute Declustering (E.g.)

Recall the Emp(name, age, salary) table.
Workload consists of two queries, each with a 50% frequency of occurrence:
Query A, range query referencing the age attribute. On average, retrieves 5 tuples.
Retrieve Emp where age > 21 and age < 22.
Query B, range query referencing the salary attribute. On average, retrieves 10 tuples.
Retrieve Emp where salary > 50K and salary < 50.5K

Access methods:
A non-clustered B+-tree index on age
A clustered B+-tree index on salary

Ideally, both queries should be directed to one node.

Multi-Attribute Declustering (E.g. Cont...)

Range decluster Emp using age as the partitioning attribute.

Assuming a system configured with nine nodes, the number of employed nodes is:

Query     Range     Ideal
A         50% * 1   50% * 1
B         50% * 9   50% * 1
Average   5         1
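
The averages in the table are frequency-weighted sums of the number of nodes each query employs. A minimal sketch, with `average_nodes` as a hypothetical helper:

```python
def average_nodes(workload):
    """workload maps a query name to (frequency, number of nodes employed)."""
    return sum(freq * nodes for freq, nodes in workload.values())

# Range declustering on age, nine nodes: query A hits 1 node, query B hits all 9.
range_on_age = {"A": (0.5, 1), "B": (0.5, 9)}
ideal = {"A": (0.5, 1), "B": (0.5, 1)}

print(average_nodes(range_on_age), average_nodes(ideal))
```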

MAGIC

Construct a multi-attribute grid directory on the Emp table.
Each dimension corresponds to a partitioning attribute.
Each cell represents a fragment of the relation.

Grid directory (each entry is the node, 0 through 8, that stores the cell; the axes are Salary and Age):

1 1 4 4 7 7
1 1 4 4 7 7
2 2 5 5 8 8
2 2 5 5 8 8
3 3 6 6 0 0
3 3 6 6 0 0

Ranges along one axis: 0-20, 21-25, 26-30, 31-35, 36-40, 41-70
Ranges along the other axis: 10-20, 21-25, 26-30, 31-35, 36-40, 41-60

MAGIC (Low Correlation)

Low correlation between salary and age attribute values:

1 1 4 4 7 7
1 1 4 4 7 7
2 2 5 5 8 8
2 2 5 5 8 8
3 3 6 6 0 0
3 3 6 6 0 0

Query   MAGIC     Range     Ideal
A       50% * 3   50% * 1   50% * 1
B       50% * 3   50% * 9   50% * 1
Avg     3         5         1
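
The value 3 in the MAGIC column follows from the grid directory: a predicate that selects one range of either attribute covers one row or one column of the grid, and each row and each column contains 3 distinct nodes. The sketch below checks this for the 6 x 6 directory shown; the helper names are hypothetical, and it assumes every cell holds tuples (the low-correlation case).

```python
# Node numbers of the 6 x 6 grid directory (nine nodes, 0 through 8).
GRID = [
    [1, 1, 4, 4, 7, 7],
    [1, 1, 4, 4, 7, 7],
    [2, 2, 5, 5, 8, 8],
    [2, 2, 5, 5, 8, 8],
    [3, 3, 6, 6, 0, 0],
    [3, 3, 6, 6, 0, 0],
]

def nodes_for_row(i):
    """Nodes holding the cells of row i (one range of the row attribute)."""
    return set(GRID[i])

def nodes_for_column(j):
    """Nodes holding the cells of column j (one range of the column attribute)."""
    return {row[j] for row in GRID}

row_counts = [len(nodes_for_row(i)) for i in range(6)]
col_counts = [len(nodes_for_column(j)) for j in range(6)]
print(row_counts, col_counts)
```

Neither query is restricted to one node, but neither is broadcast to all nine, which is why the average drops from 5 to 3.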


MAGIC (High Correlation)

High correlation between salary and age attribute values:

1 1 4 4 7 7
1 1 4 4 7 7
2 2 5 5 8 8
2 2 5 5 8 8
3 3 6 6 0 0
3 3 6 6 0 0

Query   MAGIC     Range     Ideal
A       50% * 1   50% * 1   50% * 1
B       50% * 1   50% * 9   50% * 1
Avg     1         5         1


BERD

Range partition Emp using the salary attribute.

For the age attribute, construct an auxiliary relation containing:
1. The age attribute value of each record
2. Node containing that record

Range partition the auxiliary relation using the age attribute value.

BERD

salary is the primary partitioning attribute.

[Figure: Emp range partitioned on salary across three nodes: 0-50K, 51K-100K, 101K-∞; e.g., Bob (20, 10K) is stored in the first fragment and Ted (50, 60K) in the second.]

BERD, Auxiliary relation

age   Node
20    0
18    0
50    1
45    1
62    2
55    2

Range partition auxiliary relation using the age attribute value: 0-20, 21-52, 53-∞.

[Figure: each of the three nodes stores one salary fragment of Emp (Salary 0-50K, Salary 51K-100K, Salary 101K-∞) and one fragment of the auxiliary relation (Aux.age 0-20: (20, 0), (18, 0); Aux.age 21-52: (50, 1), (45, 1); Aux.age 53-∞: (62, 2), (55, 2)).]
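
The two-step lookup that BERD performs for a predicate on age can be sketched as follows. The fragment contents are derived from the Emp rows and the auxiliary relation shown in the slides; the dictionary layout, the indexing of auxiliary fragments by 0, 1, 2, and the name `select_by_age` are assumptions made for illustration.

```python
from bisect import bisect_left

# Emp fragments, range partitioned on salary (0-50K, 51K-100K, 101K-inf).
EMP_FRAGMENTS = {
    0: [("Bob", 20, 10_000), ("Shideh", 18, 35_000)],
    1: [("Ted", 50, 60_000), ("Mike", 45, 90_000)],
    2: [("Kevin", 62, 120_000), ("Angela", 55, 140_000)],
}

# Auxiliary relation (age, node), range partitioned on age: 0-20, 21-52, 53-inf.
AUX_UPPER_BOUNDS = [20, 52]
AUX_FRAGMENTS = {
    0: [(20, 0), (18, 0)],
    1: [(50, 1), (45, 1)],
    2: [(62, 2), (55, 2)],
}

def select_by_age(low, high):
    """Answer low <= age <= high using the auxiliary relation."""
    first = bisect_left(AUX_UPPER_BOUNDS, low)
    last = bisect_left(AUX_UPPER_BOUNDS, high)
    # Step 1: consult only the auxiliary fragments whose age range overlaps.
    data_nodes = {node
                  for aux in range(first, last + 1)
                  for age, node in AUX_FRAGMENTS[aux]
                  if low <= age <= high}
    # Step 2: visit only the nodes that actually store qualifying records.
    rows = [row
            for node in sorted(data_nodes)
            for row in EMP_FRAGMENTS[node]
            if low <= row[1] <= high]
    return sorted(data_nodes), rows

print(select_by_age(45, 50))
```

The extra hop through the auxiliary relation is the price paid for not broadcasting the age predicate to every node.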

BERD (Cont…)

High correlation between age and salary attribute values:

Query   BERD      Range     Ideal
A       50% * 1   50% * 1   50% * 1
B       50% * 1   50% * 9   50% * 1
Avg     1         5         1

BERD (Cont…)

Low correlation between age and salary attribute values:

Query   BERD      Range     Ideal
A       50% * 1   50% * 1   50% * 1
B       50% * 9   50% * 9   50% * 1
Avg     5         5         1

Is it possible to avoid lookup in the auxiliary table?

Experimental environment

Verified simulation model of the Gamma database machine
A 32 processor system
Database consists of a 100,000 tuple table based on the Wisconsin Benchmark.

Experimental Design

Correlation between partitioning attribute values: Low, High
Workload characteristics (A, B): Low, Low; Low, Moderate; Moderate, Low; Moderate, Moderate
Multiprogramming level

[Figure: Low-Low Query Mix (Low Correlation)]

[Figure: Low-Low Query Mix (High Correlation)]

[Figure: Low-Moderate Mix (Low Correlation)]

[Figure: Low-Moderate Mix (High Correlation)]

[Figure: Moderate-Moderate Mix (Low Correlation)]

[Figure: Moderate-Moderate Mix (High Correlation)]



Advantages of MAGIC

Provides superior performance when compared to BERD and Range.
Constructs the grid directory using the workload of the relation. Changes the shape of the grid directory in order to compensate for the different frequencies of access to the partitioning attributes.
Minimizes the overhead of parallelism.
Supports partial declustering of a relation in large systems.

Summary

Given the fast speed of CPUs, each query/transaction should ideally be processed by one node.

Parallelism versus Efficient Servers

Even if all queries and transactions become single-sited, parallelism is no substitute for smart algorithms that make a single server efficient.

Why?

Assume a single server that can process one request per second.

Two choices:
1. Extend it with Flash and obtain a throughput of 3 requests per second.
2. Buy two additional servers and partition the data across the 3 servers.

Given 3 simultaneous requests issued to each alternative:
The single processor system will process 3 requests per second.
The 3 node system may not provide a throughput of 3 requests per second.

27 ways to assign 3 requests to the 3 nodes!

[Figure: the assignments of requests R1, R2, R3 to three nodes: 6 ideal cases with one request per node (e.g., R1 R2 R3), and 21 others: 3 with {R1, R2, R3} on a single node and 18 with a pair such as {R1, R3} on one node and R2 on another.]
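
A rough way to quantify the gap is to average the completion time of the 3 requests over all 27 assignments. The sketch assumes that each request is equally likely to land on any node and that a node works through its requests one per second; neither assumption comes from the slides.

```python
from collections import Counter
from itertools import product

NODES = 3
REQUESTS = 3

# Completion time of a batch = number of requests queued at the busiest node,
# since each node serves one request per second (assumption).
finish_times = [max(Counter(a).values())
                for a in product(range(NODES), repeat=REQUESTS)]

expected_seconds = sum(finish_times) / len(finish_times)
print(expected_seconds)
```

The single Flash-extended server finishes the same 3 requests in 1 second, while under these assumptions the 3 node system needs close to 2 seconds on average.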

Brain Teaser

Given N servers and M requests, compute the probability of:
M/N requests per node.
Number of ways M requests may map onto N servers and the probability of each scenario.

Reward for correct answer:
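
One possible line of attack on the teaser, offered as a sketch rather than the intended answer: if requests are distinguishable and each independently picks a server uniformly at random (both assumptions), there are N^M mappings, and the number that place exactly M/N requests on every server is the multinomial coefficient M! / ((M/N)!)^N. The function names are hypothetical.

```python
from math import factorial

def ways(m, n):
    """Number of ways to map m distinguishable requests onto n servers."""
    return n ** m

def prob_even(m, n):
    """Probability that every server receives exactly m/n requests, assuming
    each request independently picks a server uniformly at random."""
    if m % n:
        return 0.0  # an even split is impossible
    per_node = m // n
    even_assignments = factorial(m) // factorial(per_node) ** n
    return even_assignments / ways(m, n)

print(ways(3, 3), prob_even(3, 3))
print(ways(6, 3), prob_even(6, 3))
```

For N = M = 3 this reproduces the 6 ideal cases out of 27 from the earlier slide.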
