gamma dbms part 1: physical database design shahram ghandeharizadeh computer science department...

Gamma DBMS Part 1: Physical Database Design
Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

Outline
- Alternative architectures: Shared-Disk versus Shared-Nothing
- Declustering techniques

Shared-Disk Architecture
- Emerged in the 1980s.
- Many clients share storage and data: data remains available when a client fails.
[Figure: Network; Data]

Shared-Disk Architecture: Advantages
- Many clients share storage and data.
- Redundancy is implemented in one place, protecting all clients from disk failure.
- Centralized backup: the administrator does not care or know how many clients on the network are sharing storage.
[Figure: Network, labeled High Availability, Data Backup, Data Sharing]

Network Failures
- What about network failures?
- Two host bus adapters per server.
- Each server connected to a different switch.

Shared-Disk Architecture
- Storage Area Network (SAN):
  - Block level access.
  - Write to storage is immediate.
  - Specialized hardware including switches, host bus adapters, disk chassis, battery backed caches, etc.
  - Expensive.
  - Supports transaction processing systems.
- Network Attached Storage (NAS):
  - File level access.
  - Write to storage might be delayed.
  - Generic hardware.
  - Inexpensive.
  - Not appropriate for transaction processing systems.

Concepts and Terminology
- Virtualization:
  - Available storage is represented as one HUGE disk drive, e.g., a SAN with a thousand 1.5 TB disks provides over 1 Petabyte of storage.
  - Available storage is partitioned into Logical Unit Numbers (LUNs).
  - A LUN is presented to one or more servers.
  - A LUN appears as a disk drive to a server.
- The SAN places blocks across physical disks intelligently to balance load.
- What to do when a PC fails?

Shared-Nothing
- Each node (blade) consists of one processor, memory, and a disk drive.
[Figure: Network connecting CPU 1 ... CPU N]

Shared-Nothing
- Each node (blade) may consist of one or several processors, memory, and one or several disk drives.
[Figure: Network connecting Node 1 ... Node M, each with CPU 1 ... CPU n and DRAM 1 ... DRAM D]

Shared-Nothing
- Partition resources to construct logical nodes. With an 8 CPU PC, construct eight logical nodes, each with a CPU, a fraction of memory, and one disk drive.
[Figure: Network connecting CPU 1 ... CPU nM]

Data Declustering
- Data is partitioned across the nodes (why?):
  - Random/round-robin,
  - Hash partitioning,
  - Range partitioning.
- Each piece of a table is termed a fragment.
- Single attribute declustering strategies.
- Two multi-attribute declustering strategies:
  1. Multi-Attribute GrId deClustering (MAGIC)
  2. Bubba's Extended Range Declustering (BERD)

Horizontal Declustering
[Figure: logical view and physical view of the Emp table]

  Emp
  name     age   salary
  Bob      20    10K
  Shideh   18    35K
  Ted      50    60K
  Kevin    62    120K
  Angela   55    140K
  Mike     45    90K

Horizontal Declustering
- No partitioning attribute: Random and Round-robin.
- Single attribute declustering strategies: Hash, Range.
- Note: the database administrator must choose one attribute as the partitioning attribute.

Hash Declustering
- salary is the partitioning attribute; the hash function is salary % 3.
[Figure: physical view of Emp after hash declustering, with three fragments]

  Fragment   name     age   salary
  first      Ted      50    60K
  first      Kevin    62    120K
  second     Bob      20    10K
  second     Mike     45    90K
  third      Shideh   18    35K
  third      Angela   55    140K

Hash Declustering
- Selections with equality predicates referencing the partitioning attribute are directed to a single node:
    Retrieve Emp where salary = 60K
    SELECT * FROM Emp WHERE salary = 60K
- Equality predicates referencing a non-partitioning attribute and range predicates are directed to all nodes:
    Retrieve Emp where age = 20
    Retrieve Emp where salary < 20K
    SELECT * FROM Emp WHERE salary [text missing in source]

- Query A: [description missing in source]
    Retrieve Emp where age > 21 and age < 22
- Query B: range query referencing the salary attribute. On average, retrieves 10 tuples.
    Retrieve Emp where salary > 50K and salary < 50.5K
- Access methods:
  - A non-clustered B+-tree index on age.
  - A clustered B+-tree index on salary.
- Ideally, both queries should be directed to one node.

Multi-Attribute Declustering (Example, cont.)
- Range decluster Emp using age as the partitioning attribute.
- Assuming a system configured with nine nodes, the number of employed nodes is:

            Range     Ideal
  A         50% * 1   50% * 1
  B         50% * 9   50% * 1
  Average   5         1

MAGIC
- Construct a multi-attribute grid directory on the Emp table.
- Each dimension corresponds to a partitioning attribute.
- Each cell represents a fragment of the relation.
[Figure: grid directory with axes Salary and Age]

MAGIC (Low Correlation)
- Low correlation between salary and age attribute values:

            MAGIC     Range     Ideal
  A         50% * 3   50% * 1   50% * 1
  B         50% * 3   50% * 9   50% * 1
  Avg       3         5         1

MAGIC (High Correlation)
- High correlation between salary and age attribute values:

            MAGIC      Range     Ideal
  A         (missing)  50% * 1   50% * 1
  B         (missing)  50% * 9   50% * 1
  Avg       (missing)  (missing) (missing)

BERD
- Range partition Emp using the salary attribute.
- For the age attribute, construct an auxiliary relation containing:
  1. The age attribute value of each record.
  2. The node containing that record.
- Range partition the auxiliary relation using the age attribute value.

BERD
- salary is the primary partitioning attribute.
[Figure: physical view of Emp after range partitioning on salary]

  Fragment     name     age   salary
  0-50K        Bob      20    10K
  0-50K        Shideh   18    35K
  51K-100K     Ted      50    60K
  51K-100K     Mike     45    90K
  101K-        Kevin    62    120K
  101K-        Angela   55    140K
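
The BERD steps above (range partition Emp on salary, then build and range partition an auxiliary relation on age) can be sketched in Python on the six-tuple Emp example. This is a minimal illustration, not the Gamma or Bubba implementation: the function names, the use of in-memory lists as nodes, and the age ranges 0-20, 21-52, 53- (taken from the auxiliary-relation figure) are assumptions made for the sketch.

```python
from bisect import bisect_left

# Emp tuples from the slides: (name, age, salary in thousands).
EMP = [
    ("Bob", 20, 10), ("Shideh", 18, 35), ("Ted", 50, 60),
    ("Kevin", 62, 120), ("Angela", 55, 140), ("Mike", 45, 90),
]

# Inclusive upper bounds of every range except the last, open-ended one.
SALARY_BOUNDS = [50, 100]  # node 0: 0-50K, node 1: 51K-100K, node 2: 101K-
AGE_BOUNDS = [20, 52]      # node 0: 0-20,  node 1: 21-52,    node 2: 53-


def range_node(value, bounds):
    """Return the index of the range that contains value."""
    return bisect_left(bounds, value)


def build_berd(tuples):
    """Range partition Emp on salary and the auxiliary relation on age."""
    fragments = [[] for _ in range(len(SALARY_BOUNDS) + 1)]
    auxiliary = [[] for _ in range(len(AGE_BOUNDS) + 1)]
    for name, age, salary in tuples:
        data_node = range_node(salary, SALARY_BOUNDS)
        fragments[data_node].append((name, age, salary))
        # Auxiliary tuple: the age value and the node holding the record.
        auxiliary[range_node(age, AGE_BOUNDS)].append((age, data_node))
    return fragments, auxiliary


def nodes_for_salary_range(low, high):
    """A salary range predicate goes straight to the overlapping nodes."""
    first = range_node(low, SALARY_BOUNDS)
    last = range_node(high, SALARY_BOUNDS)
    return set(range(first, last + 1))


def nodes_for_age_range(auxiliary, low, high):
    """An age range predicate first consults the auxiliary relation."""
    first = range_node(low, AGE_BOUNDS)
    last = range_node(high, AGE_BOUNDS)
    nodes = set()
    for aux_node in range(first, last + 1):
        for age, data_node in auxiliary[aux_node]:
            if low <= age <= high:
                nodes.add(data_node)
    return nodes


fragments, auxiliary = build_berd(EMP)
print([[t[0] for t in fragment] for fragment in fragments])
print(nodes_for_age_range(auxiliary, 45, 50))
```

In this sketch a query on age costs one extra step, the auxiliary lookup, before the data nodes are contacted. When age and salary are highly correlated, as in this small example, the lookup returns a single data node.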

BERD, Auxiliary Relation
[Figure: the three Emp fragments (salary 0-50K, 51K-100K, 101K-) and an auxiliary relation with columns age and node]
- Range partition the auxiliary relation using the age attribute.
[Figure: auxiliary relation fragments Aux.age 0-20, Aux.age 21-52, and Aux.age 53-, stored with the Emp fragments Salary 0-50K, Salary 51K-100K, and Salary 101K-]

BERD (Cont.)
- High correlation between age and salary attribute values:

            BERD       Range     Ideal
  A         (missing)  50% * 1   50% * 1
  B         (missing)  50% * 9   50% * 1
  Avg       1          5         1

BERD (Cont.)
- Low correlation between age and salary attribute values:

            BERD       Range     Ideal
  A         (missing)  50% * 1   50% * 1
  B         (missing)  50% * 9   50% * 1
  Avg       5          5         1

- Is it possible to avoid lookup in the auxiliary table?

Experimental Environment
- Verified simulation model of the Gamma database machine.
- A 32 processor system.
- Database consists of a 100,000 tuple table based on the Wisconsin Benchmark.

Experimental Design
- Correlation between partitioning attribute values: Low, High.
- Workload characteristics (A, B): (Low, Low), (Low, Moderate), (Moderate, Low), (Moderate, Moderate).
- Multiprogramming level.

[Figures: throughput (queries/second) versus multiprogramming level for the Low-Low, Low-Moderate, and Moderate-Moderate query mixes, each under low correlation and high correlation]

Advantages of MAGIC
- Provides superior performance compared to BERD and Range.
- Constructs the grid directory using the workload of the relation. Changes the shape of the grid directory to compensate for the different frequencies of access to the partitioning attributes.
- Minimizes the overhead of parallelism.
- Supports partial declustering of a relation in large systems.

Summary
- Given the fast speed of CPUs, each query/transaction should ideally be processed by one node.

Parallelism versus Efficient Servers
- Even if all queries and transactions become single-sited, parallelism is no substitute for smart algorithms that make a single server efficient.
- Why?

Why?
- Assume a single server that can process one request per second.
- Two choices:
  1. Extend it with Flash and obtain a throughput of 3 requests per second.
  2. Buy two additional servers and partition the data across the 3 servers.
- Given 3 simultaneous requests issued to each alternative:
  - The single processor system will process 3 requests per second.
  - The 3 node system may not provide a throughput of 3 requests per second.
- There are 3^3 = 27 ways to assign 3 requests to the 3 nodes!
[Figure: enumeration of the assignments of requests R1, R2, R3 to the three nodes. The 6 ideal cases place one request on each node (R1 R2 R3, R1 R3 R2, R2 R1 R3, R2 R3 R1, R3 R1 R2, R3 R2 R1). The remaining cases place two requests on the same node, e.g., {R1, R3} R2, {R2, R3} R1, {R2, R1} R3, or all three on one node, {R1, R2, R3}.]

Brain Teaser
- Given N servers and M requests, compute the probability of:
  - M/N requests per node.
  - Number of ways M requests may map onto N servers and the probability of each scenario.
- Reward for correct answer:
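
The count for the 3-request, 3-node example in the "Why?" slide can be checked by brute-force enumeration. The Python sketch below is an illustration under a stated assumption that the slides do not spell out: each request is independently and equally likely to map to any server. Under that assumption, the number of perfectly balanced assignments (M/N requests per node, when N divides M) is the multinomial count M! / ((M/N)!)^N out of N^M total assignments. The function names are hypothetical.

```python
from itertools import product
from math import factorial


def enumerate_assignments(num_requests, num_servers):
    """Count all assignments and those with an equal load on every server."""
    divisible = num_requests % num_servers == 0
    target = num_requests // num_servers
    total = 0
    balanced = 0
    # Each assignment is a tuple: position i holds the server of request i.
    for assignment in product(range(num_servers), repeat=num_requests):
        total += 1
        loads = [assignment.count(s) for s in range(num_servers)]
        if divisible and all(load == target for load in loads):
            balanced += 1
    return total, balanced


def balanced_closed_form(num_requests, num_servers):
    """Multinomial count M! / ((M/N)!)^N; zero when N does not divide M."""
    if num_requests % num_servers:
        return 0
    per_server = num_requests // num_servers
    return factorial(num_requests) // factorial(per_server) ** num_servers


total, balanced = enumerate_assignments(3, 3)
print(total, balanced, balanced / total)
```

For 3 requests and 3 servers the enumeration finds 27 assignments, of which 6 are balanced, so under the uniform assumption only 6 of 27 outcomes (about 22%) let the three-node system reach 3 requests per second.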
