1 cse444: review. 2 cse444 in one slide v logical : e/r diagram normalized relations v physical :...

13
CSE444: REVIEW

Upload: margaret-franklin

Post on 19-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

1

CSE444: REVIEW

Page 2: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

2

CSE444 in one slide

Logical: E/R diagramnormalized relations Physical: files, buffering, and indexes Logical: Relational algebra and SQL Physical: join methods, optimization CC/R choices affect correctness, performance Object Databases extend, improve relations

– … OODBMS, ORDBMS are converging?– … OQL, SQL3 are converging?

Page 3: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

3

The E/R model

Entities vs. relationships Attributes for entities and relationships Keys and weak entities (no foreign keys!!) Cardinality constraints Participation (key) constraints Translation to relations:

– Entity becomes relation– Relationship becomes relation iff it is many-many– Other relationships: key of “parent” goes with

“child”

Page 4: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

4

FDs and Normal Forms Determine Candidate Keys (CKs) A field is prime iff it is in some CK 1NF: All relations are in 1NF 2NF: For each FD XA:

– A is prime OR X is not a proper subset of any CK 3NF: For each FD XA:

– X is a superkey OR A is prime OR A X (trivial) BCNF: For each FD XA:

– X is a superkey OR A X (trivial) Normalize by decomposing R (ABC) to R1(AB) and R2

(BC), where BC violates a NF

Page 5: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

5

Physical DB Design

File organizations: heapfile, sorted file, hashed file

Disks: speed and physical characteristics Buffer pool can vary in size; frame holds page Replacement policy choice is important Indexes can speed up data access:

– Clustered vs. unclustered– Primary vs. secondary – Dense vs. sparse– B+ Trees are the most common: log(N) searches

Page 6: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

6

Queries: Logical Aspects Relational Algebra

, , , , – Joins kind of important too!!!

General join Equijoin Natural join

– Transformations for optimization SQL

– Declarative– Relational calculus + algebra + grouping, etc.– Computationally incomplete– Query execution: relational algebra operations

Page 7: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

7

Queries: Physical Aspects

Implementation of relational algebra– File scan vs. index lookup vs. binary search – Exact-match vs. range queries– Impact of clustered vs. non-clustered indices– Join methods:

Block-Oriented Nested Loops (good idea, if buffers avail.) Index Nested Loops (good idea, if index avail.) Sort-Merge (good idea, if sorted or needs to be sorted)

Query optimization: cost estimates hard!!!– Large space of physical, logical alternatives– Prune space by considering only left-deep plans– Enables pipelined execution

Page 8: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

8

Concurrency Control and Recovery

ACID transactions enforce CC&R Strict 2PL “guarantees” serializability (C, I)

– Deadlocks possible (detect and abort a XACT)– Phantom tuples possible (fix with index locking)

ARIES guarantees XACTS are A, D– Write-ahead logging– Log UNDO actions to allow complete replaying of history– Recovery phases:

Analyze (rebuild main memory structures) REDO (rebuild data in buffer frames) UNDO (abort uncommitted XACTS)

Page 9: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

9

Object Databases

ORDBMS (e.g. Oracle 8i) vs. OODBMS (e.g. O2)

Structural aspects– Collection-valued fields– OIDs– Freely composable type constructors

Behavioral aspects (methods) Adds new dimensions to RDBMS problems SQL3, OQL standards are emerging

Page 10: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

10

Current and Future DBMS Issues

New applications yield new techniques New techniques yield new applications Some “new” applications:

– Data warehousing– On-line analytical processing (OLAP)– Data mining– Distributed data– Heterogeneous data and data integration– Scientific/sequential/ordered data– Partial or approximate query answers

Page 11: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

11

Current and Future Issues (cont.)

– Active DBs: rule management (ICs and triggers)– Real-time DBMS– Web-based DBMS– XML and semi-structured data– Spatial and high-dimensional data (lots of

columns)– Special-purpose DBMSs– Digital Libraries– Geographic Information Systems– etc…..

Page 12: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

12

Current and Future Issues (cont.)

Some “new” techniques:– New kinds of indices– Improved B Trees– Faster aggregation algorithms– New QP algorithms– Better optimization techniques– Data broadcasting– Generic data models– Faster sorting algorithms– New query languages– Deductive DBMSs

Page 13: 1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational

13

Current and Future Issues (cont.)

– Object databases– New algebras– Query cost estimation– New locking and commit protocols– Main-memory databases– CC/R techniques for non-relational settings– DBMS interfaces, visualization tools– DBMS development tools– etc….

BOTTOM LINE: Lots of opportunities for jobs, research, development, and fun !!!