cs 4432lecture #41 cs4432: database systems ii lecture #5 professor elke a. rundensteiner
Post on 21-Dec-2015
219 views
TRANSCRIPT
CS 4432 lecture #4 1
CS4432: Database Systems IILecture #5
Professor Elke A. Rundensteiner
CS 4432 lecture #4 2
TODAY
• Lecture on chapter 3 for ~30 min
• Then introduction to project 1 ~20 min
• Project-1 team formation later today (watch for it on my.wpi)
CS 4432 lecture #4 3
Storage Layout :How to lay out data on disk. ( chapter 3)
CS 4432 lecture #4 4
Data Items
Records
Blocks
Files
Memory
Overview
CS 4432 lecture #4 5
Record - Collection of related data
items (called FIELDS)
E.g.: Employee record:name field,salary field,date-of-hire field,
...
CS 4432 lecture #4 6
Types of records:
• Main choices:– FIXED vs VARIABLE FORMAT– FIXED vs VARIABLE LENGTH
CS 4432 lecture #4 7
A SCHEMA contains information such as:
- # fields (attributes)- type of each field (length)- order of attributes in record- meaning of each field
(domain) - constraints (primary key, etc).
Fixed format
Not associated with each record.
CS 4432 lecture #4 8
Example: fixed format and length
Employee record(1) E#, 2 byte integer(2) E.name, 10 char. Schema(3) Dept, 2 byte code
We can simply concatenate fields.
55 s m i t h 02
83 j o n e s 01
Records
CS 4432 lecture #4 9
• What : Not all fields are included in the record, and possibly in different orders.
• Then : Record itself must contain format, i.e., it is “Self Describing”:
Variable format
CS 4432 lecture #4 10
Why Variable format ?
• “sparse” records• repeating fields• evolving formats
CS 4432 lecture #4 11
Example: variable format and length
4I52 4S DROF46
Field name codes could also be strings, i.e. TAGS
# F
ield
s
Cod
e id
enti
fyin
g
field
as
E#
Inte
ger
typ
e
Cod
e f
or
En
am
eS
trin
g t
yp
eLe
ng
th o
f st
r.
CS 4432 lecture #4 12
• EXAMPLE: variable format record with repeating
fieldse.g., Employee has one or more children
3 E_name: Fred Child: SallyChild: Tom
• Do repeating fields always require variable format and length?
CS 4432 lecture #4 13
e.g., a person and her hobbies.
Mary Sailing Chess --
•Then must allocate maximum number of
repeating fields (if not used, set to null)
CS 4432 lecture #4 14
Many variants betweenfixed - variable format:
Example1: Include record type in record
record type record lengthtells me whatto expect(i.e., points to schema)
5 27 . . . .
CS 4432 lecture #4 15
Record header - data at beginning
that describes record
May contain:- pointer to schema (record type)- length of record - time stamp (create time, mod.
time)- other stuff (e.g., ROW-ID in
Oracle)
CS 4432 lecture #4 16
Example2: Variant btw FIXED/VAR format
• Hybrid format : one part is fixed, other is variable
E.g.: All employees have E#, name, dept; andother fields vary.
25 Smith Toy 2 retiredHobby:chess
# of varfields
CS 4432 lecture #4 17
Also, many variations in internal
organization of record
Just to show one: length of field
3 F310 F1 5 F2 12 * * *
3 32 5 15 20 F1 F2 F3
total size
offsets
0 1 2 3 4 5 15 20
CS 4432 lecture #4 18
Question:We have seen examples for :
* Fixed format and length records* Variable format and length records
(a) Does fixed format and variable length
make sense?(b) Does variable format and fixed length
make sense?
CS 4432 lecture #4 19
Data Items
Records
Blocks
Files
Memory
Next:
CS 4432 lecture #4 20
Next: placing records into blocks
blocks ...
a file
assume fixedlength blocks
assume a single file (for now)
CS 4432 lecture #4 21
(1) separating records(2) spanned vs. unspanned(3) mixed record types – clustering(4) split records(5) sequencing(6) indirection
Options for storing records in blocks:
CS 4432 lecture #4 22
Block
(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)
- within each record- in block header
(1) Separating records
R2R1 R3
CS 4432 lecture #4 23
• Unspanned: records must be within one block
block 1 block 2
...
• Spannedblock 1 block 2
...
(2) Spanned vs. Unspanned
R1 R2
R1
R3 R4 R5
R2 R3(a)
R3(b) R6R5R4 R7
(a)
CS 4432 lecture #4 24
• Unspanned is much simpler, but may waste space…
• Spanned essential if record size > block size
Spanned vs. unspanned:
CS 4432 lecture #4 25
Example
106 recordseach of size 2,050 bytes (fixed)block size = 4096 bytes
block 1 block 2
2050 bytes wasted 2046 2050 bytes wasted 2046
R1 R2
• Utiliz = 50% -> ½ of space is wasted
CS 4432 lecture #4 26
• Mixed - records of different types(e.g. EMPLOYEE, DEPT)allowed in same block
e.g., a block
(3) Mixed record types
EMP e1 DEPT d1 DEPT d2
CS 4432 lecture #4 27
Why do we want to mix?
• Answer: CLUSTERING
Records that are frequently accessed together should bein the same block
• Problems
Creates variable length records in block
Must avoid duplicates (how to cluster?)
Insert/deletes are harder
CS 4432 lecture #4 28
Example Clustering
Q1: select A#, C_NAME, C_CITY, …from DEPOSIT, CUSTOMERwhere DEPOSIT.C_NAME =
CUSTOMER.C.NAME
a blockCUSTOMER,NAME=SMITH
DEPOSIT,NAME=SMITH
DEPOSIT,NAME=SMITH
CS 4432 lecture #4 29
• If Q1 frequent, clustering good• But if Q2 frequent
Q2: SELECT * FROM CUSTOMER
CLUSTERING IS COUNTER PRODUCTIVE
CS 4432 lecture #4 30
Compromise:
No mixing, but keep relatedrecords in same cylinder ...
CS 4432 lecture #4 31
(1) Separating records(2) Spanned vs. Unspanned(3) Mixed record types - Clustering(4) Split records(5) Sequencing(6) Indirection
Recap: Storing records in blocks
CS 4432 lecture #5 32
Now on to Project 1 …