cmpt 454, simon fraser university, fall 2009, martin ester 75 database systems ii record...
TRANSCRIPT
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 1
Database Systems II
Record Organization
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 2
IntroductionWe have introduced secondary storage devices, in particular disks.Disks use blocks as basic units of transfer and storage.In a DBS, we have to manage entities, typically represented as records with attributes (relational model).Attribute values (data items) can be complex: texts, images, videos.How to organize records on disk?
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 3
IntroductionData Items
Records
Blocks
Files
Memory
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 4
Data Items
Short Integer, unsigned (16 Bits)
e.g., 35 is
Short Integer, signed
e.g., -35 is
00000000 00100011
25 + 21 + 20
10000000 00100011
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 5
Data Items
Floating point (32 Bits, single precision) 1 bit for sign, m for exponent, n bits for mantissa
value = (-1)sign x 2exponent – bias x 1.fraction
bias = 2m-1-1 = 127
= 0.15625
124 1.01 (binary) = 1.25 (decimal)
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 6
Data Items
Charactersvarious coding schemes suggestedmost popular is ASCII
1 Character = 1 Byte = 8 Bitsexamples
A: 1000001a: 11000015: 0110101LF: 0001010
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 7
Data Items
String of charactersNULL-terminated e.g.,
length indicator e.g.,
fixed lengthdo not need terminating
NULL or length indicator
c ta m uo es
c ta 43 m uo es
VA
RC
HA
RC
HA
R
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 8
Records
Fixed formate.g., relational data model
record is list of data itemsnumber and type of data items
fixedvs. variable format
e.g., XMLnumber of data items variable
Fixed lengthvs. variable length
e.g., VARCHAR, blobs, repeating fields
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 9
Records
Record header keeps general information about the record.Header contains (some of) the following:- pointer to schema,- record types,- record length (to skip record without consulting schema),- timestamp of last access,- pointers (offsets) to record attributes.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 10
Fixed-Length Records
Fixed length records are the simplest format.Header contains only pointer to schema, record length and timestamp.Example
(1) id, 2 byte integer(2) name, 10 char. Schema(3) dept, 2 byte code55 s m i t h 02
83 j o n e s 01Records
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 11
Variable-Length Records
Variable length records first store fixed length fields, followed by variable-length fields.Header contains also pointers (offsets) to variable-length fields (except first one).NULL values can be efficently represented by null pointers in the header.
header birthdate name address
fixed variable
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 12
Variable Format Records
Variable format records are self-describing.Simplest representation as sequence of tagged fields.Record header contains number of fields.Tag contains- attribute name,- attribute type,- field length (if not apparent from attribute type).
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 13
Variable Format Records
Example
4I52 4S DROF46
#
Fi
eld
s
Cod
e
iden
tify
ing
fi
eld
as
E#
Inte
ger
typ
e
Valu
e
Cod
e f
or
EN
am
e
Str
ing
typ
e
Leng
th o
f st
r.
Valu
e
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 14
IntroductionHow to pack records into blocks?Need to address the following issues
- separating records,- spanned vs. unspanned,- mixing record types, - ordering records,- addressing records,- BLOBs.
Packing Records Into Blocks
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 15
Separating records
for fixed-length records, no need to separate for variable-length records use special markeror store record lengths (or offsets)
- within each record- in block header
Packing Records Into Blocks
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 16
Spanned vs. unspannedUnspanned records on a single block
block 1 block 2
Spanned records split into fragmentsthat are distributed over multiple blocks
block 1 block 2
Packing Records Into Blocks
R1 R2 R3 R4 R5
R1 R2 R3(a)
R3(b) R6R5R4 R7
(a)
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 17
Packing Records Into BlocksSpanned vs. unspanned
Spanning necessary if records do not fit in a block, e.g. if they have very long fields (VARCHAR, blob).Spanning useful for better space utilization, even if records fit in a block.
2050 bytes wasted 2046 2050 bytes wasted 2046
R1 R2
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 18
Packing Records Into BlocksSpanned vs. unspanned
Spanned records requires extra header information in records and record fragments:- is it fragment? (bit)- if fragment, is it first or last? (bit)- if applicable, pointers to previous/next fragment.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 19
Packing Records Into BlocksMixing record types
Can we mix records of different types (relations)in same block?Why should we? Records that are frequently accessed together should be in the same block (clustering).
Compromise: no mixing on same block, but keep related records in same cylinder.
EMP e1 DEPT d1 DEPT d2
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 20
Packing Records Into BlocksOrdering records
Want to efficiently read records in orderOrdering records in file and block by some key value (sequential file)Implementation options - next record physically contiguous
- link to next record
Next (R1)R1
R1 Next (R1)
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 21
Packing Records Into BlocksOrdering records
Overflow area
R1
R2
R3
R4
R5
header
R2.1
R1.3
R4.7
records insequence
links more flexible under modifications, but require random I/O
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 22
Packing Records Into BlocksAddressing records
In memory records have virtual memory address.How to address a record on disk?Trade-off between efficiency of dereferencing and flexibility in moving records over timeMany options, from purely physical to fully indirect (logical)
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 23
Packing Records Into BlocksAddressing records
Purely physicaldirect address on disk
Record address (record ID) consists ofcylinder #
track #block #offset in block
efficient access, but cannot move record
within block or across blocks
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 24
Packing Records Into BlocksAddressing records
Fully indirect (logical)address (record ID) can be any bit
string,needs to be mapped to physical
address
inefficient access, but can move record anywhere
Physicaladdr.Rec ID
Often8-16 Bytes!
Typically≤ 8 Bytes
map
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 25
Packing Records Into BlocksAddressing records
Indirection in block
compromise between physical and fully indirect
Header
Free
space
R3
R4
R1 R2
offsets
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 26
Packing Records Into BlocksBLOBs
Sometimes, very large data items, e.g. images, videos,. . .
Store them as Binary Large Objects (BLOBs)
Stored as spanned sequence of blocks- preferably contiguous on disk,- possibly even striped over several disks.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 27
Record ModificationsInsertions
If record order does not matter, just append record to file, i.e. find a block of that file with empty space.
If order does matter, may be more complicated.
First locate corresponding block.
If block has enough space left, insert record there and rearrange records in block.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 28
Record ModificationsInsertions
If block does not have enough space left, use one of the two following strategies:- find space in nearby block
predecessor or successor in sort order - create overflow block
and link it into chain of blocks only if order implemented by links
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 29
Record ModificationsInsertions
Overflow blocks raise the following issues:
- How much free space to leave in each block, track, cylinder?
- How often to reorganize file and overflow area?
Freespace
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 30
Record ModificationsDeletions
Try to reclaim space.Either move around records within a block or maintain available-space list.Must avoid dangling pointers to deleted records.Typical approach: place tombstone in place of deleted record.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 31
Record ModificationsDeletions
physical record IDs
This space This space cannever re-used be re-used
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 32
Record ModificationsDeletions
logical record IDs
ID LOC
7788
map
Never reuseID 7788 nor
space in map...
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 33
Buffer ManagementIntroduction
The part of the DB relevant for the current transactions needs to reside in memory, i.e. in the DB buffer.
DB
MAIN MEMORY
DISK
disk block
free frame
BUFFER POOL
choice of frame dictatedby replacement policy
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 34
Buffer ManagementBlock request
If requested block is not in buffer pool, choose a frame for replacement.
If this frame is dirty, write it to disk.
Read requested page into chosen frame.
Pin the page and return its address.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 35
Buffer ManagementBlock replacement
Requestor of page must unpin it, as soon as block is no longer needed.Requestor must also indicate whether block has been modified: dirty bit is used for this.Block in pool may be requested many times, and a pin count is used. Block is candidate for replacement if and only if pin count = 0.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 36
Buffer ManagementReplacement policy
Frame is chosen for replacement by a replacement policy:- Least-recently-used (LRU), - Clock,
efficient approximation of LRU- Most-recently-used (MRU) etc.Policy can have big impact on number of I/O’s, depends on the access pattern.Will discuss more in the context of query optimization.