database management systems m physical access to data g
TRANSCRIPT
![Page 1: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/1.jpg)
DBMG
Database Management Systems
Physical Access to Data
1
![Page 2: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/2.jpg)
DBMG
DBMS Architecture
OPTIMIZER
MANAGEMENT OF ACCESS
METHODS
BUFFER MANAGER
CONCURRENCY CONTROL
RELIABILITY MANAGEMENT
SQL INSTRUCTION
System
Catalog
Index Files
Data Files
DATABASEDATABASE
2
![Page 3: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/3.jpg)
DBMG
3
Physical Access Structures
Data may be stored on disk in different formats to provide efficient query execution
Different formats are appropriate for different query needs
Physical access structures describe how data is stored on disk
![Page 4: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/4.jpg)
DBMG
4
Access Method Manager
Transforms an access plan generated by the optimizer into a sequence of physical access requests to (database) disk pages
It exploits access methods
An access method is a software module
It is specialized for a single physical data structure
It provides primitives for
reading data
writing data
![Page 5: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/5.jpg)
DBMG
5
Access method
Selects the appropriate blocks of a file to be loaded in memory
Requests them to the Buffer Manager
Knows the organization of data into a page
can find specific tuples and values inside a page
![Page 6: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/6.jpg)
DBMG
6
Organization of a disk page
Different for different access methods
Divided in
Space available for data
Space reserved for access method control information
Space reserved for file system control information
![Page 7: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/7.jpg)
DBMG
7
Remarks
Tuples may have varying size
Varchar types
Presence of Null values
A single tuple may span several pages
When its size is larger than a single page
e.g., for BLOB or CLOB data types
![Page 8: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/8.jpg)
DBMG
Database Management Systems
Physical Access Structures
8
![Page 9: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/9.jpg)
DBMG
9
Physical Access Structures
Physical access structures describe how data is stored on disk to provide efficient query execution
SQL select, update, …
In relational systems
Physical data storage
Sequential structures
Hash structures
Indexing to increase access efficiency
Tree structures (B-Tree, B+-Tree)
Unclustered hash index
Bitmap index
![Page 10: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/10.jpg)
DBMG
10
Sequential Structures
Tuples are stored in a given sequential order
Different types of structures implement different ordering criteria
Available sequential structures
Heap file (entry sequenced)
Ordered sequential structure
![Page 11: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/11.jpg)
DBMG
11
Heap file
Tuples are sequenced in insertion orderinsert is typically an append at the end of the file
All the space in a block is completely exploited before starting a new block
Delete or update may cause wasted space Tuple deletion may leave unused space
Updated tuple may not fit if new values have larger size
Sequential reading/writing is very efficient
Frequently used in relational DBMSjointly with unclustered (secondary) indices to support search and sort operations
![Page 12: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/12.jpg)
DBMG
12
Ordered sequential structures
The order in which tuples are written depends on the value of a given key, called sort key
A sort key may contain one or more attributes
the sort key may be the primary key
Appropriate for
Sort and group by operations on the sort key
Search operations on the sort key
Join operations on the sort key
when sorting is used for join
![Page 13: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/13.jpg)
DBMG
13
Ordered sequential structures
Problem
preserving the sort order when inserting new tuples
it may also hold for update
Solution
Leaving a percentage of free space in each block during table creation
On insertion, dynamic (re)sorting in main memory of tuples into a block
Alternative solution
Overflow file containing tuples which do not fit into the correct block
![Page 14: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/14.jpg)
DBMG
14
Ordered sequential structures
Typically used with B+-Tree clustered (primary) indices
the index key is the sort key
Used by the DBMS to store intermediate operation results
![Page 15: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/15.jpg)
DBMG
15
Tree structures
Provide “direct” access to data based on the value of a key field
The key includes one or more attributes
It does not constrain the physical position of tuples
The most widespread in relational DBMS
![Page 16: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/16.jpg)
DBMG
16
General characteristics
One root node
![Page 17: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/17.jpg)
DBMG
Tree structure
U1
17
![Page 18: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/18.jpg)
DBMG
18
General characteristics
One root node
Many intermediate nodes
Nodes have a large fan-out
Each node has many children
![Page 19: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/19.jpg)
DBMG
Tree structure
U1
19
![Page 20: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/20.jpg)
DBMG
20
General characteristics
One root node
Many intermediate nodes
Nodes have a large fan-out
Each node has many children
Leaf nodes provide access to data
Clustered
Unclustered
![Page 21: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/21.jpg)
DBMG
Tree structure
U1
DATA
21
![Page 22: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/22.jpg)
DBMG
22
B-Tree and B+-Tree
Two different tree structures for indexing
B-Tree
Data pages are reached only through key values by visiting the tree
B+-Tree
Provides a link structure allowing sequential access in the sort order of key values
![Page 23: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/23.jpg)
DBMG
B-Tree structure
U1
DATA
23
![Page 24: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/24.jpg)
DBMG
B+-Tree structure
U1
DATA
24
![Page 25: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/25.jpg)
DBMG
25
B-Tree and B+-Tree
Two different tree structures for indexing
B-Tree
Data pages are reached only through key values by visiting the tree
B+-Tree
Provides a link structure allowing sequential access on the sort order of key values
B stands for balanced
Leaves are all at the same distance from the root
Access time is constant, regardless of the searched value
![Page 26: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/26.jpg)
DBMG
26
Clustered
The tuple is contained into the leaf node
Constrains the physical position of tuples in a given leaf node
The position may be modified by splitting the node, when it is full
Also called key sequenced
Typically used for primary key indexing
![Page 27: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/27.jpg)
DBMG
Clustered B+-Tree index
U1
Data Data Data Data
27
![Page 28: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/28.jpg)
DBMG
28
Unclustered
The leaf contains physical pointers to actual data
The position of tuples in a file is totally unconstrained
Also called indirect
Used for secondary indices
![Page 29: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/29.jpg)
DBMG
Unclustered B+-Tree index
U1
Data
29
![Page 30: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/30.jpg)
DBMG
30
Example: Unclustered B+-Tree index
STUDENT (StudentId, Name, Grade)
T1 T6 T10
19 34 34
T2 T3 T5
22 30 33
T4 T7
30 34
T8 T9
40 50
DATA FILE FOR STUDENT TABLE
33 34 34 34 40
(T5) (T6 ) (T10) (T7) (T8)
50
(T9)
12<=Grade < 1919 56
56< Grade <=78
19 <= Grade <= 56
12 78
Grade < 12
Grade > 78
12 <= Grade <= 78
33 44
19 <= Grade < 33 44< Grade <= 56
33 <= Grade <= 44
19 22 30 30
(T1) (T2 ) (T3) (T4)
LEAF
![Page 31: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/31.jpg)
DBMG
31
Example: Clustered B+-Tree index
STUDENT (StudentId, Name, Grade)
12<=Grade < 1919 56
56< Grade <=78
19 <= Grade <= 56
33 44
19 <= Grade < 33 44< Grade <= 56
33 <= Grade <= 44
12 78
Grade < 12
Grade > 78
12 <= Grade <= 78
LEAF
T5
33
T6
34
T10
34
T7
34
T8
40
T9
50
T1
19
T2
22
T3
30
T4
30
DATA FILE FOR STUDENT TABLE
![Page 32: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/32.jpg)
DBMG
32
Advantages and disadvantages
Advantages
Very efficient for range queries
Appropriate for sequential scan in the order of the key field
Always for clustered, not guaranteed otherwise
Disadvantages
Insertions may require a split of a leaf
possibly, also of intermediate nodes
computationally intensive
Deletions may require merging uncrowded nodes and re-balancing
![Page 33: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/33.jpg)
DBMG
33
Hash structure
It guarantees direct and efficient access to data based on the value of a key field
The hash key may include one or more attributes
Suppose the hash structure has B blocks
The hash function is applied to the key field value of a record
It returns a value between 0 and B-1 which defines the position of the record
Blocks should never be completely filled
To allow new data insertion
![Page 34: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/34.jpg)
DBMG
Example: hash index
STUDENT (StudentId, Name, Grade)
TUPLE T1
StudentId = 50
TUPLE T4
StudentId = 75
H(StudentId =50)=1
H(StudentId =75)=1
BLOCK 0
BLOCK 1
BLOCK 2
T1 50
T4 75
DATA FILE FOR STUDENT TABLE
34
![Page 35: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/35.jpg)
DBMG
35
Hash index
Advantages
Very efficient for queries with equality predicate on the key
No sorting of disk blocks is required
Disadvantages
Inefficient for range queries
Collisions may occur
![Page 36: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/36.jpg)
DBMG
36
Unclustered hash index
It guarantees direct and efficient access to data based on the value of a key field
Similar to hash index
Blocks contain pointers to data
Actual data is stored in a separate structure
Position of tuples is not constrained to a block
Different from hash index
![Page 37: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/37.jpg)
DBMG
Example: Unclustered hash index
STUDENT (StudentId, Name, Grade)
T1 30
T2 40
DATA FILE FOR
STUDENT TABLE
BLOCK 0
BLOCK 1
BLOCK 2
INDEX BLOCKS
TUPLE T1
GRADE = 30
TUPLE T2
GRADE = 40
H(GRADE=30)=1
H(GRADE=40)=1
30 → T1
40 → T2
37
![Page 38: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/38.jpg)
DBMG
38
Bitmap index
It guarantees direct and efficient access to data based on the value of a key field
It is based on a bit matrix
The bit matrix references data rows by means of RIDs (Row IDentifiers)
Actual data is stored in a separate structure
Position of tuples is not constrained
![Page 39: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/39.jpg)
DBMG
39
Bitmap index
The bit matrix has
One column for each different value of the indexed attribute
One row for each tuple
Position (i, j) of the matrix is
1 if tuple i takes value j
0 otherwiseRID Val1 Val2 … Valn
1 0 0 … 1
2 0 0 … 0
3 0 0 … 1
4 1 0 … 0
5 0 1 … 0
![Page 40: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/40.jpg)
DBMG
Example: Bitmap index
T2
T4
DATA FILE
FOR EMPLOYEE
TABLE
EMPLOYEE (EmployeeId, Name, Job)
Domain of Job attribute = {Engineer, Consultant, Manager, Programmer, Secretary, Accountant}
RID Eng. Cons. Man. Prog. Secr. Acc.
1 0 0 1 0 0 0
2 0 0 0 1 0 0
3 0 0 0 0 1 0
4 0 0 0 1 0 0
5 1 0 0 0 0 0
Prog.
0
1
0
1
0
40
![Page 41: Database Management Systems M Physical Access to Data G](https://reader035.vdocuments.site/reader035/viewer/2022071214/62cbabc71aa8161ddf422ead/html5/thumbnails/41.jpg)
DBMG
41
Bitmap index
Advantages
Very efficient for boolean expressions of predicates
Reduced to bit operations on bitmaps
Appropriate for attributes with limited domain cardinality
Disadvantages
Not used for continuous attributes
Required space grows significantly with domain cardinality