database management systems m physical access to data g

41
D B M G Database Management Systems Physical Access to Data 1

Upload: others

Post on 11-Jul-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Database Management Systems M Physical Access to Data G

DBMG

Database Management Systems

Physical Access to Data

1

Page 2: Database Management Systems M Physical Access to Data G

DBMG

DBMS Architecture

OPTIMIZER

MANAGEMENT OF ACCESS

METHODS

BUFFER MANAGER

CONCURRENCY CONTROL

RELIABILITY MANAGEMENT

SQL INSTRUCTION

System

Catalog

Index Files

Data Files

DATABASEDATABASE

2

Page 3: Database Management Systems M Physical Access to Data G

DBMG

3

Physical Access Structures

Data may be stored on disk in different formats to provide efficient query execution

Different formats are appropriate for different query needs

Physical access structures describe how data is stored on disk

Page 4: Database Management Systems M Physical Access to Data G

DBMG

4

Access Method Manager

Transforms an access plan generated by the optimizer into a sequence of physical access requests to (database) disk pages

It exploits access methods

An access method is a software module

It is specialized for a single physical data structure

It provides primitives for

reading data

writing data

Page 5: Database Management Systems M Physical Access to Data G

DBMG

5

Access method

Selects the appropriate blocks of a file to be loaded in memory

Requests them to the Buffer Manager

Knows the organization of data into a page

can find specific tuples and values inside a page

Page 6: Database Management Systems M Physical Access to Data G

DBMG

6

Organization of a disk page

Different for different access methods

Divided in

Space available for data

Space reserved for access method control information

Space reserved for file system control information

Page 7: Database Management Systems M Physical Access to Data G

DBMG

7

Remarks

Tuples may have varying size

Varchar types

Presence of Null values

A single tuple may span several pages

When its size is larger than a single page

e.g., for BLOB or CLOB data types

Page 8: Database Management Systems M Physical Access to Data G

DBMG

Database Management Systems

Physical Access Structures

8

Page 9: Database Management Systems M Physical Access to Data G

DBMG

9

Physical Access Structures

Physical access structures describe how data is stored on disk to provide efficient query execution

SQL select, update, …

In relational systems

Physical data storage

Sequential structures

Hash structures

Indexing to increase access efficiency

Tree structures (B-Tree, B+-Tree)

Unclustered hash index

Bitmap index

Page 10: Database Management Systems M Physical Access to Data G

DBMG

10

Sequential Structures

Tuples are stored in a given sequential order

Different types of structures implement different ordering criteria

Available sequential structures

Heap file (entry sequenced)

Ordered sequential structure

Page 11: Database Management Systems M Physical Access to Data G

DBMG

11

Heap file

Tuples are sequenced in insertion orderinsert is typically an append at the end of the file

All the space in a block is completely exploited before starting a new block

Delete or update may cause wasted space Tuple deletion may leave unused space

Updated tuple may not fit if new values have larger size

Sequential reading/writing is very efficient

Frequently used in relational DBMSjointly with unclustered (secondary) indices to support search and sort operations

Page 12: Database Management Systems M Physical Access to Data G

DBMG

12

Ordered sequential structures

The order in which tuples are written depends on the value of a given key, called sort key

A sort key may contain one or more attributes

the sort key may be the primary key

Appropriate for

Sort and group by operations on the sort key

Search operations on the sort key

Join operations on the sort key

when sorting is used for join

Page 13: Database Management Systems M Physical Access to Data G

DBMG

13

Ordered sequential structures

Problem

preserving the sort order when inserting new tuples

it may also hold for update

Solution

Leaving a percentage of free space in each block during table creation

On insertion, dynamic (re)sorting in main memory of tuples into a block

Alternative solution

Overflow file containing tuples which do not fit into the correct block

Page 14: Database Management Systems M Physical Access to Data G

DBMG

14

Ordered sequential structures

Typically used with B+-Tree clustered (primary) indices

the index key is the sort key

Used by the DBMS to store intermediate operation results

Page 15: Database Management Systems M Physical Access to Data G

DBMG

15

Tree structures

Provide “direct” access to data based on the value of a key field

The key includes one or more attributes

It does not constrain the physical position of tuples

The most widespread in relational DBMS

Page 16: Database Management Systems M Physical Access to Data G

DBMG

16

General characteristics

One root node

Page 17: Database Management Systems M Physical Access to Data G

DBMG

Tree structure

U1

17

Page 18: Database Management Systems M Physical Access to Data G

DBMG

18

General characteristics

One root node

Many intermediate nodes

Nodes have a large fan-out

Each node has many children

Page 19: Database Management Systems M Physical Access to Data G

DBMG

Tree structure

U1

19

Page 20: Database Management Systems M Physical Access to Data G

DBMG

20

General characteristics

One root node

Many intermediate nodes

Nodes have a large fan-out

Each node has many children

Leaf nodes provide access to data

Clustered

Unclustered

Page 21: Database Management Systems M Physical Access to Data G

DBMG

Tree structure

U1

DATA

21

Page 22: Database Management Systems M Physical Access to Data G

DBMG

22

B-Tree and B+-Tree

Two different tree structures for indexing

B-Tree

Data pages are reached only through key values by visiting the tree

B+-Tree

Provides a link structure allowing sequential access in the sort order of key values

Page 23: Database Management Systems M Physical Access to Data G

DBMG

B-Tree structure

U1

DATA

23

Page 24: Database Management Systems M Physical Access to Data G

DBMG

B+-Tree structure

U1

DATA

24

Page 25: Database Management Systems M Physical Access to Data G

DBMG

25

B-Tree and B+-Tree

Two different tree structures for indexing

B-Tree

Data pages are reached only through key values by visiting the tree

B+-Tree

Provides a link structure allowing sequential access on the sort order of key values

B stands for balanced

Leaves are all at the same distance from the root

Access time is constant, regardless of the searched value

Page 26: Database Management Systems M Physical Access to Data G

DBMG

26

Clustered

The tuple is contained into the leaf node

Constrains the physical position of tuples in a given leaf node

The position may be modified by splitting the node, when it is full

Also called key sequenced

Typically used for primary key indexing

Page 27: Database Management Systems M Physical Access to Data G

DBMG

Clustered B+-Tree index

U1

Data Data Data Data

27

Page 28: Database Management Systems M Physical Access to Data G

DBMG

28

Unclustered

The leaf contains physical pointers to actual data

The position of tuples in a file is totally unconstrained

Also called indirect

Used for secondary indices

Page 29: Database Management Systems M Physical Access to Data G

DBMG

Unclustered B+-Tree index

U1

Data

29

Page 30: Database Management Systems M Physical Access to Data G

DBMG

30

Example: Unclustered B+-Tree index

STUDENT (StudentId, Name, Grade)

T1 T6 T10

19 34 34

T2 T3 T5

22 30 33

T4 T7

30 34

T8 T9

40 50

DATA FILE FOR STUDENT TABLE

33 34 34 34 40

(T5) (T6 ) (T10) (T7) (T8)

50

(T9)

12<=Grade < 1919 56

56< Grade <=78

19 <= Grade <= 56

12 78

Grade < 12

Grade > 78

12 <= Grade <= 78

33 44

19 <= Grade < 33 44< Grade <= 56

33 <= Grade <= 44

19 22 30 30

(T1) (T2 ) (T3) (T4)

LEAF

Page 31: Database Management Systems M Physical Access to Data G

DBMG

31

Example: Clustered B+-Tree index

STUDENT (StudentId, Name, Grade)

12<=Grade < 1919 56

56< Grade <=78

19 <= Grade <= 56

33 44

19 <= Grade < 33 44< Grade <= 56

33 <= Grade <= 44

12 78

Grade < 12

Grade > 78

12 <= Grade <= 78

LEAF

T5

33

T6

34

T10

34

T7

34

T8

40

T9

50

T1

19

T2

22

T3

30

T4

30

DATA FILE FOR STUDENT TABLE

Page 32: Database Management Systems M Physical Access to Data G

DBMG

32

Advantages and disadvantages

Advantages

Very efficient for range queries

Appropriate for sequential scan in the order of the key field

Always for clustered, not guaranteed otherwise

Disadvantages

Insertions may require a split of a leaf

possibly, also of intermediate nodes

computationally intensive

Deletions may require merging uncrowded nodes and re-balancing

Page 33: Database Management Systems M Physical Access to Data G

DBMG

33

Hash structure

It guarantees direct and efficient access to data based on the value of a key field

The hash key may include one or more attributes

Suppose the hash structure has B blocks

The hash function is applied to the key field value of a record

It returns a value between 0 and B-1 which defines the position of the record

Blocks should never be completely filled

To allow new data insertion

Page 34: Database Management Systems M Physical Access to Data G

DBMG

Example: hash index

STUDENT (StudentId, Name, Grade)

TUPLE T1

StudentId = 50

TUPLE T4

StudentId = 75

H(StudentId =50)=1

H(StudentId =75)=1

BLOCK 0

BLOCK 1

BLOCK 2

T1 50

T4 75

DATA FILE FOR STUDENT TABLE

34

Page 35: Database Management Systems M Physical Access to Data G

DBMG

35

Hash index

Advantages

Very efficient for queries with equality predicate on the key

No sorting of disk blocks is required

Disadvantages

Inefficient for range queries

Collisions may occur

Page 36: Database Management Systems M Physical Access to Data G

DBMG

36

Unclustered hash index

It guarantees direct and efficient access to data based on the value of a key field

Similar to hash index

Blocks contain pointers to data

Actual data is stored in a separate structure

Position of tuples is not constrained to a block

Different from hash index

Page 37: Database Management Systems M Physical Access to Data G

DBMG

Example: Unclustered hash index

STUDENT (StudentId, Name, Grade)

T1 30

T2 40

DATA FILE FOR

STUDENT TABLE

BLOCK 0

BLOCK 1

BLOCK 2

INDEX BLOCKS

TUPLE T1

GRADE = 30

TUPLE T2

GRADE = 40

H(GRADE=30)=1

H(GRADE=40)=1

30 → T1

40 → T2

37

Page 38: Database Management Systems M Physical Access to Data G

DBMG

38

Bitmap index

It guarantees direct and efficient access to data based on the value of a key field

It is based on a bit matrix

The bit matrix references data rows by means of RIDs (Row IDentifiers)

Actual data is stored in a separate structure

Position of tuples is not constrained

Page 39: Database Management Systems M Physical Access to Data G

DBMG

39

Bitmap index

The bit matrix has

One column for each different value of the indexed attribute

One row for each tuple

Position (i, j) of the matrix is

1 if tuple i takes value j

0 otherwiseRID Val1 Val2 … Valn

1 0 0 … 1

2 0 0 … 0

3 0 0 … 1

4 1 0 … 0

5 0 1 … 0

Page 40: Database Management Systems M Physical Access to Data G

DBMG

Example: Bitmap index

T2

T4

DATA FILE

FOR EMPLOYEE

TABLE

EMPLOYEE (EmployeeId, Name, Job)

Domain of Job attribute = {Engineer, Consultant, Manager, Programmer, Secretary, Accountant}

RID Eng. Cons. Man. Prog. Secr. Acc.

1 0 0 1 0 0 0

2 0 0 0 1 0 0

3 0 0 0 0 1 0

4 0 0 0 1 0 0

5 1 0 0 0 0 0

Prog.

0

1

0

1

0

40

Page 41: Database Management Systems M Physical Access to Data G

DBMG

41

Bitmap index

Advantages

Very efficient for boolean expressions of predicates

Reduced to bit operations on bitmaps

Appropriate for attributes with limited domain cardinality

Disadvantages

Not used for continuous attributes

Required space grows significantly with domain cardinality