databases 2012 file structures - computer science...

Christian S. Jensen

Computer Science, Aarhus University

Databases 2012

File Structures

2 File Structures

File Structures

Data structure for data stored on disk

Implementation of queries requires

• File structures

• Algorithms for operators

• Sorting, selection (), projection (), join (⋈), Duplicate removal

• Query optimization and execution

We consider typical database file structures

3 File Structures

Data Storage

Relations are stored as bits.

Functionality requirements

• Sequentially process tuples

• Search for tuples that meet some condition

• Insert and delete tuples

Performance objectives

• Achieve a high packing density (little wasted space)

• Achieve fast response time

4 File Structures

File Organization Overview

File

• A database is stored as a collection of files.

• Storage (usually a disk)

• Random access (requires disk arm move)

• Non-volatile

Records

• A file is a set of records, generally of the same type.

• A record is a sequence of fields.

• Record size

• Fixed - same number of fields of same size

• Variable - may have a different number of fields or a field may

vary in size

5 File Structures

Disk Concepts – Files

File systems

• Blocks not necessarily contiguous

• Allocated in blocks

Contiguous block I/O faster (avoids disk seek)

• 8 msec to seek, read

• .4 msec for contiguous block read

Can reorganize to make block chains contiguous

report.doc

block 1 unused

book.htm

block 1

report.doc

block 2

report.doc

block 3

blocks on disk

6 File Structures

Disk Concepts - Blocks

Input/Output entire blocks

Block size is O/S dependent (e.g., 8KB)

Block size is usually much bigger than record size

• Many records in each block

• Exception: spanning – each record bigger than a block

• Fill percentage – percentage of filled space per block

• Supported by some DBMSs (e.g., Oracle)

50% filled

room to grow

100% filled

compact

7 File Structures

Storing a Table

A table is typically stored on disk

Several rows fit into one disk block

Disks are slow

• Accessing a random block is 8ms

• Accessing the next consecutive block is 0.4ms

• RAM access time 8-10 ns (L1+L2 cache even faster)

• One disk access = 1,000,000 RAM accesses!

• Justifies ”count only I/Os” model of complexity

Part of the table may be in RAM (buffer pool)

The table can be stored in sorted order

8 File Structures

Heap File

Assume n records pr table and R records pr block

• Search for 1 record: n/2R accesses O(n) - slow

• Insertion: 2 O(1) - fast

• Deletion (k records): n/R O(n) - slow

• Modification (k records): n/R O(n) - slow

Leonardo

Racquel

Amanda Arnold

Mufasa

Tom

Omar Jamie

Records are stored in blocks, no particular order

Example: Assume records are names

blocks Gerard

9 File Structures

Sequential File

Suitable for applications that need sequential

access.

The records in the file are ordered by a search-

key.

Example: Each record is a name.

Leonardo

Racquel Amanda

Arnold

Mufasa

Tom

Omar

Jamie

blocks

Gerard

10 File Structures

Sequential File

Search for key value, O(log2(n)) cost

Deletion: 1 disk I/O (after search)

Insertion: locate where record is to be inserted

• If there is free space, insert

• If no free space, insert the record in an overflow block

or shift records to previous or next block

Reorganize

• Restore block contiguity, fill percentage

• Remove empty blocks

• O(n) cost

11 File Structures

Sorting a Table

In RAM, many sorting algorithms are available

• Typical time complexity is O(n log2(n))

Those can also be performed on disk

• But they often perform poorly

• Only I/O accesses need to be counted

Specialized versions of algorithms are needed

12 File Structures

An Example Scenario

A 1 GB table with

• 10,000,000 rows

• Each row is 100 bytes

A disk with 4K sized blocks

• Each block holds 40 rows

• 250,000 blocks for the entire table

50MB RAM

• 1/20 of the table

13 File Structures

Recursive Merge Sort

The merge sort algorithm

• Split the list into two sublists

• Recursively sort the sublists

• Merge the sorted sublists to get the sorted list

• The time complexity is O(n log2(n))

Each row is read and written log2(107) = 23 times

Time consumed

• 23 2 10,000,000 8ms = 43 days

14 File Structures

Two-Phase Multiway Merge Sort

Load RAM with 12,500 blocks = 500,000 rows

• Sort those (for ”free”) using any RAM sorting algorithm

Do this 20 times to obtain 20 sorted sublists

• Store the 20 sorted sublists on 20 disks

Merge the sublists using a RAM buffer for each

• Only consecutive reads of blocks

• Only consecutive writes of blocks

Each row is read and written 2 times

Time consumed

• 2 2 20 12,500 0.4ms = 6.7 minutes

15 File Structures

Lessons Learned

Naive algorithms fall short

The reality of storage must be considered

• Use entire block contents

• Read blocks consecutively

• Buffer information in RAM

Two groups at CS address this

• The database/data-intensive systems group (systems)

• The Madalgo group (theory)

16 File Structures

Selection

SELECT *

FROM R

WHERE condition;

Full table scan

• Read all rows in the table

• Report those that satisfy the condition

Fine if many rows will actually be selected

• Rule of thumb is 5-10%

17 File Structures

Point Query

SELECT *

FROM People

WHERE userid = ’amoeller’;

We know that userid is a key

Optimization if People is sorted on userid

• Full table scan can stop sooner

Binary search not necessarily better

• Random disk access vs. sequential access

So, what can help us?

18 File Structures

Indexes

A table can be equipped with an index

• A data structure that helps find rows quickly

• Rows are identified by a subset of the attributes

A table may have several indexes

• In contrast, it can only be sorted on one criterion

Pros and cons of indexes

• Make (some) queries faster

• Make modifications slower

19 File Structures

Indexes in SQL

CREATE INDEX DateIndex

ON Meetings(date);

CREATE INDEX ExamIndex

ON Exams(vip,date,time);

An index on several attributes also gives an index

for any prefix of those attributes

Think of this as a virtual sorting of the table

Each primary key has by default an index

20 File Structures

Using Indexes

CREATE INDEX Idx ON R(a1,a2,...,an);

Some queries are now ”easy”

• A range query or point query on a1

• A point query on a1 combined with a range query on a2

• A range query on a1, a2, and a3

Others are not really easier

• A range query on a17

In case of large modifications of the table

• DROP INDEX Idx;

• Rebuild the index afterwards

21 File Structures

Indexed File

Suitable for applications that require random

access

Usually combined with sequential file

A single-level index is an auxiliary file of entries

<search-key, pointer to record>

ordered on the search-key.

Index is separate from data file

• Usually smaller

10-20% rule of thumb, take with a grain of salt!

• Can have multiple indexes on same relation

22 File Structures

Types of Single-Level Indexes

Primary Index

• Defined on a data file ordered on the primary key

Clustering Index

• Defined on a data file ordered on a non-key field

• One index entry for each distinct value of the field

• Points to first data block of records for search key

Secondary Index

• Defined on a data file not ordered on the index’s search

key

Sparse vs. Dense Index

• A dense index has one entry for each search key value

• Sparse - fewer index entries than search key values

23 File Structures

Example: Primary (Sparse) Index

123456 Melanie 701 Broad Tucson AZ

246800 Eric 701 Broad Tucson AZ

336699 Merrie 123 Speed Tucson AZ

456789 Tom 197 Cardiff Houston TX

555555 Lee 221 Post Houston TX

678900 Jane 197 Cardiff Houston TX

890890 Linda 234 Oak Houston TX

993889 Jeff 564 Alberca Tucson AZ

...

CustomerID Name Street City State

123456 336699 555555

890890

Index

Data File

24 File Structures

Example: Clustering (Dense) Index




112200 Ken 73 Elm Houston TX



147906 Cheryl 89 Pine Wichita KS

034321 Karsten 15 Main Wichita KS

...


Houston Tucson

Wichita

Index

Data File


25 File Structures

Example: Secondary (Dense) Index








993889 David 64 Alberca Tucson AZ

...


David Eric Jane

Linda

Index

Data File

Lee

Melanie Merrie Tom

26 File Structures

Searching a Single-Level Index

Sequential search

• Faster than linear search of main file.

• Index is smaller than the main file

• Worst-case search cost is still O(n).

Binary search

• Key space:

• Search cost is O(log2(n)) time (n = size of the index).

27 File Structures

Multi-Level Indexes

Observation: A single-level index is an ordered file

• Can create a primary index on another index!

• Original is called the second-level index

• The index to the index is called the top-level index

Multi-level index: repeat the process, until all entries

of the top level index fit in one disk block

• Pin top-level index in main memory (RAM)

• Every level of the index is an ordered file

28 File Structures

Multi-level Index Example








993889 Jeff 564 Alberca Tucson AZ

... CustomerID Name Street City State

Second-level

index

Data file

123456

336699

123456

555555

555555

890890

Top-level

index

29 File Structures

B-Trees

A data structure for indexes on tables

• A variation of search trees

• Trades some extra space to gain better performance

Supports the necessary operations

• Insert a new row

• Delete an exisiting row

• Search for a row given the index attribute(s)

”Perfect” for disk storage

• High fanout

• Very robust to data changes, data volumes, etc.

• Used by ALL RDBMSes

30 File Structures

Each node is stored in one disk block

Each row is pointed to by a leaf node

B-Tree Example

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

35

10

0

31 File Structures

B-Tree Internal Node

57

81

95

to keys

< 57

to keys

57 k< 81

to keys

81 k< 95

to keys

95

32 File Structures

B-Tree Leaf Node

57

81

95

to record

with key 57

to record

with key 81

to record

with key 95

to next

leaf node

33 File Structures

B-Tree Invariants

Assume each node (block) holds at most k keys

• Typically k is several hundreds

Each node must hold at least (k+1)/2 pointers

• Except for the root: may have down to 2 pointers

All leaves must be at the same level

This ensures that the tree remains balanced:

• Its height with n rows is at most 1+logk/2(n)

• In practice, the height is 3 or 4 (1-2 top levels in RAM)

34 File Structures

Search path for key 101

Time proportional to the height of the tree

B-Tree Point Query

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

35

10

0

35 File Structures

Subtree for keys between 101 and 166

Time proportional to height + size of range

B-Tree Range Query

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

35

10

0

36 File Structures

B-Tree Insertion (1/4)

Inserting 33 (simple case)

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

35

10

0

37 File Structures


Inserting 33 (simple case)

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

33

35

10

0

38 File Structures


Inserting 7 (split leaf)

12

0

15

0

18

0

30

3

5

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

33

35

10

0

39 File Structures


Inserting 7 (split leaf)

12

0

15

0

18

0

7

30

7

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

33

35

10

0

3

5

40 File Structures


Inserting 160 (split internal node)

12

0

15

0

18

0

7

30

7

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

17

9

18

0

20

0

30

33

35

10

0

3

5

41 File Structures


Inserting 160 (split internal node)

12

0

15

0

7

30

7

11

10

0

10

1

11

0

12

0

13

0

15

0

15

6

18

0

20

0

30

33

35

3

5

16

0

17

9

18

0

10

0

16

0

42 File Structures


Insert 45 (split the root)

1

2

3

10

12

10

20

30

2

0

25

30

32

40

43 File Structures


Insert 45 (split the root)

The height increases

1

2

3

10

12

20

25

30

32

40

45

40

30

10

20

44 File Structures

B-Tree Deletion

Balanced deletion is also possible

• There are similar case-based algorithms

Generally, deleted rows are left as tombstones

• The overhead of deletion is too large

Most tables tend to grow with time

• The tombstones quickly get reused

Otherwise, periodically rebuild the index

• Or perform online reorg of the index

45 File Structures

Clustering Index

Generally, indexed rows are scattered in the table

A clustering index has consecutive rows

Equivalent to sorting the table

46 File Structures

Clustering Index

At most one index can be the clustering index

• But other indexes may happen to be clustered too

• Attributes may be correlated

CREATE INDEX ExamIndex

ON Exams

CLUSTER(vip,date,time);

A cluster index on a primary key is a bad idea

• Range queries are not often meaningful

• Keys should not not carry information themselves

47 File Structures

Index Queries

If the query uses only indexed attributes

• ”Virtually constant” time evaluation

SELECT date

FROM Exams

WHERE vip = ’amoeller’;

SELECT vip, COUNT(date) AS Dates

FROM Exams

GROUP BY vip;

48 File Structures

Boolean Index Selection

SELECT *

FROM R

WHERE x=42 AND y>87;

We have one index for x and another for y

• Use index scan to find row pointers for x=42

• Use index scan to find row pointers for y>87

• Compute the intersection of the two pointer set

Similarly, OR corresponds to disjunction

databases 2012 file structures - computer science...

Documents