![Page 1: 03 Database Storage Part I - CMU 15-445/645 · 03 Database Storage Part I. CMU 15-445/645 (Fall 2019) ADMINISTRIVIA Homework #1 is due September 11th @ 11:59pm Project #1 will be](https://reader030.vdocuments.site/reader030/viewer/2022033123/5ec3aae55cbb287d2613c8c2/html5/thumbnails/1.jpg)
Intro to Database Systems
15-445/15-645
Fall 2019
Andy Pavlo
Computer Science
Carnegie Mellon University
03 Database Storage Part I
CMU 15-445/645 (Fall 2019)
ADMINISTRIVIA
Homework #1 is due September 11th @ 11:59pm
Project #1 will be released on September 11th
OVERVIEW
We now understand what a database looks like at a logical level and how to write queries to read/write data from it.
We will next learn how to build software that manages a database.
COURSE OUTLINE
Relational Databases
Storage
Execution
Concurrency Control
Recovery
Distributed Databases
Potpourri
(Figure: the layers of a DBMS, from top to bottom)
Query Planning
Operator Execution
Access Methods
Buffer Pool Manager
Disk Manager
DISK-ORIENTED ARCHITECTURE
The DBMS assumes that the primary storage location of the database is on non-volatile disk.
The DBMS's components manage the movement of data between non-volatile and volatile storage.
STORAGE HIERARCHY
CPU Registers
CPU Caches
DRAM
Non-volatile Memory
SSD
HDD
Network Storage
Moving up the hierarchy: faster, smaller, more expensive. Moving down: slower, larger, cheaper.
→ Memory (CPU registers, CPU caches, DRAM): volatile, random access, byte-addressable.
→ Disk (SSD, HDD, network storage): non-volatile, sequential access, block-addressable.
→ Non-volatile memory sits between DRAM and SSD.
(Slide annotation next to the memory tiers: CMU 15-721.)
ACCESS TIMES
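| Latency | Storage | Scaled (if 1 ns = 1 sec) |
|---|---|---|
| 0.5 ns | L1 Cache Ref | 0.5 sec |
| 7 ns | L2 Cache Ref | 7 sec |
| 100 ns | DRAM | 100 sec |
| 150,000 ns | SSD | 1.7 days |
| 10,000,000 ns | HDD | 16.5 weeks |
| ~30,000,000 ns | Network Storage | 11.4 months |
| 1,000,000,000 ns | Tape Archives | 31.7 years |

[Source]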
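The scaled column comes from treating every nanosecond of real latency as one second. A small Python sketch that reproduces it, assuming a 365-day year and an average month of 365/12 days (the slide does not say which calendar conventions it used); the latencies and display units are copied from the table, and `scaled` is a hypothetical helper name.

```python
# 1 ns of real latency is treated as 1 s of "human" time, so the number of
# nanoseconds is reused directly as a number of seconds.
DAY = 86_400  # seconds
UNIT_SECONDS = {
    "sec": 1,
    "days": DAY,
    "weeks": 7 * DAY,
    "months": 365 / 12 * DAY,  # assumption: average month in a 365-day year
    "years": 365 * DAY,        # assumption: 365-day year
}

# (device, latency in ns, unit used for the scaled value), copied from the table
ROWS = [
    ("L1 Cache Ref", 0.5, "sec"),
    ("L2 Cache Ref", 7, "sec"),
    ("DRAM", 100, "sec"),
    ("SSD", 150_000, "days"),
    ("HDD", 10_000_000, "weeks"),
    ("Network Storage", 30_000_000, "months"),
    ("Tape Archives", 1_000_000_000, "years"),
]


def scaled(latency_ns: float, unit: str) -> float:
    """Convert a latency to the scaled unit, rounded to one decimal place."""
    return round(latency_ns / UNIT_SECONDS[unit], 1)


for device, ns, unit in ROWS:
    print(f"{device:16} {ns:>15,} ns -> {scaled(ns, unit)} {unit}")
```

The gap between DRAM (100 sec) and SSD (1.7 days) is what makes every avoided disk read matter.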
SYSTEM DESIGN GOALS
Allow the DBMS to manage databases that exceed the amount of memory available.
Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.
DISK-ORIENTED DBMS
(Figure: on disk, the database file consists of a directory page (page 1) followed by pages 2, 3, 4, 5, …, each of which begins with a header. In memory, the DBMS keeps a buffer pool.)
→ The execution engine asks the buffer pool to get page #2.
→ The buffer pool brings the directory into memory to find page #2.
→ The buffer pool reads page #2 from the database file into memory.
→ The buffer pool hands the execution engine a pointer to page #2, and the execution engine interprets the page's layout.
(Slide annotations: Lectures 3-4, Lecture 5, Lecture 6.)
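The "get page #2" flow can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: a dict stands in for the database file, a dict of frames plays the role of the in-memory directory, and the pool never evicts, pins, or writes back pages. `DiskManager`, `BufferPool`, and `get_page` are hypothetical names.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed page size


class DiskManager:
    """Hypothetical stand-in for the database file on disk."""

    def __init__(self, pages: dict[int, bytes]):
        self._pages = pages
        self.reads = 0  # counts simulated disk reads

    def read_page(self, page_id: int) -> bytearray:
        self.reads += 1
        return bytearray(self._pages[page_id])


class BufferPool:
    """Caches pages in memory; no eviction, pinning, or dirty-page tracking."""

    def __init__(self, disk: DiskManager):
        self._disk = disk
        self._frames: dict[int, bytearray] = {}  # in-memory directory: page id -> frame

    def get_page(self, page_id: int) -> bytearray:
        frame = self._frames.get(page_id)
        if frame is None:  # miss: copy the page from disk into memory
            frame = self._disk.read_page(page_id)
            self._frames[page_id] = frame
        return frame  # a reference to the in-memory copy, not a new copy


disk = DiskManager({2: b"page #2".ljust(PAGE_SIZE, b"\0")})
pool = BufferPool(disk)
page = pool.get_page(2)   # miss: one simulated disk read
again = pool.get_page(2)  # hit: served from the buffer pool
```

Returning a reference (rather than a copy) mirrors the "pointer to page #2" step: the execution engine works directly on the bytes held in the buffer pool.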
WHY NOT USE THE OS?
One can use memory mapping (mmap) to map the contents of a file into a process's address space.
The OS is responsible for moving the file's pages in and out of memory.
(Figure: an on-disk file with page1-page4 is mapped into the process's virtual memory as page1-page4. Accessing page1 and then page3 causes the OS to load each one into physical memory. When physical memory has no free room and another page is accessed, the OS must evict something, and which page it picks is outside the DBMS's control ("???").)
WHY NOT USE THE OS?
What if we allow multiple threads to access the mmap files to hide page fault stalls?
This works well enough for read-only access. It is complicated when there are multiple writers…
WHY NOT USE THE OS?
There are some solutions to this problem:
→ madvise: Tell the OS how you expect to read certain pages.
→ mlock: Tell the OS that memory ranges cannot be paged out.
→ msync: Tell the OS to flush memory ranges out to disk.
(Figure: DBMSs grouped by Full Usage vs. Partial Usage of mmap.)
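A minimal Python sketch of mmap-based access on a toy 4-page file, using the standard `mmap` module. `mmap.madvise` exists only on Unix with Python 3.8+, so it is guarded; `mmap.flush` is implemented with msync on Unix. Python's `mmap` module does not expose mlock, so that call is left out. The file name, page contents, and `mmap_demo` helper are made up for the demo.

```python
from __future__ import annotations

import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # OS page size (commonly 4096 bytes)


def mmap_demo(path: str) -> tuple[int, bytes]:
    # Create a 4-page file where every byte of page i is the value i + 1.
    with open(path, "wb") as f:
        for i in range(4):
            f.write(bytes([i + 1]) * PAGE)

    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
        try:
            # madvise: hint that access will be random (Unix, Python 3.8+ only).
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
                mm.madvise(mmap.MADV_RANDOM)
            # Touching page3 may page-fault; the OS decides when it is loaded.
            first_byte_of_page3 = mm[2 * PAGE]
            # Modify page1 in memory; by default the OS decides when it reaches disk...
            mm[0:5] = b"dirty"
            # ...unless we force it: flush() uses msync on Unix.
            mm.flush()
        finally:
            mm.close()

    with open(path, "rb") as f:
        on_disk_prefix = f.read(5)
    return first_byte_of_page3, on_disk_prefix


with tempfile.TemporaryDirectory() as tmp:
    page3_byte, prefix = mmap_demo(os.path.join(tmp, "db.file"))
```

Note what the program cannot control: when page3 is actually loaded, which page gets evicted, and (without the explicit flush) when the dirty page is written back.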
WHY NOT USE THE OS?
DBMS (almost) always wants to control things itself and can do a better job at it.
→ Flushing dirty pages to disk in the correct order.
→ Specialized prefetching.
→ Buffer replacement policy.
→ Thread/process scheduling.
The OS is not your friend.
DATABASE STORAGE
Problem #1: How the DBMS represents the database in files on disk. ← Today
Problem #2: How the DBMS manages its memory and moves data back and forth from disk.
TODAY'S AGENDA
File Storage
Page Layout
Tuple Layout
FILE STORAGE
The DBMS stores a database as one or more files on disk.
→ The OS doesn't know anything about the contents of these files.
Early systems in the 1980s used custom filesystems on raw storage.
→ Some "enterprise" DBMSs still support this.
→ Most newer DBMSs do not do this.
STORAGE MANAGER
The storage manager is responsible for maintaining a database's files.
→ Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.
It organizes the files as a collection of pages.
→ Tracks data read/written to pages.
→ Tracks the available space.
DATABASE PAGES
A page is a fixed-size block of data.
→ It can contain tuples, meta-data, indexes, log records…
→ Most systems do not mix page types.
→ Some systems require a page to be self-contained.
Each page is given a unique identifier.
→ The DBMS uses an indirection layer to map page ids to physical locations.
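A minimal sketch of such an indirection layer in Python, assuming 8KB database pages and a simple in-memory map; `PageLocator` and `PhysicalLocation` are hypothetical names, not any system's API. The point is that callers hold page ids, so the DBMS can move a page by updating only the map.

```python
from __future__ import annotations

from dataclasses import dataclass

DB_PAGE_SIZE = 8192  # assumed database page size (the slides cite 512B-16KB)


@dataclass(frozen=True)
class PhysicalLocation:
    file_name: str
    offset: int  # byte offset of the page inside file_name


class PageLocator:
    """Hypothetical indirection layer: page id -> physical location."""

    def __init__(self) -> None:
        self._locations: dict[int, PhysicalLocation] = {}

    def place(self, page_id: int, file_name: str, position: int) -> None:
        """Record that page_id is the position-th page of file_name."""
        self._locations[page_id] = PhysicalLocation(file_name, position * DB_PAGE_SIZE)

    def locate(self, page_id: int) -> PhysicalLocation:
        return self._locations[page_id]


locator = PageLocator()
locator.place(42, "table.db", 3)
before = locator.locate(42)
locator.place(42, "table.db.new", 0)  # the page moves; its id does not change
after = locator.locate(42)
```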
DATABASE PAGES
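There are three different notions of "pages" in a DBMS:
→ Hardware Page (usually 4KB)
→ OS Page (usually 4KB)
→ Database Page (512B-16KB)
By hardware page, we mean the size at which the storage device can guarantee a "failsafe write" (i.e., the write happens atomically). If a database page is larger than the hardware page, a crash can leave it only partially written, so the DBMS must take extra measures.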
(Figure: example database page sizes used by different systems: 16KB, 8KB, 4KB.)
PAGE STORAGE ARCHITECTURE
Different DBMSs manage pages in files on disk in different ways.
→ Heap File Organization
→ Sequential / Sorted File Organization
→ Hashing File Organization
At this point in the hierarchy we don't need to know anything about what is inside of the pages.
DATABASE HEAP
A heap file is an unordered collection of pages with tuples that are stored in random order.
→ Create / Get / Write / Delete Page
→ Must also support iterating over all pages.
Need meta-data to keep track of what pages exist and which ones have free space.
Two ways to represent a heap file:
→ Linked List
→ Page Directory
HEAP FILE: LINKED LIST
Maintain a header page at the beginning of the file that stores two pointers:
→ HEAD of the free page list.
→ HEAD of the data page list.
Each page keeps track of the number of free slots in itself.
(Figure: the header page points to two linked lists of data pages: the free page list and the data page list.)
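A minimal Python sketch of the linked-list heap file. The slide does not spell out list membership, so this assumes the free page list holds pages with at least one free slot and the data page list holds full pages; page 0 is assumed to be the header page, and all class and method names are hypothetical. The sequential scans in `delete` hint at the main cost of this design: finding a particular page means walking a list.

```python
from __future__ import annotations


class Page:
    def __init__(self, page_id: int, capacity: int):
        self.page_id = page_id
        self.free_slots = capacity  # each page tracks its own free slots
        self.next: Page | None = None  # next page in whichever list this page is on


class LinkedListHeapFile:
    """The header page is modeled as the two list heads (page 0 is assumed)."""

    def __init__(self, slots_per_page: int = 4):
        self.slots_per_page = slots_per_page
        self.free_head: Page | None = None  # assumption: pages that still have free slots
        self.data_head: Page | None = None  # assumption: pages that are completely full
        self._next_id = 1

    def insert(self) -> int:
        """Reserve one slot and return the id of the page that holds it."""
        if self.free_head is None:  # no page has room: allocate a new one
            self.free_head = Page(self._next_id, self.slots_per_page)
            self._next_id += 1
        page = self.free_head
        page.free_slots -= 1
        if page.free_slots == 0:  # now full: move it to the data page list
            self.free_head = page.next
            page.next, self.data_head = self.data_head, page
        return page.page_id

    def delete(self, page_id: int) -> None:
        """Release one slot on page_id; a full page moves back to the free list."""
        prev, page = None, self.data_head
        while page is not None and page.page_id != page_id:  # sequential scan
            prev, page = page, page.next
        if page is not None:
            if prev is None:
                self.data_head = page.next
            else:
                prev.next = page.next
            page.next, self.free_head = self.free_head, page
        else:
            page = self.free_head
            while page is not None and page.page_id != page_id:
                page = page.next
            if page is None:
                raise KeyError(page_id)
        page.free_slots += 1

    def page_ids(self):
        """Iterate over every page, which a heap file must support."""
        for head in (self.free_head, self.data_head):
            while head is not None:
                yield head.page_id
                head = head.next


hf = LinkedListHeapFile(slots_per_page=2)
placed = [hf.insert() for _ in range(3)]
hf.delete(1)  # page 1 was full, so it returns to the free page list
```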
HEAP FILE: PAGE DIRECTORY
The DBMS maintains special pages that track the location of data pages in the database files.
The directory also records the number of free slots per page.
The DBMS has to make sure that the directory pages are in sync with the data pages.
(Figure: a directory page that points to each of the data pages.)
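A minimal Python sketch of a page-directory heap file, assuming each directory entry stores the page's byte offset in the file plus a copy of its free-slot count; page 0 is assumed to hold the directory, and all names are hypothetical. Inserts consult only the directory to find space, and every change to a data page is mirrored in its directory entry, which is the synchronization the slide warns about.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataPage:
    page_id: int
    capacity: int
    tuples: list = field(default_factory=list)


@dataclass
class DirectoryEntry:
    offset: int      # byte offset of the page in the database file (assumed layout)
    free_slots: int  # copy of the page's free space; must stay in sync with it


class DirectoryHeapFile:
    def __init__(self, slots_per_page: int = 4, page_size: int = 4096):
        self.slots_per_page = slots_per_page
        self.page_size = page_size
        self.directory: dict[int, DirectoryEntry] = {}
        self.pages: dict[int, DataPage] = {}  # stands in for the data pages on disk

    def _allocate(self) -> int:
        page_id = len(self.pages) + 1  # page 0 is assumed to hold the directory
        self.pages[page_id] = DataPage(page_id, self.slots_per_page)
        self.directory[page_id] = DirectoryEntry(page_id * self.page_size, self.slots_per_page)
        return page_id

    def insert(self, tup) -> int:
        # Only the directory is consulted to find free space; data pages are not scanned.
        page_id = next((pid for pid, e in self.directory.items() if e.free_slots > 0), None)
        if page_id is None:
            page_id = self._allocate()
        self.pages[page_id].tuples.append(tup)
        self.directory[page_id].free_slots -= 1  # update directory together with the page
        return page_id


hf = DirectoryHeapFile(slots_per_page=2)
placed = [hf.insert(t) for t in ("a", "b", "c")]
```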
TODAY'S AGENDA
File Storage
Page Layout
Tuple Layout
PAGE HEADER
Every page contains a header of meta-data about the page's contents.
→ Page Size
→ Checksum
→ DBMS Version
→ Transaction Visibility
→ Compression Information
Some systems require pages to be self-contained (e.g., Oracle).
(Figure: a page made up of a header followed by data.)
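A minimal Python sketch of packing the header fields listed above in front of the page data, with a CRC-32 checksum that lets the DBMS detect a corrupted page when it reads it back. The byte layout, the use of a transaction id for visibility, and the function names are assumptions for illustration, not any real system's on-disk format.

```python
import struct
import zlib

# Hypothetical header layout, little-endian with no implicit padding:
# page size (u32), checksum (u32), DBMS version (u16), compression flag (u8),
# 1 pad byte, visibility info as a transaction id (u64). 20 bytes total.
HEADER = struct.Struct("<IIHBxQ")


def build_page(body: bytes, page_size: int = 4096, version: int = 1,
               compressed: bool = False, xid: int = 0) -> bytes:
    if HEADER.size + len(body) > page_size:
        raise ValueError("body does not fit in one page")
    data = body.ljust(page_size - HEADER.size, b"\0")
    # The checksum covers the whole page with the checksum field itself zeroed.
    zeroed = HEADER.pack(page_size, 0, version, int(compressed), xid)
    checksum = zlib.crc32(zeroed + data)
    return HEADER.pack(page_size, checksum, version, int(compressed), xid) + data


def verify_page(page: bytes) -> bool:
    page_size, checksum, version, compressed, xid = HEADER.unpack_from(page)
    zeroed = HEADER.pack(page_size, 0, version, compressed, xid)
    return page_size == len(page) and zlib.crc32(zeroed + page[HEADER.size:]) == checksum


page = build_page(b"tuple data")
```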
PAGE LAYOUT
For any page storage architecture, we now need to understand how to organize the data stored inside of the page.
→ We are still assuming that we are only storing tuples.
Two approaches:
→ Tuple-oriented
→ Log-structured
TUPLE STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
→ What happens if we delete a tuple?
→ What happens if we have a variable-length attribute?
(Figure, step by step: an empty page starts with Num Tuples = 0. Appending Tuple #1, #2, and #3 gives Num Tuples = 3. Deleting Tuple #2 leaves Tuple #1 and Tuple #3 with Num Tuples = 2. Appending Tuple #4 brings it back to Num Tuples = 3.)
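A short Python sketch of why the first question matters, assuming fixed 8-byte tuples so that "the end" of the page is computed as `num_tuples * TUPLE_SIZE`; `StrawmanPage` is a hypothetical name. After a delete, the count no longer matches the occupied space, and the next append overwrites a live tuple.

```python
TUPLE_SIZE = 8  # assumption: every tuple has the same fixed size


class StrawmanPage:
    def __init__(self, size: int = 64):
        self.data = bytearray(size)
        self.num_tuples = 0  # the only meta-data the strawman keeps

    def append(self, tup: bytes) -> int:
        if len(tup) != TUPLE_SIZE:
            raise ValueError("strawman only handles fixed-size tuples")
        offset = self.num_tuples * TUPLE_SIZE  # "the end" is derived from the count
        self.data[offset:offset + TUPLE_SIZE] = tup
        self.num_tuples += 1
        return offset

    def delete(self, offset: int) -> None:
        self.data[offset:offset + TUPLE_SIZE] = bytes(TUPLE_SIZE)  # leaves a hole
        self.num_tuples -= 1


page = StrawmanPage()
for t in (b"tuple#1!", b"tuple#2!", b"tuple#3!"):
    page.append(t)
page.delete(8)                         # remove Tuple #2
offset4 = page.append(b"tuple#4!")     # computed end = 2 * 8 = 16: Tuple #3's spot
```

The second question breaks the same arithmetic from the other direction: with variable-length tuples there is no fixed `TUPLE_SIZE` to multiply by.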
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.
(Figure: the header and the slot array sit at the beginning of the page, and the fixed- or variable-length tuple data (Tuple #1 through Tuple #4) is packed from the end of the page toward the front.)
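A minimal Python sketch of a slotted page. The byte layout is an assumption for illustration, not any particular system's format: a 4-byte header (number of used slots, and the offset where the most recently added tuple starts), 4-byte slots (offset, length) growing from the front, and tuple bytes packed from the back; offset 0 marks a deleted slot. Because other structures refer to a tuple by its slot number, deleting one tuple does not disturb the others, and variable-length tuples fit naturally.

```python
from __future__ import annotations

import struct


class SlottedPage:
    """Toy slotted page; the layout is assumed, not taken from a real system."""

    HEADER = struct.Struct("<HH")  # num_slots, free_end (start of the last tuple added)
    SLOT = struct.Struct("<HH")    # tuple offset, tuple length (offset 0 = deleted)

    def __init__(self, page_size: int = 4096):
        self.buf = bytearray(page_size)  # page_size must fit in a u16
        self.HEADER.pack_into(self.buf, 0, 0, page_size)

    def _slot_pos(self, slot: int) -> int:
        return self.HEADER.size + slot * self.SLOT.size

    def _check(self, slot: int) -> None:
        num_slots, _ = self.HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot < num_slots:
            raise IndexError(slot)

    def free_space(self) -> int:
        num_slots, free_end = self.HEADER.unpack_from(self.buf, 0)
        return free_end - self._slot_pos(num_slots)

    def insert(self, tup: bytes) -> int:
        num_slots, free_end = self.HEADER.unpack_from(self.buf, 0)
        if len(tup) + self.SLOT.size > self.free_space():
            raise ValueError("page full")
        offset = free_end - len(tup)  # tuple data grows from the end toward the front
        self.buf[offset:free_end] = tup
        self.SLOT.pack_into(self.buf, self._slot_pos(num_slots), offset, len(tup))
        self.HEADER.pack_into(self.buf, 0, num_slots + 1, offset)
        return num_slots  # the slot number identifies the tuple within the page

    def get(self, slot: int) -> bytes | None:
        self._check(slot)
        offset, length = self.SLOT.unpack_from(self.buf, self._slot_pos(slot))
        return None if offset == 0 else bytes(self.buf[offset:offset + length])

    def delete(self, slot: int) -> None:
        # Only the slot is cleared; reclaiming the tuple's bytes (compaction) is not shown.
        self._check(slot)
        self.SLOT.pack_into(self.buf, self._slot_pos(slot), 0, 0)


page = SlottedPage(64)
a = page.insert(b"alice")
b = page.insert(b"bob-variable-length")
page.delete(a)
```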
LOG-STRUCTURED FILE ORGANIZATION
Instead of storing tuples in pages, the DBMS only stores log records.
The system appends log records to the file of how the database was modified:
→ Inserts store the entire tuple.
→ Deletes mark the tuple as deleted.
→ Updates contain the delta of just the attributes that were modified.
(Figure: a page of log records, with new entries appended at the end:
…
INSERT id=1,val=a
INSERT id=2,val=b
DELETE id=4
INSERT id=3,val=c
UPDATE val=X (id=3)
UPDATE val=Y (id=4))
To read a record, the DBMS scans the log backwards and "recreates" the tuple to find what it needs.
(Figure: reads start at the newest entry and move back toward older entries.)
The DBMS can build indexes that let it jump to specific locations in the log.
(Figure: an index maps id=1, id=2, id=3, id=4 to their entries in the log.)
The DBMS periodically compacts the log.
(Figure: after compaction, the page holds one entry per tuple: id=1,val=a; id=2,val=b; id=3,val=X; id=4,val=Y.)
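A minimal Python sketch of log-structured storage, assuming records are kept in an in-memory list and tuples are dicts; `LogStructuredStore` and its methods are hypothetical names. Inserts log the full tuple, updates log only the changed attributes, deletes log a marker, reads scan backwards from an index entry, and compaction rewrites the log to one record per live tuple.

```python
from __future__ import annotations


class LogStructuredStore:
    """Hypothetical log-structured store: records are only ever appended."""

    def __init__(self) -> None:
        self.log: list[tuple[str, int, dict | None]] = []  # (op, id, payload), oldest first
        self.index: dict[int, int] = {}  # id -> position of that id's newest record

    def _append(self, op: str, rid: int, payload: dict | None) -> None:
        self.log.append((op, rid, payload))
        self.index[rid] = len(self.log) - 1

    def insert(self, rid: int, tup: dict) -> None:
        self._append("INSERT", rid, dict(tup))  # inserts store the entire tuple

    def update(self, rid: int, delta: dict) -> None:
        self._append("UPDATE", rid, dict(delta))  # only the modified attributes

    def delete(self, rid: int) -> None:
        self._append("DELETE", rid, None)  # deletes only mark the tuple as deleted

    def read(self, rid: int) -> dict | None:
        if rid not in self.index:
            return None
        deltas = []
        # Jump to the newest record via the index, then scan backwards.
        for pos in range(self.index[rid], -1, -1):
            op, r, payload = self.log[pos]
            if r != rid:
                continue
            if op == "DELETE":
                return None
            if op == "UPDATE":
                deltas.append(payload)
                continue
            tup = dict(payload)  # reached the INSERT with the full tuple
            for delta in reversed(deltas):  # apply oldest delta first, newest last
                tup.update(delta)
            return tup
        return None

    def compact(self) -> None:
        """Rewrite the log so it holds one INSERT per live tuple."""
        live = {rid: self.read(rid) for rid in self.index}
        self.log, self.index = [], {}
        for rid, tup in live.items():
            if tup is not None:
                self.insert(rid, tup)


store = LogStructuredStore()
store.insert(1, {"val": "a"})
store.insert(2, {"val": "b"})
store.insert(3, {"val": "c"})
store.update(3, {"val": "X"})
store.delete(2)
before = len(store.log)
store.compact()
```

Writes are cheap appends, while reads pay for reconstruction; compaction bounds that cost by discarding superseded records.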
TODAY'S AGENDA
File Storage
Page Layout
Tuple Layout
TUPLE LAYOUT
A tuple is essentially a sequence of bytes.
It's the job of the DBMS to interpret those bytes into attribute types and values.
TUPLE HEADER
Each tuple is prefixed with a header that contains meta-data about it.
→ Visibility info (concurrency control)
→ Bit Map for NULL values.
We do not need to store meta-data about the schema in each tuple; the DBMS keeps the schema in its catalog.
(Figure: a tuple laid out as Header | Attribute Data.)
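A minimal Python sketch of a tuple header carrying a NULL bitmap, assuming a hypothetical schema of four nullable 32-bit integer columns. The schema itself is not written into the tuple: `decode_tuple` relies on knowing it from outside, which is the point of the slide. Visibility info is omitted, and leaving NULL columns out of the attribute data (rather than reserving space for them) is a design choice of this sketch.

```python
import struct

# Hypothetical schema: four nullable 32-bit integer columns. It lives outside
# the tuple (e.g. in the catalog), so only the bitmap and values are stored.
NUM_COLS = 4
INT = struct.Struct("<i")


def encode_tuple(values: list) -> bytes:
    """Header = 1-byte NULL bitmap (bit i set means column i is NULL), then non-NULL values."""
    if len(values) != NUM_COLS:
        raise ValueError("wrong number of columns")
    bitmap, body = 0, bytearray()
    for i, v in enumerate(values):
        if v is None:
            bitmap |= 1 << i  # NULLs take no space in the attribute data
        else:
            body += INT.pack(v)
    return bytes([bitmap]) + bytes(body)


def decode_tuple(data: bytes) -> list:
    bitmap, pos, values = data[0], 1, []
    for i in range(NUM_COLS):
        if bitmap & (1 << i):
            values.append(None)
        else:
            values.append(INT.unpack_from(data, pos)[0])
            pos += INT.size
    return values


row = encode_tuple([7, None, -3, None])
```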
TUPLE DATA
Attributes are typically stored in the order that you specify them when you create the table.
This is done for software engineering reasons.
We re-order attributes automatically in CMU's new DBMS…
(Figure: a tuple laid out as Header | a | b | c | d | e.)

    CREATE TABLE foo (
      a INT PRIMARY KEY,
      b INT NOT NULL,
      c INT,
      d DOUBLE,
      e FLOAT
    );
DENORMALIZED TUPLE DATA
Can physically denormalize (e.g., "pre-join") related tuples and store them together in the same page.
→ Potentially reduces the amount of I/O for common workload patterns.
→ Can make updates more expensive.
Not a new idea.
→ IBM System R did this in the 1970s.
→ Several NoSQL DBMSs do this without calling it physical denormalization.

    CREATE TABLE foo (
      a INT PRIMARY KEY,
      b INT NOT NULL
    );
    CREATE TABLE bar (
      c INT PRIMARY KEY,
      a INT REFERENCES foo (a)
    );

(Figure, before: foo tuples are stored as Header | a | b, and bar tuples are stored separately as Header | c | a. After: each foo tuple, Header | a | b, is stored in the same page together with the c values of its matching bar tuples: c c c …)
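A minimal Python sketch of the "pre-join" for the foo/bar example, assuming rows arrive as plain tuples; `prejoin` is a hypothetical helper. Each output record stands for one foo tuple stored together with the c values of the bar tuples that reference it, so reading a foo row and its bar rows touches one record instead of two tables.

```python
from collections import defaultdict


def prejoin(foo_rows, bar_rows):
    """Group bar(c, a) rows under the foo(a, b) row they reference."""
    children = defaultdict(list)
    for c, a in bar_rows:  # bar.a REFERENCES foo.a
        children[a].append(c)
    return [(a, b, children.get(a, [])) for a, b in foo_rows]


foo = [(1, "x"), (2, "y")]
bar = [(10, 1), (11, 1), (12, 2)]
stored = prejoin(foo, bar)
```

The update cost shows up when a bar row changes its `a`: its c value must move from one stored record to another instead of being rewritten in place.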
RECORD IDS
The DBMS needs a way to keep track of individual tuples.
Each tuple is assigned a unique record identifier.
→ Most common: page_id + offset/slot
→ Can also contain file location info.
An application should not rely on these IDs to mean anything.
Examples: PostgreSQL CTID (6 bytes), Oracle ROWID (10 bytes), SQLite ROWID (8 bytes).
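A minimal Python sketch of the most common record ID shape, page_id + slot, packed into 6 bytes (4-byte page id, 2-byte slot number), which is similar in shape to PostgreSQL's CTID (a 4-byte block number plus a 2-byte item offset). The encoding and names here are hypothetical. Big-endian packing makes byte-wise comparison follow (page_id, slot) order.

```python
import struct
from typing import NamedTuple


class RecordId(NamedTuple):
    page_id: int
    slot: int


# Hypothetical 6-byte encoding: 4-byte page id + 2-byte slot number, big-endian,
# so comparing the raw bytes orders records by (page_id, slot).
RID = struct.Struct(">IH")


def pack_rid(rid: RecordId) -> bytes:
    return RID.pack(rid.page_id, rid.slot)


def unpack_rid(raw: bytes) -> RecordId:
    return RecordId(*RID.unpack(raw))


raw = pack_rid(RecordId(page_id=7, slot=3))
```

Such an ID only says where the tuple currently lives; if the DBMS moves the tuple to another page, its ID changes, which is why applications should not treat it as meaningful.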
CONCLUSION
A database is organized in pages.
Different ways to track pages.
Different ways to store pages.
Different ways to store tuples.
NEXT CLASS
Value Representation
Storage Models