database system internals...database= collection of related files dbms= program that manages the...

40
1 Architecture April 3, 2020 Database System Internals CSE 444 - Spring 2020

Upload: others

Post on 30-Mar-2021

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

1

Architecture

April 3, 2020

Database System Internals

CSE 444 - Spring 2020

Page 2: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Announcements

§Lab 1 part 1 is due on Monday• “git pull upstream master” before building• Remember to git commit and git push often!

§HW1 is due next week on Friday• gradescope

§544M paper review is due in two weeks• Email to me

CSE 444 - Spring 2020 2April 3, 2020

Page 3: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

What we already know…

§Database = collection of related files

§DBMS = program that manages the database

CSE 444 – Spring 2018 3

Page 4: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

What we already know…

§Data models: relational, semi-structured (XML), graph (RDF), key-value pairs

§ Relational model: defines only the logical model, and does not define a physical storage of the data

CSE 444 – Spring 2018 4

Page 5: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

What we already know…

Relational Query Language:

§ Set-at-a-time: instead of tuple-at-a-time

§Declarative: user says what they want and not how to get it

§Query optimizer: from what to how

CSE 444 – Spring 2018 5

Page 6: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

How to Implement a Relational DBMS?

CSE 444 – Spring 2018 6

DBMS

SQL

Data

Key challenge:Achieve high performanceon large databases!

Page 7: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

7

Query Processor

Parser

Query Rewrite

Optimizer

Executor

April 3, 2020 CSE 444 - Spring 2020

Page 8: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

8

Query Processor

Parser

Query Rewrite

Optimizer

Executor

April 3, 2020 CSE 444 - Spring 2020

We will fill inimplementation

Page 9: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

9

Query Processor

Parser

Query Rewrite

Optimizer

Executor

Storage Manager

Access Methods

Lock Manager

Buffer Manager

Log Manager

April 3, 2020 CSE 444 - Spring 2020

Page 10: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

10

Query Processor

Parser

Query Rewrite

Optimizer

Executor

Storage Manager

Access Methods

Lock Manager

Buffer Manager

Log Manager

April 3, 2020 CSE 444 - Spring 2020

We will fill inimplementation

Page 11: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

11

Process Manager

Admission Control

Connection Mgr

Query Processor

Parser

Query Rewrite

Optimizer

Executor

Storage Manager

Access Methods

Lock Manager

Buffer Manager

Log Manager

April 3, 2020 CSE 444 - Spring 2020

Page 12: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

DBMS Architecture

12

Process Manager

Admission Control

Connection Mgr

Query Processor

Parser

Query Rewrite

Optimizer

Executor

Storage Manager

Access Methods

Lock Manager

Buffer Manager

Log Manager

Shared Utilities

Memory Mgr

Disk Space Mgr

Replication Services

Admin Utilities

[Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.]

April 3, 2020 CSE 444 - Spring 2020

Page 13: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Goal for Today

Overview of query execution

Overview of storage manager

CSE 444 - Spring 2020 13April 3, 2020

Page 14: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Query Processor

CSE 444 - Spring 2020 14

Query Processor

CSE 444 – Spring 2018 8

Query Processor

April 3, 2020

Page 15: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Example Database Schema

Supplier(sno,sname,scity,sstate)

Part(pno,pname,psize,pcolor)

Supplies(sno,pno,price)

View: Suppliers in Seattle

CREATE VIEW NearbySupp AS

SELECT sno, sname

FROM Supplier

WHERE scity='Seattle' AND sstate='WA'

15April 3, 2020 CSE 444 - Spring 2020

Page 16: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Example Query

§ Find the names of all suppliers in Seattle who supply part number 2

SELECT sno, sname

FROM NearbySupp

WHERE sno IN ( SELECT snoFROM Supplies

WHERE pno = 2 )

16

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

Page 17: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Query Processor

§Step 1: Parser• Parses query into an internal format• Performs various checks using catalog

§Step 2: Query rewrite• View rewriting, flattening, etc.

17April 3, 2020

Page 18: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

April 3, 2020 CSE 444 - Spring 2020 18

Rewritten Version of Our QuerySupplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

Original view:CREATE VIEW NearbySupp ASSELECT sno, snameFROM SupplierWHERE scity='Seattle' AND sstate='WA'

Original query:SELECT sno, snameFROM NearbySuppWHERE sno IN ( SELECT sno

FROM SuppliesWHERE pno = 2 )

Page 19: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

April 3, 2020 CSE 444 - Spring 2020 19

Rewritten Version of Our QuerySupplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

Original view:CREATE VIEW NearbySupp ASSELECT sno, snameFROM SupplierWHERE scity='Seattle' AND sstate='WA'

Original query:SELECT sno, snameFROM NearbySuppWHERE sno IN ( SELECT sno

FROM SuppliesWHERE pno = 2 )

Rewritten query (view inlining plus query unnesting):SELECT S.sno, S.snameFROM Supplier S, Supplies UWHERE S.scity='Seattle' AND S.sstate='WA’AND S.sno = U.snoAND U.pno = 2;

Page 20: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Query Processor

§Step 3: Optimizer• Find an efficient query plan for executing the query• A query plan is

• Logical: An extended relational algebra tree • Physical: With additional annotations at each node

• Access method to use for each relation• Implementation to use for each relational operator

§Step 4: Executor• Actually executes the physical plan

CSE 444 - Spring 2020 20April 3, 2020

Page 21: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Logical Query Plan

Supplier Supplies

sno = sno

𝜎sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

𝛑sno,sname

21

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

SELECT S.snameFROM Supplier S, Supplies UWHERE S.scity='Seattle' AND S.sstate='WA’AND S.sno = U.snoAND U.pno = 2;

April 3, 2020

Page 22: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Physical Query Plan

§ Logical query plan with extra annotations

§ Implementation choice for each operator

§Access path selection for each relation• Bottom of tree = read from disk• Use a file scan or use an index

22April 3, 2020

Page 23: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Physical Query Plan

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

𝛑sno,sname

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

23

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

Page 24: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Physical Query Plan

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

24

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

for x in Suppliers dofor y in Supplies do

if x.sno=y.snooutput(x,y)

𝛑sno,sname

Page 25: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Physical Query Plan

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

25

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

for x in Suppliers dofor y in Supplies do

if x.sno=y.snooutput(x,y)

for x in Input doif x.sscity=‘Seattle’ and...

output(x)

𝛑sno,sname

Page 26: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Physical Query Plan

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

26

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

for x in Suppliers dofor y in Supplies do

if x.sno=y.snooutput(x,y)

for x in Input doif x.sscity=‘Seattle’ and...

output(x)How do wecombine them?

𝛑sno,sname

Page 27: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Query Executor

CSE 444 - Spring 2020 27April 3, 2020

Page 28: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Iterator Interface

§ Each operator implements OpIterator.java§ open()

• Initializes operator state• Sets parameters such as selection predicate

§ next()• Returns a Tuple!• Operator invokes next() recursively on its inputs• Performs processing and produces an output tuple

§ close(): clean-up state§ Operators also have reference to their child operator

in the query plan

28April 3, 2020 CSE 444 - Spring 2020

Page 29: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Query Execution

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

𝛑ssno,sname

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

29

open() (called by query executor)

open() (called by above operator)

open() (called by above operator)

open() open()

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

April 3, 2020

Page 30: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Query Execution

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

30

next()

next()

next()

next() next()

Supplier(sno,sname,scity,sstate)Part(pno,pname,psize,pcolor)Supplies(sno,pno,price)

next()

April 3, 2020

pull-based execution𝛑ssno,sname

Page 31: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Storage Manager

CSE 444 - Spring 2020 31April 3, 2020

Page 32: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Access Methods

Query Processor

Storage Manager

Access Methods: HeapFile, etc.

Buffer Manager

32

Operators: Sequential Scan, etc.

Data on disk

§ Operators: Process data§ Access methods:

Organize data to support fast access to desired subsets of records

§ Buffer manager: Caches data in memory. Reads/writes data to/from disk as needed

§ Disk-space manager: Allocates space on disk for files/access methodsDisk Space Mgr

CSE 444 - Spring 2020April 3, 2020

Page 33: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Buffer Manager (BufferPool in SimpleDB)

33

Disk

Mainmemory

Page requests from higher-level code

Buffer pool

Disk page

Free frame

1 page correspondsto 1 disk block

Disk is a collectionof blocks

Buffer pool managerAccess methods

CSE 444 - Spring 2020April 3, 2020

Page 34: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Buffer Manager

§ Brings pages in from memory and caches them§ Eviction policies

• Random page (ok for SimpleDB)• Least-recently used• The “clock” algorithm (see book)

§Keeps track of which pages are dirty• A dirty page has changes not reflected on disk• Implementation: Each page includes a dirty bit

CSE 444 - Spring 2020 34April 3, 2020

Page 35: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

Access Methods

§ A DBMS stores data on disk by breaking it into pages• A page is the size of a disk block.• A page is the unit of disk IO

§ Buffer manager caches these pages in memory§ Access methods do the following:

• They organize pages into collections called DB files• They organize data inside pages• They provide an API for operators to access data in these files

§ Discussion:• OS vs DBMS files • OS vs DBMS buffer manager

35April 3, 2020 CSE 444 - Spring 2020

Page 36: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Query ExecutionHow it all Fits Together

Suppliers Supplies

sno = sno

𝜎sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

𝛑sno,sname

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

36

open()

open()

open()

open() open()

April 3, 2020

Page 37: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020

Suppliers Supplies

sno = sno

𝜎 sscity=‘Seattle’ ⋀ sstate=‘WA’ ⋀ pno=2

(File scan) (File scan)

(Nested loop)

(On the fly)

(On the fly)

37

next()

next()

next()

next() next()

Query ExecutionHow it all Fits Together

next()

April 3, 2020

𝛑sno,sname

Page 38: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

CSE 444 - Spring 2020 38

Query Execution In SimpleDB

SeqScan Operator at bottom of plan

Heap File Access Method

In SimpleDB, SeqScan can find HeapFile in Catalog

open()

open()

Offers iterator interface

• open()• next()• close()

Knows how to read/write pages from disk

next()

next()

But if Heap File reads data directly from disk, it will not stay cached in Buffer Pool!

April 3, 2020

Page 39: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

39

Query Execution In SimpleDB

CSE 444 - Spring 2020April 3, 2020

HeapFile for R

BufferPool

Manager

39Data on disk: OS Files

Iterator interface• open()• next()• close()

Read/write pages from disk

Database sharesa single cache in Buffer Pool

HeapFile for S

HeapFile for T

HeapFileN…

Heap files for other relations

bp.getPage()

hf.readPage()

SeqScanhf.next()

Page 40: Database System Internals...Database= collection of related files DBMS= program that manages the database CSE 444 –Spring 2018 3 What we already know… Data models: relational,

HeapFile In SimpleDB

§Data is stored on disk in an OS file. HeapFileclass knows how to “decode” its content

§Control flow:

40CSE 444 - Spring 2020

The BufferManager will then call HeapFile.readPage()/writePage() page to actually read/writethe page.

SeqScan calls methods such as "iterate" on the HeapFileAccess Method

During the iteration, the HeapFile object needs to call the BufferManager.getPage() method to ensure that necessary pages get loaded into memory.

April 3, 2020