data storage formats - stanford universityplacing data for efficient access locality:which items are...

73
Data Storage Formats Instructor: Matei Zaharia cs245.stanford.edu

Upload: others

Post on 01-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Data Storage Formats

Instructor: Matei Zahariacs245.stanford.edu

Page 2: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Outline

Overview

Record encoding

Collection storage

Indexes

CS 245 2

Page 3: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Outline

Overview

Record encoding

Collection storage

Indexes

CS 245 3

Page 4: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Overview

Recall from last time: I/O slow compared to compute, random I/O ≪ sequential

Key concerns in storage:» Access time: minimize # of random accesses,

bytes transferred, etc• Main way: place co-accessed data together!

» Size: storage costs $» Ease of updates

CS 245 4

Page 5: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

General SetupRecord collection

Index

Secondaryindex

CS 245 5

Page 6: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Outline

Overview

Record encoding

Collection storage

Indexes

CS 245 6

Page 7: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

What Are the Data Items We Want to Store?a salary

a name

a date

a picture

CS 245 7

Page 8: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

What Are the Data Items We Want to Store?a salary

a name

a date

a picture

What we have available: bytes

8bits

CS 245 8

Page 9: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

To Represent:

Integer (short): 2 bytes

e.g., 35 is 00000000 00100011

Real, floating pointn bits for mantissa, m for exponent….

CS 245 9

Page 10: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Characters

® Various coding schemes available

Example: ASCIIA: 1000001a: 11000015: 0110101LF: 0001010

To Represent:

CS 245 10

Page 11: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Booleane.g., TRUE

FALSE1111 11110000 0000

Application specifice.g., RED ® 1 GREEN ® 3

BLUE ® 2 YELLOW ® 4 …

To Represent:

Can we use less than 1 byte/code?Yes, but only if desperate...CS 245 11

Page 12: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Datese.g.: - Integer, # days since Jan 1, 1900

- 8 characters, YYYYMMDD- 7 characters, YYYYDDD

Timee.g. - Integer, seconds since midnight

- characters, HHMMSSFF

To Represent:

CS 245 12

Page 13: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

String of characters» Null terminated

e.g.,

» Length givene.g.,

- Fixed length

c ta

c ta3

To Represent:

CS 245 13

Page 14: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Bag of bits Length Bits

To Represent:

CS 245 14

Page 15: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

To Represent:

CS 245 15

Page 16: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

To Represent: Nothing

NULL concept in SQL (not same as 0 or “”)

Physical representation options:» Special “sentinel” value in fixed0length field» Boolean “is null” flag» Just skip the field in a sparse record format

Pretty common in practice!

CS 245 16

Page 17: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Key Point

• Fixed length items

• Variable length items- usually length given at beginning

CS 245 17

Page 18: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Also

Type of an item: tells us how to interpret the bytes, plus size if fixed

CS 245 18

Page 19: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Data Items

Records

Blocks

Files

Bigger Collections

CS 245 19

Page 20: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Record: Set of Related Data Items (“Fields”)

E.g.: Employee record:

name field,

salary field,

date-of-hire field, ...

CS 245 20

Page 21: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Main choices:» Fixed vs variable format» Fixed vs variable length

Types of Records

CS 245 21

Page 22: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Fixed Format

A schema (not record) contains following info:

- # of fields

- type of each field

- order in record

- meaning of each field

CS 245 22

Page 23: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Example: Fixed Format & Length

Employee record

(1) E#, 2 byte integer

(2) E.name, 10 char. Schema

(3) Dept, 2 byte code

55 s m i t h 02

83 j o n e s 01Records

CS 245 23

Page 24: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Variable Format

Record itself contains format

“Self Describing”

CS 245 24

Page 25: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

4I52 4S DROF46

Field name codes could also be strings, i.e. TAGS

# Fi

elds

Cod

e id

entif

ying

field

as

E#In

tege

r typ

e

Cod

e fo

r Ena

me

Strin

g ty

peLe

ngth

of s

tr.

Example: Variable Format & Length

CS 245 25

Page 26: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Variable Format Useful For

“Sparse” records

Repeating fields

Evolving formats

But may waste space...

CS 245 26

Page 27: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Example: Variable Format Record with Repeated Fields

Employee ® one or more ® children

3 E_name: Fred Child: Sally Child: Tom

CS 245 27

Page 28: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Note: Repeated Fields Does Not Imply Variable Format/Length

Could have fixed space for a max # of items and their sizes

John Sailing Chess (null)

CS 245 28

Page 29: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Example: Include a record type in record

record type record length

Type is a pointer to one of several schemas

5 27 . . . .

Many Variants Between Fixed and Variable Format

CS 245 29

Page 30: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

May contain:- record type- record length- timestamp- concurrency stuff ...

Record Header: Data at Start that Describes a Record

CS 245 30

Page 31: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Exercise: How to store XML data?

<table><description> people on the fourth floor <\description><people>

<person><name> Alan <\name><age> 42 <\age><email> [email protected] <\email>

<\person><person>

<name> Sally <\name><age> 30 <\age><email> [email protected] <\email>

<\person><\people><\table>

Source: Data on the Web, Abiteboul et alCS 245 31

Page 32: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Compression» Within record: e.g. encoding selection» Collection of records: use common patterns

Encryption» Usually operates on large blocks

Other Interesting Issues

CS 245 32

Page 33: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Outline

Overview

Record encoding

Collection storage

Indexes

CS 245 33

Page 34: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Collection Storage Questions

How do we place data items and records for efficient access?» Locality and searchability

How do we physically encode records in blocks and files?

CS 245 34

Page 35: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Placing Data for Efficient AccessLocality: which items are accessed together» When you read one field of a record, you’re

likely to read other fields of the same record» When you read one field of record 1, you’re

likely to read the same field of record 2

Searchability: quickly find relevant records» E.g. sorting the file lets you do binary search

CS 245 35

Page 36: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Locality Example: Row Stores vs Column Stores

Row Store Column Store

AlexBob

CarolDavidEve

Frances

203042212656

GiaHaroldIvan

192841

CACANYMACANYMAAKCA

name age state

Fields stored contiguouslyin one file

AlexBob

CarolDavidEve

FrancesGia

HaroldIvan

name203042212656192841

ageCACANYMACANYMAAKCA

state

Each column in a different file

CS 245 36

Page 37: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Locality Example: Row Stores vs Column Stores

Row Store Column Store

AlexBob

CarolDavidEve

Frances

203042212656

GiaHaroldIvan

192841

CACANYMACANYMAAKCA

name age state

Fields stored contiguouslyin one file

AlexBob

CarolDavidEve

FrancesGia

HaroldIvan

name203042212656192841

ageCACANYMACANYMAAKCA

state

Each column in a different file

Accessing all fields of one record: 1 random I/O for row, 3 for columnCS 245 37

Page 38: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Locality Example: Row Stores vs Column Stores

Row Store Column Store

AlexBob

CarolDavidEve

Frances

203042212656

GiaHaroldIvan

192841

CACANYMACANYMAAKCA

name age state

Fields stored contiguouslyin one file

AlexBob

CarolDavidEve

FrancesGia

HaroldIvan

name203042212656192841

ageCACANYMACANYMAAKCA

state

Each column in a different file

Accessing one field of all records: 3x less I/O for column storeCS 245 38

Page 39: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Can We Have Hybrids Between Row & Column?

Yes! For example, colocated column groups:

AlexBob

CarolDavidEve

FrancesGia

HaroldIvan

name203042212656192841

ageCACANYMACANYMAAKCA

state

File 1 File 2: age & state

Helpful if age & state are frequently co-accessedCS 245 39

Page 40: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Improving Searchability: Ordering

Ordering the data by a field will give:» Closer I/Os if queries tend to read data with

nearby values of the field (e.g. time ranges)» Option to accelerate search via an ordered

index (e.g. B-tree), binary search, etc

What’s the downside of having an ordering?

CS 245 40

Page 41: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Improving Searchability: PartitionsJust place data into buckets based on a field (but not necessarily fine-grained order)

E.g. Hive table storage over filesystem or S3:

/my_table/date=20190101/file1.parquet/my_table/date=20190101/file2.parquet/my)table/date=20190102/file1.parquet/my_table/date=20190101/file2.parquet/my_table/date=20190103/file1.parquet

...

Easy to add, remove & list files in any directoryCS 245 41

Page 42: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Can We Have Searchability on Multiple Fields at Once?Yes! Many possible ways:

1) Multiple partition or sort keys (e.g. partition data by date, then group by customer ID)

2) Interleaved orderings such as Z-ordering

CS 245 42

Page 43: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Z-Ordering

Image source: Wikipedia

dimension 1

dimension 2

CS 245 43

Page 44: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How Do We Encode Records into Blocks & Files?

CS 245 44

Page 45: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How Do We Encode Records into Blocks & Files?

blocks

a file

records

CS 245 45

Page 46: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Questions in Storing Records

(1) separating records

(2) spanned vs. unspanned

(3) indirection

CS 245 46

Page 47: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Block

(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)

- within each record- in block header

R2R1 R3

(1) Separating Records

CS 245 47

Page 48: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Unspanned: records must be within one block

block 1 block 2

Spanned:

block 1 block 2

...

R1 R2

R1

R3 R4 R5

R2 R3(a)

R3(b) R6R5R4 R7

(a)

(2) Spanned vs Unspanned

CS 245 48

Page 49: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

With Spanned Records

need indication need indicationof partial record of continuation“pointer” to rest (+ from where?)

R1 R2 R3(a)

R3(b) R6R5R4 R7

(a)

CS 245 49

Page 50: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Spanned vs Unspanned

Unspanned is much simpler, but may waste storage space…

Spanned essential if record size > block size

CS 245 50

Page 51: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How does one refer to specific records?(e.g. in metadata or in other records)

Rx

(4) Indirection

CS 245 51

Page 52: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How does one refer to records?

Rx

Many options:Physical Indirect

(4) Indirection

CS 245 52

Page 53: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Device IDE.g., Record Cylinder #

Address = Track #or ID Block #

Offset in block

Block ID

Purely Physical

CS 245 53

Page 54: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

E.g., Record ID is arbitrary bit string

maprec ID

r addressa

Physicaladdr.Rec ID

Fully Indirect

CS 245 54

Page 55: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Flexibility Costto move records of indirection(for deletions, insertions)

Tradeoff

CS 245 55

Page 56: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Physical Indirect

Many optionsin between …

CS 245 56

Page 57: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Header

A block: Free space

R3R4R1 R2

Example: Indirection in Block

CS 245 57

Page 58: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

May contain:- File ID (or table or database ID)- This block ID- Record directory- Pointer to free space- Type of block (e.g. contains recs type 4)- Pointer to other blocks “like it”- Timestamp ...

Block Header: Data at Start that Describes Block

CS 245 58

Page 59: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Other Concern: Deletion!

CS 245 59

Page 60: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Options

(a) Immediately reclaim space

(b) Mark deleted

CS 245 60

Page 61: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Options

(a) Immediately reclaim space

(b) Mark deleted– May need chain of deleted records

(for space re-use)– Need a way to mark:

• special characters• delete field• entries in maps

CS 245 61

Page 62: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How expensive is to move valid record to free space for immediate reclaim?

How much space is wasted?» e.g., deleted records, delete fields, free

space chains,...

As Usual, Many Tradeoffs

CS 245 62

Page 63: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Concern with Deletions

Dangling pointers

CS 245 63

R1 ?

Page 64: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

CS 245 64

Solution 1: Do Not Worry

Page 65: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Solution 2: Tombstones

Special mark in old location or mappings

CS 245 65

Page 66: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Solution 2: Tombstones

Special mark in old location or mappings

CS 245 66

Physical IDs:

A block

This space This space cannever re-used be re-used

Page 67: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Logical IDs:

ID LOC

7788

map

Never reuseID 7788 nor space in map...

CS 245 67

Solution 2: Tombstones

Special mark in old location or mappings

Page 68: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Insertion

Easy case: records not ordered

® Insert new record at end of file or in a deleted slot

® If records are variable size, not as easy...

CS 245 68

Page 69: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Insertion

Hard case: records are ordered

® If free space close by, not too bad...

® Otherwise, use an overflow area?

CS 245 69

Page 70: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

How much free space to leave in each block, track, cylinder?

How often do I reorganize file + overflow?

CS 245 70

Interesting Problems

Freespace

Page 71: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Summary

There are 10,000,000 ways to organize my data on disk…

Which is right for me?

CS 245 71

Page 72: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

Flexibility Space Utilization

Complexity Performance

Issues

CS 245 72

Page 73: Data Storage Formats - Stanford UniversityPlacing Data for Efficient Access Locality:which items are accessed together »When you read one field of a record, you’re likely to read

To Evaluate a Strategy, Compute:

Space used for expected data

Expected time to- fetch record given key- fetch record with next key- insert record- append record- delete record- update record- read all file- reorganize file

CS 245 73