cs 471 operating systems yue chengyuecheng/teaching/materials/lec-08... · mapreduce grep 63 very...

CS 471 Operating Systems Yue Cheng George Mason University Fall 2017

Upload: lyngoc

Post on 06-Feb-2018



Google File System
MapReduce
Key-Value Store


Google File System (GFS) Overview
o Motivation
o Architecture

GFS
o Goal: a global (distributed) file system that stores data across many machines
– Need to handle 100s of TBs
o Google published details in 2003
o Open source implementation:
– Hadoop Distributed File System (HDFS)

Workload-driven Design
o Google workload characteristics
– Huge files (GBs)
– Almost all writes are appends
– Concurrent appends common
– High throughput is valuable
– Low latency is not

Example Workloads
o Read entire dataset, do computation over it
o Producer/consumer: many producers append work to file concurrently; one consumer reads and does work

Workload-driven Design
o Build a global file system that incorporates all these application properties
o Only supports features required by applications
o Avoid difficult local file system features, e.g.:
– rename dir
– links

Google File System (GFS) Overview
o Motivation
o Architecture

Replication
[Diagram: four machines, GFS Server 1 through GFS Server 4, hold replicas of chunks A, B, and C: two copies of A, two copies of B, and three copies of C]
Similar to RAID, but less orderly than RAID
• Machines’ capacity may vary
• Different data may have different replication factors

Data Recovery
[Diagram: the same four GFS servers and replicas of A, B, and C; GFS Server 3 then becomes unreachable (shown as "???")]
Replicating A to maintain a replication factor of 2
Replicating C to maintain a replication factor of 3
Machine may be dead forever, or it may come back
[Diagram: GFS Server 3 comes back, so A and C each have one replica too many]
Data Rebalancing
Deleting one A to maintain a replication factor of 2
Deleting one C to maintain a replication factor of 3
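The recovery and rebalancing steps can be sketched in a few lines. This is only an illustration under stated assumptions: the initial placement of A, B, and C on servers `s1` to `s4` is hypothetical (the diagram's exact layout is not recoverable from the transcript), and the choice of which server receives or loses a replica is an arbitrary deterministic one, not the policy GFS actually uses.

```python
from typing import Dict, Set

Placement = Dict[str, Set[str]]  # chunk name -> servers believed to hold a replica


def re_replicate(placement: Placement, target: Dict[str, int], live: Set[str]) -> None:
    """Copy under-replicated chunks onto live servers that lack them."""
    for chunk, holders in placement.items():
        spare = sorted(live - holders)  # live servers without this chunk
        while len(holders & live) < target[chunk] and spare:
            holders.add(spare.pop(0))  # arbitrary but deterministic choice


def rebalance(placement: Placement, target: Dict[str, int], live: Set[str]) -> None:
    """Delete surplus replicas, e.g. after a failed server has come back."""
    for chunk, holders in placement.items():
        extra = sorted(holders & live, reverse=True)
        while len(holders & live) > target[chunk]:
            holders.remove(extra.pop(0))


servers = {"s1", "s2", "s3", "s4"}
target = {"A": 2, "B": 2, "C": 3}  # replication factors from the slides
# Hypothetical initial placement, chosen so that s3 holds both A and C.
placement = {"A": {"s1", "s3"}, "B": {"s1", "s2"}, "C": {"s2", "s3", "s4"}}

re_replicate(placement, target, live=servers - {"s3"})  # s3 is unreachable
after_failure = {chunk: set(holders) for chunk, holders in placement.items()}

rebalance(placement, target, live=servers)  # s3 comes back
```

After the failure step, A and C each have one more recorded replica than their target (the copy on `s3` is still on record); after `s3` returns, `rebalance` trims them back to 2 and 3.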

Question: how to maintain a global view of all data distributed across machines?

GFS Architecture
[Diagram: one master (metadata), many clients, and many GFS servers (data); clients, master, and GFS servers communicate via RPC. Example shown: Client 1 through Client 3, and GFS Server 1 through GFS Server 4 holding replicas of chunks A, B, and C]

Data Chunks
o Break large GFS files into coarse-grained data chunks (e.g., 64MB)
o GFS servers store physical data chunks in local Linux file system
o Centralized master keeps track of mapping between logical and physical chunks
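As a small worked example of coarse-grained chunking, the sketch below maps a byte range of a file onto fixed-size chunks. The function name `chunk_ranges` and its return shape are illustrative assumptions; only the 64MB chunk size comes from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the example chunk size from the slide
MB = 1024 * 1024


def chunk_ranges(offset: int, size: int, chunk_size: int = CHUNK_SIZE):
    """Split a byte range of a file into (chunk_index, offset_in_chunk, length) pieces."""
    pieces = []
    end = offset + size
    while offset < end:
        index, within = divmod(offset, chunk_size)
        length = min(chunk_size - within, end - offset)  # stop at the chunk boundary
        pieces.append((index, within, length))
        offset += length
    return pieces


# A 2MB read starting 1MB before the end of the first chunk touches two chunks.
pieces = chunk_ranges(offset=63 * MB, size=2 * MB)
```

Here `pieces` holds one entry for the last 1MB of chunk 0 and one for the first 1MB of chunk 1.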

Chunk Map
Master chunk map (logical => phys):
924 => s2,s5,s7
521 => s2,s9,s11
…

GFS Server s2
Local fs:
chunks/924 => data1
chunks/521 => data2
…

Client Reads a Chunk
o Client to master: lookup 924
o Master to client: s2,s5,s7
o Client to GFS server s2: read 924: offset=0, size=1MB
o GFS server s2 to client: data
o Client to GFS server s2: read 924: offset=1MB, size=1MB
o GFS server s2 to client: data
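The read sequence can be sketched with three small classes. All class and method names here are hypothetical, in-process stand-ins for the RPCs in the diagram, and the client-side cache of lookup results is an assumption inferred from the second read not contacting the master again.

```python
from typing import Dict, List


class Master:
    """Holds metadata only: which servers store each logical chunk."""

    def __init__(self, chunk_map: Dict[int, List[str]]):
        self.chunk_map = chunk_map
        self.lookups = 0  # counts lookup calls, for illustration

    def lookup(self, chunk_id: int) -> List[str]:
        self.lookups += 1
        return self.chunk_map[chunk_id]


class ChunkServer:
    """Holds chunk data, standing in for files such as chunks/924 in the local fs."""

    def __init__(self, chunks: Dict[int, bytes]):
        self.chunks = chunks

    def read(self, chunk_id: int, offset: int, size: int) -> bytes:
        return self.chunks[chunk_id][offset:offset + size]


class Client:
    def __init__(self, master: Master, servers: Dict[str, ChunkServer]):
        self.master = master
        self.servers = servers
        self.locations: Dict[int, List[str]] = {}  # cached lookup results (assumption)

    def read(self, chunk_id: int, offset: int, size: int) -> bytes:
        if chunk_id not in self.locations:
            self.locations[chunk_id] = self.master.lookup(chunk_id)
        first = self.locations[chunk_id][0]  # simply pick the first replica
        return self.servers[first].read(chunk_id, offset, size)


master = Master({924: ["s2", "s5", "s7"], 521: ["s2", "s9", "s11"]})
s2 = ChunkServer({924: b"data1-contents", 521: b"data2"})
client = Client(master, {"s2": s2})

first_read = client.read(924, offset=0, size=5)
second_read = client.read(924, offset=5, size=5)
```

The master is contacted once for chunk 924; both reads then go directly to the GFS server, which keeps bulk data traffic off the master.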

File Namespace
o Path names mapped to logical names
Master file namespace:
/foo/bar => 924,813
/var/log => 123,999

Google File System
MapReduce
Key-Value Store

MapReduce Overview
o Motivation
o Architecture
o Programming Model

Problem
o Datasets are too big to process using a single machine
o Good concurrent processing engines are rare
o Want a concurrent processing framework that is:
– easy to use (no locks, CVs, race conditions)
– general (works for many problems)

MapReduce
o Strategy: break data into buckets, do computation over each bucket
o Google published details in 2004
o Open source implementation: Hadoop

Example: Word Count
Word   Count
was    28
what   129
was    54
what   18
was    32
map    10
How to quickly sum word counts with multiple machines concurrently?
o mapper 1 takes the first three rows: was 28, what 129, was 54
o mapper 2 takes the last three rows: what 18, was 32, map 10
o mapper 1 outputs: was 28+54, what 129
o mapper 2 outputs: what 18, was 32, map 10
o reducer 1 and reducer 2 share the keys: Reduce was, Reduce what, Reduce map
o Result: was: 114, what: 147, map: 10
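The word-count walkthrough can be reproduced with a short single-process sketch. The names `combine`, `partition`, and `run` are hypothetical, and the byte-sum partitioner is only a deterministic stand-in for whatever assignment of keys to reducers a real system would use.

```python
from collections import defaultdict


def combine(pairs):
    """Mapper-side pre-aggregation: sum counts for the same word (e.g. was 28+54)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def partition(word, num_reducers):
    """Pick the reducer responsible for a word (deterministic stand-in for a hash)."""
    return sum(word.encode()) % num_reducers


def run(mapper_inputs, num_reducers=2):
    # Shuffle: route each mapper's partial sums to the reducer that owns the word.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_inputs:
        for word, partial in combine(pairs).items():
            reducer_inputs[partition(word, num_reducers)][word].append(partial)
    # Reduce: each reducer sums the partial counts of its own words.
    result = {}
    for groups in reducer_inputs:
        for word, partials in groups.items():
            result[word] = sum(partials)
    return result


mapper_1 = [("was", 28), ("what", 129), ("was", 54)]
mapper_2 = [("what", 18), ("was", 32), ("map", 10)]
totals = run([mapper_1, mapper_2])
```

Because every occurrence of a word is routed to the same reducer, no locking is needed: each reducer works on a disjoint set of keys.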

MapReduce Overview
o Motivation
o Architecture
o Programming Model

MapReduce Architecture
[Diagram: a client, a master on the master node, and one worker on each of slave node 1, slave node 2, through slave node N; each slave node also holds chunks, which form the GFS layer storing data chunks]

MapReduce over GFS
o MapReduce writes and reads data to/from GFS
o MapReduce workers run on same machines as GFS server daemons
GFS files → Mappers → Intermediate local files → Reducers → GFS files

MapReduce Data Flows & Executions
GFS files → Mappers → Intermediate local files → Reducers → GFS files

MapReduce Overview
o Motivation
o Architecture
o Programming Model

Map/Reduce Function Types
o map(k1, v1) → list(k2, v2)
o reduce(k2, list(v2)) → list(k3, v3)

Hadoop API
public void map(LongWritable key, Text value) {
    // WRITE CODE HERE
}

public void reduce(Text key, Iterator<IntWritable> values) {
    // WRITE CODE HERE
}

MapReduce Word Count Pseudo Code
func mapper(key, line) {
    for word in line.split()
        yield word, 1
}

func reducer(word, occurrences) {
    yield word, sum(occurrences)
}
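The pseudo code translates almost directly into runnable Python. The `map_reduce` driver below is a hypothetical single-process stand-in for the framework (it is not Hadoop's API): it applies the mapper to every record, sorts and groups the intermediate pairs by key, and hands each group to the reducer. The input keys are assumed to be byte offsets of each line.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, line):
    # key is unused here; it is assumed to be the line's offset in the file
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    yield word, sum(occurrences)


def map_reduce(records, mapper, reducer):
    """Run mapper over (k1, v1) records, group by k2, then reduce each group."""
    intermediate = [pair for key, value in records for pair in mapper(key, value)]
    intermediate.sort(key=itemgetter(0))  # stands in for the shuffle/sort phase
    output = []
    for k2, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(k2, [value for _, value in group]))
    return output


lines = [(0, "what was was"), (13, "map what")]
counts = dict(map_reduce(lines, mapper, reducer))
```

The user writes only `mapper` and `reducer`; grouping by key, and in a real deployment distribution and fault handling, belong to the framework.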

MapReduce Word Count
[Diagram: very big data is divided into four pieces of split data; count runs on each piece and produces a per-split count; merge combines the per-split counts into merged counts]

MapReduce Grep
[Diagram: very big data is divided into four pieces of split data; grep runs on each piece and produces per-split matches; cat concatenates them into all matches]
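A minimal single-process sketch of the grep pipeline follows. The helper names `split_data`, `grep_mapper`, and `cat` are illustrative, and the sample log lines are made up for the example.

```python
import re


def split_data(lines, num_splits):
    """Divide the input into contiguous pieces of roughly equal size."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep_mapper(split, pattern):
    """Emit every line of one split that matches the pattern."""
    regex = re.compile(pattern)
    return [line for line in split if regex.search(line)]


def cat(per_split_matches):
    """Concatenate the per-split results; no real reduction is needed for grep."""
    return [line for matches in per_split_matches for line in matches]


data = ["error: disk full", "ok", "ok", "error: timeout",
        "ok", "warning", "error: net", "ok"]
splits = split_data(data, 4)
all_matches = cat(grep_mapper(split, r"^error") for split in splits)
```

Each split can be searched independently, so the grep step could run on as many machines as there are splits.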

Google File System
MapReduce
Key-Value Store
Credit: Prof. Hector Garcia-Molina@Stanford

Key-Value Store
Table T (keys are sorted):
key   value
k1    v1
k2    v2
k3    v3
k4    v4
o API:
– lookup(key) → value
– lookup(key range) → values
– getNext → value
– insert(key, value)
– delete(key)
o Each row has a timestamp
o Single row actions atomic (but not persistent in some systems?)
o No multi-key transactions
o No query language!
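The API above can be sketched as an in-memory sorted table. `SortedTable` and its method names are hypothetical; in particular, `get_next` taking a key argument is an assumption, since the slide does not say how getNext tracks its position. Timestamps and atomicity are left out.

```python
import bisect


class SortedTable:
    """Key-value table that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def insert(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def lookup(self, key):
        return self._values.get(key)

    def lookup_range(self, low, high):
        """Values for low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [self._values[k] for k in self._keys[start:stop]]

    def get_next(self, key):
        """Value under the smallest key greater than `key`, or None."""
        i = bisect.bisect_right(self._keys, key)
        return self._values[self._keys[i]] if i < len(self._keys) else None


table = SortedTable()
for i in range(1, 5):
    table.insert(f"k{i}", f"v{i}")
table.delete("k3")
```

Keeping keys sorted is what makes range lookups and getNext cheap; a plain hash table would support only the single-key operations efficiently.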

Partitioning (Sharding)
[Diagram: a table with keys k1 through k10 is split by key range into three tablets: k1 to k4 on server 1, k5 to k6 on server 2, and k7 to k10 on server 3]
• use a partition vector
• “auto-sharding”: vector selected automatically
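A partition vector can be read as a sorted list of split points, so finding the server for a key is a binary search. The sketch below uses integer keys 1 to 10 in place of k1 to k10 (an assumption made so that keys sort numerically); the split points mirror the diagram.

```python
import bisect

SPLIT_POINTS = [5, 7]  # partition vector: the first key of each tablet after the first
SERVERS = ["server 1", "server 2", "server 3"]


def server_for(key: int) -> str:
    """Return the server whose tablet covers the given key."""
    return SERVERS[bisect.bisect_right(SPLIT_POINTS, key)]


assignment = {key: server_for(key) for key in range(1, 11)}
```

With auto-sharding, the system would choose and adjust `SPLIT_POINTS` itself, for example by splitting a tablet that has grown too large.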

Tablet Replication
[Diagram: the tablet holding k7 through k10 is stored on server 3 (primary), server 4 (backup), and server 5 (backup)]
• Cassandra:
– Replication Factor (# copies)
– R/W Rule: One, Quorum, All
– Policy (e.g., Rack Unaware, Rack Aware, ...)
– Read all copies (return fastest reply, do repairs if necessary)
• HBase: Does not manage replication, relies on HDFS
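The One / Quorum / All rule can be made concrete by counting how many replicas must answer. This sketch assumes the usual definition of a quorum as a strict majority of the copies; the function names are hypothetical and this is not Cassandra's API.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a read or write at this level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of the copies
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


quorum_of_three = required_acks("QUORUM", 3)
```

With three copies, quorum reads combined with quorum writes always overlap in at least one replica, while One/One can miss the latest write.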

Need a “directory”
o Add naming hierarchy to a flat namespace
o Table Name:
– Key → Servers: stores key → Backup servers
o Can be implemented as a special table

Tablet Internals
[Diagram: a tablet is made of sorted segments held in memory and on disk. The three segments shown are (k3 v3, k8 v8, k9 delete, k15 v15), (k2 v2, k6 v6, k9 v9, k12 v12), and (k4 v4, k5 delete, k10 v10, k20 v20, k22 v22). A "delete" entry is a tombstone. Memory contents are flushed to disk periodically]
Design Philosophy (?): Primary scenario is where all data is in memory. Disk storage added as an afterthought.
• tablet is merge of all segments (files)
• disk segments immutable
• writes efficient; reads only efficient when all data in memory
• periodically reorganize into single segment
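The segment scheme can be sketched as follows. This is a simplified illustration, not BigTable's or Cassandra's implementation: the `Tablet` class and its method names are hypothetical, segments are plain dictionaries that are treated as immutable once flushed, and integer keys stand in for k2, k3, and so on.

```python
TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class Tablet:
    def __init__(self):
        self.memory = {}  # mutable in-memory segment
        self.disk = []    # flushed segments, oldest first; never modified in place

    def insert(self, key, value):
        self.memory[key] = value

    def delete(self, key):
        self.memory[key] = TOMBSTONE  # shadows older values in disk segments

    def lookup(self, key):
        # Search newest to oldest; the first segment that knows the key wins.
        for segment in [self.memory] + self.disk[::-1]:
            if key in segment:
                value = segment[key]
                return None if value is TOMBSTONE else value
        return None

    def flush(self):
        """Write the in-memory segment out as a new sorted disk segment."""
        if self.memory:
            self.disk.append(dict(sorted(self.memory.items())))
            self.memory = {}

    def compact(self):
        """Reorganize all disk segments into one, dropping deleted keys."""
        merged = {}
        for segment in self.disk:  # oldest first, so newer entries overwrite
            merged.update(segment)
        self.disk = [{k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}]


tablet = Tablet()
for key, value in [(4, "v4"), (5, "v5"), (10, "v10")]:
    tablet.insert(key, value)
tablet.flush()
tablet.insert(9, "v9")
tablet.insert(2, "v2")
tablet.flush()
tablet.delete(9)
tablet.insert(3, "v3")
deleted_lookup = tablet.lookup(9)  # tombstone in memory hides v9 on disk
tablet.flush()
tablet.compact()
```

Writes only touch memory, which is why they are cheap; a read may have to consult every segment, which is why periodic reorganization into a single segment helps.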

Column Family
K    A     B     C     D     E
k1   a1    b1    c1    d1    e1
k2   a2    null  c2    d2    e2
k3   null  null  null  d3    e3
k4   a4    b4    c4    e4    e4
k5   a5    b5    null  null  null
• for storage, treat each row as a single “super value”
• API provides access to sub-values (use family:qualifier to refer to sub-values, e.g., price:euros, price:dollars)
• Cassandra allows “super-column”: two level nesting of columns (e.g., Column A can have sub-columns X & Y)

Vertical Partitions
K    A     B     C     D     E
k1   a1    b1    c1    d1    e1
k2   a2    null  c2    d2    e2
k3   null  null  null  d3    e3
k4   a4    b4    c4    e4    e4
k5   a5    b5    null  null  null
can be manually implemented as four column tables (the diagram carries the label "column family"):
server 1, table (K, A): k1 a1; k2 a2; k4 a4; k5 a5
server 2, table (K, B): k1 b1; k4 b4; k5 b5
server 3, table (K, C): k1 c1; k2 c2; k4 c4
server 4, table (K, D, E): k1 d1 e1; k2 d2 e2; k3 d3 e3; k4 e4 e4
• good for sparse data
• good for column scans
• not so good for tuple reads
• are atomic updates to row still supported?
• API supports actions on full table; mapped to actions on column tables
• API supports column “project”
• To decide on vertical partition, need to know access patterns
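The trade-offs listed above show up in a small sketch that stores each group of columns in its own table. The grouping in `FAMILIES` mirrors the diagram (D and E stored together); the function names are hypothetical, and `None` plays the role of null.

```python
FAMILIES = {"A": ["A"], "B": ["B"], "C": ["C"], "DE": ["D", "E"]}


def split_row(row):
    """Return {family: {column: value}}, keeping only non-null columns."""
    parts = {}
    for family, columns in FAMILIES.items():
        present = {c: row[c] for c in columns if row.get(c) is not None}
        if present:
            parts[family] = present  # null-only families store nothing for this row
    return parts


def store(tables, key, row):
    """An action on the full table, mapped to actions on the column tables."""
    for family, values in split_row(row).items():
        tables[family][key] = values


def read_row(tables, key):
    """Tuple read: must visit every column table (the expensive case)."""
    row = {}
    for table in tables.values():
        row.update(table.get(key, {}))
    return row


def scan_column(tables, family, column):
    """Column scan: touches a single column table (the cheap case)."""
    return {k: v[column] for k, v in tables[family].items() if column in v}


tables = {family: {} for family in FAMILIES}
store(tables, "k2", {"A": "a2", "B": None, "C": "c2", "D": "d2", "E": "e2"})
store(tables, "k3", {"A": None, "B": None, "C": None, "D": "d3", "E": "e3"})
store(tables, "k5", {"A": "a5", "B": "b5", "C": None, "D": None, "E": None})
```

Null values take no space in any column table, which is why the layout suits sparse data, while reassembling one row requires a lookup per table.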

Failure Recovery (BigTable, HBase)
[Diagram: a tablet server keeps tablet data in memory and does write ahead logging to a log stored in GFS or HDFS; the master node pings the tablet server; a spare tablet server stands by]
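Write ahead logging can be sketched as follows: every change is appended to a log in the shared file system before it is applied in memory, so a spare server can rebuild the lost state by replaying the log. The `TabletServer` class is hypothetical, a Python list stands in for the log file, and JSON lines are an arbitrary choice of record format.

```python
import json


class TabletServer:
    def __init__(self, log):
        self.log = log    # list standing in for a log file in GFS or HDFS
        self.memory = {}  # in-memory tablet contents, lost if the server dies

    def insert(self, key, value):
        # Log first, then apply: the log always covers what memory holds.
        self.log.append(json.dumps({"op": "insert", "key": key, "value": value}))
        self.memory[key] = value

    def delete(self, key):
        self.log.append(json.dumps({"op": "delete", "key": key}))
        self.memory.pop(key, None)

    @classmethod
    def recover(cls, log):
        """Rebuild a failed server's memory on a spare by replaying its log."""
        server = cls(log)
        for line in list(log):
            record = json.loads(line)
            if record["op"] == "insert":
                server.memory[record["key"]] = record["value"]
            else:
                server.memory.pop(record["key"], None)
        return server


shared_log = []
primary = TabletServer(shared_log)
primary.insert("k1", "v1")
primary.insert("k2", "v2")
primary.delete("k1")

# Suppose the master's ping goes unanswered; a spare takes over from the log.
spare = TabletServer.recover(shared_log)
```

Because the log lives in the replicated file system rather than on the tablet server, it survives the failure of that server.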


Failure recovery (Cassandra)
o No master node, all nodes in “cluster” equal
[Diagram: server 1, server 2, and server 3 as peers]
o Access any table in cluster at any server
o That server sends requests to other servers