distributed file systems overview of the dfs ecology mapreduce and hadoop

39
The MapReduce Environment Distributed File Systems Overview of the DFS Ecology MapReduce and Hadoop Jeffrey D. Ullman Stanford University

Upload: mada

Post on 23-Feb-2016

34 views

Category:

Documents


0 download

DESCRIPTION

The MapReduce Environment. Distributed File Systems Overview of the DFS Ecology MapReduce and Hadoop. Jeffrey D. Ullman Stanford University. Distributed File Systems. Chunking Replication Distribution on Racks. Commodity Clusters. Datasets can be very large. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

The MapReduce EnvironmentDistributed File SystemsOverview of the DFS EcologyMapReduce and Hadoop

Jeffrey D. UllmanStanford University

Page 2: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

2

Distributed File SystemsChunkingReplicationDistribution on Racks

Page 3: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Commodity Clusters Datasets can be very large.

Tens to hundreds of terabytes. Cannot process on a single server.

Standard architecture emerging: Cluster of commodity Linux nodes (compute nodes). Gigabit Ethernet interconnect.

How to organize computations on this architecture? Mask issues such as hardware failure.

Page 4: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Cluster Architecture

Mem

Disk

CPU

Mem

Disk

CPU

Switch

Each rack contains 16-64 nodes

Mem

Disk

CPU

Mem

Disk

CPU

Switch

Switch1 Gbps between any pair of nodesin a rack

2-10 Gbps backbone between racks

Page 5: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Stable Storage First order problem: if nodes can fail, how can

we store data persistently? Answer: Distributed File System.

Provides global file namespace. Examples: Google GFS, Colossus; Hadoop HDFS.

Typical usage pattern: Huge files. Data is rarely updated in place. Reads and appends are common.

Page 6: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Distributed File System Chunk Servers.

File is split into contiguous chunks, typically 64MB. Each chunk replicated (usually 2x or 3x). Try to keep replicas in different racks. Alternative: Erasure coding.

Master Node for a file. Stores metadata, location of all chunks. Possibly replicated.

Page 7: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

7

Compute Nodes Organized into racks. Intra-rack connection typically gigabit speed. Inter-rack connection faster by a small factor.

Page 8: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

8

Racks of Compute Nodes

File

Chunks

Page 9: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

9

3-way replication offiles, with copies ondifferent racks.

Page 10: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

10

Above the DFSMapReduceKey-Value StoresSQL Implementations

Page 11: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

11

The New Stack

Distributed File System

MapReduce, e.g.Hadoop

Object Store (key-valuestore), e.g., BigTable,

Hbase, Cassandra

SQL Implementations,e.g., PIG (relational

algebra), HIVE

Page 12: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

12

MapReduce Systems MapReduce (Google) and open-source (Apache)

equivalent Hadoop. Important specialized parallel computing tool. Cope with compute-node failures.

Avoid restart of the entire job.

Page 13: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

13

Key-Value Stores BigTable (Google), Hbase, Cassandra (Apache),

Dynamo (Amazon). Each row is a key plus values over a flexible set of

columns. Each column component can be a set of values.

Example: Structure of the Web. Key is a URL. One column is a set of URL’s – those linked to the

page represented by the key. A second column is the set of URL’s linking to the key.

Page 14: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

14

SQL-Like Systems PIG – Yahoo! implementation of relational

algebra. Translates to a sequence of map-reduce

operations, using Hadoop. Hive – open-source (Apache) implementation

of a restricted SQL, called QL, over Hadoop.

Page 15: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

15

SQL-Like Systems – (2) Sawzall – Google implementation of parallel

select + aggregation, but using C++. Dremel – (Google) real restricted SQL, column

oriented store. F1 – (Google) row-oriented, conventional, but

massive scale. Scope – Microsoft implementation of restricted

SQL.

Page 16: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

16

MapReduceFormal DefinitionImplementationFault-ToleranceExamples: Word-Count, Join

Page 17: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

MapReduce Input: a set of key/value pairs. User supplies two functions:

map(k,v) set(k1,v1) reduce(k1, list(v1)) set(v2)

Technically, the input consists of key-value pairs of some type, but usually only the value is important.

(k1,v1) is an intermediate key/value pair. Output is the set of (k1,v2) pairs.

Page 18: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

18

Map Tasks and Reduce Tasks MapReduce job =

Map function (inputs -> key-value pairs) + Reduce function (key and list of values -> outputs).

Map and Reduce Tasks apply Map or Reduce function to (typically) many of their inputs. Unit of parallelism.

Page 19: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

19

Behind the Scenes The Map tasks generate key-value pairs.

Each takes one or more chunks of input from the distributed file system.

The system takes all the key-value pairs from all the Map tasks and sorts them by key.

Then, it forms key-(list-of-associated-values) pairs and passes each key-(value-list) pair to one of the Reduce tasks.

Page 20: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

20

MapReduce Pattern

Maptasks

Reducetasks

InputfromDFS

Outputto DFS

“key”-value pairs

Page 21: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Example: Word Count We have a large file documents, which are

sequences of words. Count the number of times each distinct word

appears in the file.

Page 22: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Word Count Using MapReduce

map(key, value):// key: document name; value: text of document

FOR (each word w in value)emit(w, 1);

reduce(key, value-list):// key: a word; value: an iterator over value-list

result = 0;FOR (each count v on value-list)

result += v;emit(result);

Page 23: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Distributed Execution Overview

UserProgram

Worker

Worker

Master

Worker

Worker

Worker

fork fork fork

assignmap assign

reduce

readlocalwrite

remoteread,sort

OutputFile 0

OutputFile 1

writeChunk 0Chunk1Chunk 2

Input Data

Page 24: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Data Management Input and final output are stored in the

distributed file system. Scheduler tries to schedule Map tasks “close” to

physical storage location of input data – preferably at the same node.

Intermediate results are stored on local file storage of Map and Reduce workers.

Page 25: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

The Master Task Maintain task status: (idle, active, completed). Idle tasks get scheduled as workers become

available. When a Map task completes, it sends the

Master the location and sizes of its intermediate files, one for each Reduce task.

Master pushes location of intermediates to Reduce tasks.

Master pings workers periodically to detect failures.

Page 26: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

How Many Map and Reduce Tasks? Rule of thumb: Use several times more Map

tasks and Reduce tasks than the number of compute nodes available. Minimizes skew caused by different tasks taking

different amounts of time. One DFS chunk per Map task is common.

Page 27: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Combiners Often a Map task will produce many pairs of the

form (k,v1), (k,v2), … for the same key k. E.g., popular words in Word Count.

Can save communication time by applying Reduce function to values with the same key at the Map task. Called a combiner.

Works only if Reduce function is commutative and associative.

Page 28: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

Partition Function We need to assure that records with the same

intermediate key end up at the same Reduce task.

System uses a default partition function e.g., hash(key) mod R, if there are R Reduce tasks.

Sometimes useful to override. Example: hash(hostname(URL)) mod R ensures URLs

from a host end up at the same Reduce task and therefore appear together in the output.

Page 29: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

29

Coping With Failures MapReduce is designed to deal with

compute nodes failing to execute a task. Re-executes failed tasks, not whole jobs. Failure modes:

1. Compute-node failure (e.g., disk crash).2. Rack communication failure.3. Software failures, e.g., a task requires Java n;

node has Java n-1.

Page 30: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

30

Things MapReduce is Good At1. Matrix-Matrix and Matrix-vector

multiplication. One step of the PageRank iteration was the original

application.2. Relational algebra operations.

We’ll do an example of the join.3. Many other “embarrassingly parallel”

operations.

Page 31: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

31

Review of Terminology Map-Reduce job =

Map function (inputs -> key-value pairs) + Reduce function (key and list of values -> outputs).

Map and Reduce Tasks apply Map or Reduce function to (typically) many of their inputs. Unit of parallelism.

Mapper = application of the Map function to a single input.

Reducer = application of the Reduce function to a single key-(list of values) pair.

Page 32: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

32

Example: Natural Join Join of R(A,B) with S(B,C) is the set of tuples

(a,b,c) such that (a,b) is in R and (b,c) is in S. Mappers need to send R(a,b) and S(b,c) to the

same reducer, so they can be joined there. Mapper output: key = B-value, value = relation

and other component (A or C). Example: R(1,2) -> (2, (R,1))

S(2,3) -> (2, (S,3))

Page 33: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

33

Mapping Tuples

Mapper for

R(1,2)R(1,2) (2, (R,1))

Mapper for

R(4,2)R(4,2)

Mapper for

S(2,3)S(2,3)

Mapper for

S(5,6)S(5,6)

(2, (R,4))

(2, (S,3))

(5, (S,6))

Page 34: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

34

Grouping Phase There is a reducer for each key. Every key-value pair generated by any mapper is

sent to the reducer for its key.

Page 35: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

35

Mapping Tuples

Mapper for

R(1,2)(2, (R,1))

Mapper for

R(4,2)

Mapper for

S(2,3)

Mapper for

S(5,6)

(2, (R,4))

(2, (S,3))

(5, (S,6))

Reducerfor B = 2

Reducerfor B = 5

Page 36: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

36

Constructing Value-Lists The input to each reducer is organized by the

system into a pair: The key. The list of values associated with that key.

Page 37: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

37

The Value-List Format

Reducerfor B = 2

Reducerfor B = 5

(2, [(R,1), (R,4), (S,3)])

(5, [(S,6)])

Page 38: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

38

The Reduce Function for Join Given key b and a list of values that are either

(R, ai) or (S, cj), output each triple (ai, b, cj). Thus, the number of outputs made by a reducer is

the product of the number of R’s on the list and the number of S’s on the list.

Page 39: Distributed File  Systems Overview of the DFS Ecology MapReduce and  Hadoop

39

Output of the Reducers

Reducerfor B = 2

Reducerfor B = 5

(2, [(R,1), (R,4), (S,3)])

(5, [(S,6)])

(1,2,3), (4,2,3)