CIS 455/555: Internet and Web Systems
TRANSCRIPT
© 2021 A. Haeberlen, Z. Ives, V. Liu
University of Pennsylvania

MapReduce
October 20, 2021

Plan for today
- Google File System
- Introduction to MapReduce
  - Programming model
  - Data flow
  - Example tasks
NEXT

How Do We Get Parallelism in the Real World? Consider the US Census
- There are ~330 million people in the USA
- Suppose we are doing the census in person
- 10,000 employees, whose job is to collate census forms and to determine how many people live in each city
- How would you coordinate this task?
https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires/2020.html

Basic Strategy for Canvassing in Parallel
- Send workers out in parallel
- They report back with a stack of filled-out forms
- …and then find the next zone to canvass

Basic Strategy for Canvassing in Parallel
[Figure: map of US ZIP-code zones]
https://3danim8.files.wordpress.com/2017/07/usa-zip-code-map.jpg

The second part: grouping!
As we collect forms, they are from many places… Suppose we want to count by congressional district?
1. Sequential: One person sorts everything!
2. Parallel: decompose the work into chunks and work together:
   - Divide-and-conquer sorting, e.g., merge sort
   - Bucketing, e.g., bucket sort, hashing

Mergesort in parallel (focusing on 2 workers)
[Diagram: Workers 0 and 1 sort their own data, then merge sorted runs over successive rounds until Worker 0 holds the final sorted result]
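
To make the rounds concrete, here is a minimal sketch of the two-worker case in Java (the class and method names are illustrative, not from the lecture): each worker sorts its half independently, and a final round merges the two sorted runs.

import java.util.Arrays;

// Sketch: two workers sort their halves concurrently; one worker
// then performs the final merge round.
public class TwoWorkerMergesort {
  public static int[] sort(int[] data) throws InterruptedException {
    int mid = data.length / 2;
    int[] left = Arrays.copyOfRange(data, 0, mid);
    int[] right = Arrays.copyOfRange(data, mid, data.length);

    Thread worker0 = new Thread(() -> Arrays.sort(left));   // Worker 0
    Thread worker1 = new Thread(() -> Arrays.sort(right));  // Worker 1
    worker0.start(); worker1.start();
    worker0.join(); worker1.join();

    // Final round: Worker 0 merges the two sorted runs
    int[] out = new int[data.length];
    int i = 0, j = 0, k = 0;
    while (i < left.length && j < right.length)
      out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left.length) out[k++] = left[i++];
    while (j < right.length) out[k++] = right[j++];
    return out;
  }
}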

Hashing in parallel (focusing on 2 workers)
[Diagram: each item's key is hashed to decide which of the two workers it is sent to, so every item with the same key ends up at the same worker]
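
The routing rule behind the picture fits in one function; a hedged Java sketch (the function name is ours, not the lecture's):

// Hash partitioning: all records with the same key land on the same worker.
static int ownerOf(String key, int numWorkers) {
  // Mask off the sign bit rather than Math.abs (which fails on MIN_VALUE)
  return (key.hashCode() & Integer.MAX_VALUE) % numWorkers;
}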

Counting groups!
[Diagram: Workers 0 and 1 run count() on each group they hold; the five groups yield counts 4, 4, 5, 7, and 4]

A few wrinkles
- What if some of the data is bad?
  - Assign a task to everyone as they collect census forms: filter anything that doesn't pass a sanity check
- What about if some people finish far before others?
  - Break into many more tasks than we have people
  - Have a centralized coordinator (scheduler) for the remaining work!
- Can we do partial counts?

A dataflow diagram – independent of # of workers
[Diagram: filter → group → aggregate; count() runs on each group, yielding 4 orange, 4 green, 5 blue, 7 cyan, 4 gray]
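
On a single machine, the same filter → group → aggregate dataflow can be written directly; a minimal Java sketch (Java 16+ for the record syntax; the Form type and its fields are illustrative):

import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

// A census form with a district key and a sanity-check flag
record Form(String district, boolean valid) {}

class DataflowSketch {
  // Filter bad records, group by district, count each group
  static Map<String, Long> countByDistrict(List<Form> forms) {
    return forms.stream()
        .filter(Form::valid)                    // filter
        .collect(groupingBy(Form::district,     // group
                            counting()));       // aggregate
  }
}

The point of the diagram is that nothing in this pipeline mentions the number of workers; the same three stages can be split across any number of machines.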

Summary of the Intuitions
For very particular kinds of data-collecting tasks, we have a highly parallel scheme:
- Fetch and filter the data
- Partition the data into groups, by mergesort or hashing
- Aggregate each group
More workers allow more tasks to be done in parallel, up to the maximum number of tasks that don't have a data dependency!
Let's now formalize this with a computational framework…

MapReduce
- Wouldn't it be nice if there were some system that took care of all these details for you?
  - But every task is different!
  - Or is it? The details are different (what to compute, etc.), but the data flow is often the same!
  - Maybe we can have a 'generic' solution?
- Ideally, you'd just tell the system what needs to be done
- That's the MapReduce framework.

What is MapReduce?
- A famous distributed programming model
- In many circles, considered the key building block for much of Google's data analysis
- A programming language built on it: Sawzall, http://labs.google.com/papers/sawzall.html
  - "… Sawzall has become one of the most widely used programming languages at Google. … [O]n one dedicated Workqueue cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job. The jobs read a total of 3.2×10^15 bytes of data (2.8PB) and wrote 9.9×10^12 bytes (9.3TB)."
- Other similar languages: Yahoo's Pig Latin and Pig; Microsoft's Dryad
- Cloned in open source: Hadoop, http://hadoop.apache.org/core/

The MapReduce programming model
- Simple distributed functional programming primitives
- Modeled after Lisp primitives:
  map (apply function f to each item x in a collection, creating a new collection with f(x) in its place) and
  reduce (apply a function to the set of items with a common key)
- We start with:
  - A user-defined function to be applied to all data:
    map: (item_key, value) → (stack_key, value')
  - Another user-specified operation:
    reduce: (stack_key, {set of value'}) → result
  - A set of n nodes, each with data
- All nodes run map on their data, producing new data with keys
- This data is collected by key; then there is an implicit shuffle stage, and finally a reduce
- Dataflow is through temp files on GFS
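
In Java terms, the shapes of the two user-supplied functions could be sketched like this (hypothetical interfaces for illustration, not an actual framework API):

// IK/IV: input key/value; SK/SV: intermediate ("stack") key/value; R: result
interface Emitter<SK, SV> {
  void emit(SK key, SV value);        // map may emit zero or more pairs
}

interface MapFn<IK, IV, SK, SV> {
  // (item_key, value) -> zero or more (stack_key, value') pairs
  void map(IK key, IV value, Emitter<SK, SV> out);
}

interface ReduceFn<SK, SV, R> {
  // (stack_key, {set of value'}) -> result
  R reduce(SK key, Iterable<SV> values);
}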

Simple example: Word count
- Goal: Given a set of documents, count how often each word occurs
- Input: Key-value pairs (document:lineNumber, text)
- Output: Key-value pairs (word, #occurrences)
- What should be the intermediate key-value pairs? (Key design question!)

map(String key, String value) {
  // key: document name, line no
  // value: contents of line
  for each word w in value:
    emit(w, "1")
}

reduce(String key, Iterator values) {
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  emit(key, result)
}
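
The same logic in real Hadoop Java (a hedged sketch against the Hadoop 2.x+ API that appears later in this lecture; the class names are ours, and each public class would normally live in its own file):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: (line offset, line text) -> (word, 1) for each word in the line
public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tok = new StringTokenizer(value.toString());
    while (tok.hasMoreTokens()) {
      word.set(tok.nextToken());
      context.write(word, ONE);                  // emit(w, 1)
    }
  }
}

// Reducer: (word, {1, 1, ...}) -> (word, total)
public class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) sum += v.get(); // sum the per-word counts
    context.write(key, new IntWritable(sum));    // emit(key, result)
  }
}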

Simple example: Word count (trace)
Input: (1, the apple), (2, is an apple), (3, not an orange), (4, because the), (5, orange), (6, unlike the apple), (7, is orange), (8, not green)

1. Each mapper receives some of the KV-pairs as input:
   Mapper(1-2), Mapper(3-4), Mapper(5-6), Mapper(7-8)
2. The mappers process the KV-pairs one by one:
   Mapper(1-2) emits (the, 1), (apple, 1), (is, 1), (an, 1), (apple, 1)
   Mapper(3-4) emits (not, 1), (an, 1), (orange, 1), (because, 1), (the, 1)
   Mapper(5-6) emits (orange, 1), (unlike, 1), (the, 1), (apple, 1)
   Mapper(7-8) emits (is, 1), (orange, 1), (not, 1), (green, 1)
3. Each KV-pair output by a mapper is sent to the reducer that is responsible for it, according to the key range each node is responsible for:
   Reducer(A-G), Reducer(H-N), Reducer(O-U), Reducer(V-Z)
4. The reducers sort their input by key and group it:
   (an, {1, 1}), (apple, {1, 1, 1}), (because, {1}), (green, {1}), (is, {1, 1}), (not, {1, 1}), (orange, {1, 1, 1}), (the, {1, 1, 1}), (unlike, {1})
5. The reducers process their input one group at a time:
   (an, 2), (apple, 3), (because, 1), (green, 1), (is, 2), (not, 2), (orange, 3), (the, 3), (unlike, 1)

MapReduce dataflow
[Diagram: input data flows into the Mappers; the intermediate (key,value) pairs pass through "the Shuffle" to the Reducers, which write the output data]
- What makes this so scalable?
- In practice, mappers and reducers usually run on the same set of machines!

MapReduce system components
- To make this work, we need a few more parts…
- The file system (distributed across all nodes):
  - Stores the inputs, outputs, and temporary results
- The driver program (executes on one node):
  - Specifies where to find the inputs and the outputs
  - Specifies what mapper and reducer to use
  - Can customize the behavior of the execution
- The runtime system (controls nodes):
  - Supervises the execution of tasks
  - Esp. the JobTracker
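
A minimal driver sketch in Hadoop Java, tying together the hypothetical WordCountMapper/WordCountReducer from the earlier sketch (input and output paths come from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);     // what mapper to use
    job.setReducerClass(WordCountReducer.class);   // what reducer to use
    job.setOutputKeyClass(Text.class);             // output pair types
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // where the inputs are
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // where the outputs go
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}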

The Underlying MapReduce data flow
[Diagram: data partitions by key feed the map computation partitions; a redistribution by the output's key ("shuffle") feeds the reduce computation partitions; a Coordinator oversees the stages. (Default MapReduce uses the filesystem between stages.)]

Observe That…
- All data is key/value pairs – a simple binary tuple
- Map can be thought of as a bolt that, upon each tuple, filters / restructures the tuple
- Reduce can be thought of as a bolt that buffers tuples with a common key
- The connection between map and reduce is done by something analogous to a "fieldGrouping"

What if a node crashes?
- How will we know?
  - The master pings every worker periodically
- What to do when a worker crashes?
  - Failed map task on node A: re-execute on another node B, and notify all the workers executing reduce tasks
    - If a reduce task has not read all the data from A yet, it will read from B
  - Failed reduce task: if not complete yet, re-execute on another node
  - Intermediate outputs from map tasks are stored locally on the mapper, whereas outputs from reduce tasks are in the distributed file system
- What to do when the master crashes?
  - Could periodically checkpoint state & restart from there
  - Or just abort the computation - is this a good idea?
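
A rough sketch of the detection half (our own simplification, not the Google implementation): the master timestamps each heartbeat and re-queues the tasks of any worker that stays silent too long.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatMonitor {
  private static final long TIMEOUT_MS = 10_000;   // assumed threshold
  private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

  // Called whenever a worker answers a ping or sends a status update
  void recordHeartbeat(String workerId) {
    lastSeen.put(workerId, System.currentTimeMillis());
  }

  // Called periodically by the master's scheduler thread
  void checkForFailures(TaskScheduler scheduler) {
    long now = System.currentTimeMillis();
    lastSeen.forEach((worker, seen) -> {
      if (now - seen > TIMEOUT_MS) {
        scheduler.rescheduleTasksOf(worker);  // re-execute its tasks elsewhere
        lastSeen.remove(worker);
      }
    });
  }
}

interface TaskScheduler {                     // hypothetical interface
  void rescheduleTasksOf(String workerId);
}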

Other challenges
- Locality
  - Try to schedule a map task on a machine that already has the data
- Task granularity
  - How many map tasks? How many reduce tasks?
- Dealing with stragglers
  - Schedule some backup tasks
- Saving bandwidth
  - E.g., with combiners
- Handling bad records
  - A crashing worker sends a "last gasp" packet with the current sequence number, so the offending record can be skipped on re-execution
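
On the combiner point: in Hadoop, a combiner pre-aggregates each mapper's local output before the shuffle. A one-line sketch in the driver, reusing the hypothetical WordCountReducer from earlier (safe here because summing counts is associative and commutative; not every reducer can double as a combiner):

// Run a reduce-style pre-aggregation on the map side to save bandwidth
job.setCombinerClass(WordCountReducer.class);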

MapReduce as Stream Processing with End-of-Stream
[Diagram: two WorkerServers, each running a File Spout (reading file.0…, file.1…) that feeds a MapBolt; a StreamRouter (FieldBased or RoundRobin) routes tuples into the MapBolts, a StreamRouter (FieldBased) routes map output to the ReduceBolts, and a StreamRouter (First) sends reduce output to a Printer Bolt. The spouts emit "eos" markers to signal end-of-stream.]
- shuffleGrouping in StormLite uses hashing to group
- The Master (Spark webapp) POSTs the WorkerJob in JSON to the WorkerServers and gets updates from a WorkerServer background thread

Summary: MapReduce
Three major stages:
- map items individually, outputting 0 or more records with keys
- shuffle records by keys
- reduce the entries for each key
Naturally distributes + parallelizes; can create multi-stage pipelines, loops, etc. to implement richer algorithms

Plan for today
- Google File System
- Introduction to MapReduce
  - Programming model
  - Data flow
  - Example tasks
- Hadoop and HDFS
  - Architecture
  - Using Hadoop
  - Using HDFS
  - Beyond MapReduce
NEXT

Programming for MapReduce
- Programming for MapReduce is very much like a callback-based programming model
  - map() gets called for each input record
  - reduce() gets called for each group
- Internally, the outputs of map() get sorted by key
- Important: don't make assumptions about what is shared across calls to map() or reduce()! (See the anti-pattern sketch below.)
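
A hypothetical anti-pattern in Hadoop Java illustrating the warning: the field below only counts the calls seen by one task on one node, and tasks may be re-executed after failures, so it is meaningless as a global count.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BrokenMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static int seen = 0;   // WRONG: not shared across nodes or retries

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    seen++;   // per-call state is fine; cross-call/cross-task state is not
  }
}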

Beyond word count
- Distributed grep – all lines matching a pattern

Exercise: fill in the blanks
Input: (k,v) where k is __________ and v is __________
map(key : __________, value : __________) {
}
reduce(key : __________, values: __________) {
}
Output: (k,v) where k is __________ and v is __________

Beyond word count
- Distributed grep – all lines matching a pattern
  - Map: filter by pattern
  - Reduce: output set
- Count URL access frequency
  - Map: output each URL as key, with count 1
  - Reduce: sum the counts
- Reverse web-link graph
  - Map: output (target, source) pairs when a link to target is found in source
  - Reduce: concatenate values and emit (target, list(source))
- Inverted index
  - Map: emits (word, documentID)
  - Reduce: combines these into (word, list(documentID))
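
One of these worked out as a hedged Hadoop Java sketch, the inverted index (assuming an input format that presents (documentID, contents) pairs, e.g., KeyValueTextInputFormat; the class names are ours):

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, documentID) for every word in the document
public class InvertedIndexMapper extends Mapper<Text, Text, Text, Text> {
  @Override
  protected void map(Text docId, Text contents, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tok = new StringTokenizer(contents.toString());
    while (tok.hasMoreTokens()) {
      context.write(new Text(tok.nextToken()), docId);
    }
  }
}

// Reduce: combine into (word, list(documentID)), deduplicating IDs
public class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text word, Iterable<Text> docIds, Context context)
      throws IOException, InterruptedException {
    Set<String> unique = new HashSet<>();
    for (Text id : docIds) unique.add(id.toString());  // Hadoop reuses Text objects; copy to String
    context.write(word, new Text(String.join(",", unique)));
  }
}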