Scalable Algorithm Design with MapReduce
TRANSCRIPT
Scalable Algorithm Design: The MapReduce Programming Model
Pietro Michiardi
Eurecom
Pietro Michiardi (Eurecom) Scalable Algorithm Design 1 / 62
Sources and Acks
Jimmy Lin and Chris Dyer, "Data-Intensive Text Processing with MapReduce," Morgan & Claypool Publishers, 2010. http://lintool.github.io/MapReduceAlgorithms/
Tom White, "Hadoop: The Definitive Guide," O'Reilly / Yahoo Press, 2012.
Anand Rajaraman, Jeffrey D. Ullman, Jure Leskovec, "Mining of Massive Datasets," Cambridge University Press, 2013.
Key Principles
Scale out, not up!
For data-intensive workloads, a large number of commodity servers is preferred over a small number of high-end servers
- Cost of super-computers is not linear
- But datacenter efficiency is a difficult problem to solve [1, 2]
Some numbers (∼ 2012):
- Data processed by Google every day: 100+ PB
- Data processed by Facebook every day: 10+ PB
Implications of Scaling Out
Processing data is quick, I/O is very slow:
- 1 HDD = 75 MB/sec
- 1000 HDDs = 75 GB/sec
Sharing vs. shared nothing:
- Sharing: manage a common/global state
- Shared nothing: independent entities, no common state
Sharing is difficult:
- Synchronization, deadlocks
- Finite bandwidth to access data from a SAN
- Temporal dependencies are complicated (restarts)
Failures are the norm, not the exception
LANL data [DSN 2006]:
- Data for 5,000 machines, over 9 years
- Hardware: 60%, Software: 20%, Network: 5%
DRAM error analysis [Sigmetrics 2009]:
- Data for 2.5 years
- 8% of DIMMs affected by errors
Disk drive failure analysis [FAST 2007]:
- Utilization and temperature are major causes of failures
Amazon Web Services failures [several!]:
- Cascading effects
Implications of Failures
Failures are part of everyday life:
- Mostly due to scale and the shared environment
Sources of failures:
- Hardware / software
- Electrical, cooling, ...
- Unavailability of a resource due to overload
Failure types:
- Permanent
- Transient
Move Processing to the Data
A drastic departure from the high-performance computing model:
- HPC: distinction between processing nodes and storage nodes
- HPC: CPU-intensive tasks
Data-intensive workloads:
- Generally not processor demanding
- The network becomes the bottleneck
- MapReduce assumes processing and storage nodes to be collocated → Data Locality Principle
Distributed filesystems are necessary
Process Data Sequentially and Avoid Random Access
Data-intensive workloads:
- Relevant datasets are too large to fit in memory
- Such data resides on disks
Disk performance is a bottleneck:
- Seek times for random disk access are the problem
  - Example: a 1 TB DB with 10^10 100-byte records. Updating 1% of the records requires 1 month; reading and rewriting the whole DB would take 1 day¹
- Organize computation for sequential reads
¹ From a post by Ted Dunning on the Hadoop mailing list
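A back-of-envelope check of the footnote's claim, under assumed numbers that are not on the slide (roughly 10 ms per random seek, and the 75 MB/sec per-disk throughput quoted earlier), lands in the same orders of magnitude: weeks for random updates, a fraction of a day for a sequential sweep.

```python
# Sanity-check the random-vs-sequential example (assumed: ~10 ms/seek).
records = 10**10
record_size = 100            # bytes -> 1 TB total
updated = records // 100     # 1% of the records

seek = 0.010                 # seconds per random access (assumption)
# Each update needs a read seek and a write seek.
random_days = updated * 2 * seek / 86400

throughput = 75e6            # bytes/sec, one HDD (figure from the slide above)
# Read the whole DB once and write it back once, sequentially.
sequential_days = 2 * records * record_size / throughput / 86400

print(round(random_days), round(sequential_days, 1))  # 23 0.3
```

About 23 days of pure seeking versus well under a day of sequential I/O, consistent with the "1 month vs. 1 day" rule of thumb.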
Implications of Data Access Patterns
MapReduce is designed for:
- Batch processing
- Involving (mostly) full scans of the data
Typically, data is collected "elsewhere" and copied to the distributed filesystem
- E.g.: Apache Flume, Hadoop Sqoop, ...
Data-intensive applications:
- Read and process the whole Web (e.g., PageRank)
- Read and process the whole social graph (e.g., link prediction, a.k.a. "friend suggest")
- Log analysis (e.g., network traces, smart-meter data, ...)
Hide System-level Details
Separate the what from the how:
- MapReduce abstracts away the "distributed" part of the system
- Such details are handled by the framework
BUT: in-depth knowledge of the framework is key:
- Custom data readers/writers
- Custom data partitioning
- Memory utilization
Auxiliary components:
- Hadoop Pig
- Hadoop Hive
- Cascading/Scalding
- ... and many many more!
Seamless Scalability
We can define scalability along two dimensions:
- In terms of data: given twice the amount of data, the same algorithm should take no more than twice as long to run
- In terms of resources: given a cluster twice the size, the same algorithm should take no more than half as long to run
Embarrassingly parallel problems:
- Simple definition: independent (shared nothing) computations on fragments of the dataset
- How to decide if a problem is embarrassingly parallel or not?

MapReduce is a first attempt, not the final answer
The Programming Model
Functional Programming Roots

Key feature: higher-order functions
- Functions that accept other functions as arguments
- Map and Fold
Figure: Illustration of map and fold.
Functional Programming Roots
map phase:
- Given a list, map takes as an argument a function f (that takes a single argument) and applies it to all elements in the list
fold phase:
- Given a list, fold takes as arguments a function g (that takes two arguments) and an initial value (an accumulator)
- g is first applied to the initial value and the first item in the list
- The result is stored in an intermediate variable, which is used as an input together with the next item to a second application of g
- The process is repeated until all items in the list have been consumed
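The two phases above can be sketched directly with Python's built-in `map` and `functools.reduce` (Python's name for fold):

```python
from functools import reduce

# map: apply f to each element in isolation (trivially parallelizable)
squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))

# fold: apply g to the accumulator and each item in turn, starting from 0
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)  # [1, 4, 9, 16, 25] 55
```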
Functional Programming Roots
We can view map as a transformation over a dataset:
- This transformation is specified by the function f
- Each functional application happens in isolation
- The application of f to each element of a dataset can be parallelized in a straightforward manner
We can view fold as an aggregation operation:
- The aggregation is defined by the function g
- Data locality: elements in the list must be "brought together"
- If we can group elements of the list, the fold phase can also proceed in parallel
Associative and commutative operations:
- Allow performance gains through local aggregation and reordering
Functional Programming and MapReduce
Equivalence of MapReduce and functional programming:
- The map of MapReduce corresponds to the map operation
- The reduce of MapReduce corresponds to the fold operation
The framework coordinates the map and reduce phases:
- Grouping intermediate results happens in parallel
In practice:
- User-specified computation is applied (in parallel) to all input records of a dataset
- Intermediate results are aggregated by another user-specified computation
What can we do with MapReduce?
MapReduce "implements" a subset of functional programming
- The programming model appears quite limited and strict
There are several important problems that can be adapted to MapReduce
- We will focus on illustrative cases
- We will see "design patterns" in detail:
  - How to transform a problem and its input
  - How to save memory and bandwidth in the system
Data Structures
Key-value pairs are the basic data structure in MapReduce:
- Keys and values can be integers, floats, strings, raw bytes
- They can also be arbitrary data structures
The design of MapReduce algorithms involves:
- Imposing the key-value structure on arbitrary datasets²
  - E.g.: for a collection of Web pages, input keys may be URLs and values may be the HTML content
- In some algorithms, input keys are not used; in others, they uniquely identify a record
- Keys can be combined in complex ways to design various algorithms
² There's more to it: here we only look at the input to the map function.
A Generic MapReduce Algorithm
The programmer defines a mapper and a reducer as follows³ ⁴:
- map: (k1, v1) → [(k2, v2)]
- reduce: (k2, [v2]) → [(k3, v3)]
In words:
- The dataset is stored on an underlying distributed filesystem, split into a number of blocks across machines
- The mapper is applied to every input key-value pair to generate intermediate key-value pairs
- The reducer is applied to all values associated with the same intermediate key to generate output key-value pairs
³ We use the convention [· · · ] to denote a list.
⁴ Subscripts indicate different data types.
Where the magic happens
Implicit between the map and reduce phases is a parallel "group by" operation on intermediate keys
- Intermediate data arrive at each reducer in order, sorted by the key
- No ordering is guaranteed across reducers
Output keys from reducers are written back to the distributed filesystem
- The output may consist of r distinct files, where r is the number of reducers
- Such output may be the input to a subsequent MapReduce phase⁵
Intermediate keys are transient:
- They are not stored on the distributed filesystem
- They are "spilled" to the local disk of each machine in the cluster
⁵ Think of iterative algorithms.
“Hello World” in MapReduce
1: class MAPPER
2:   method MAP(offset a, line l)
3:     for all term t ∈ line l do
4:       EMIT(term t, count 1)

1: class REDUCER
2:   method REDUCE(term t, counts [c1, c2, ...])
3:     sum ← 0
4:     for all count c ∈ counts [c1, c2, ...] do
5:       sum ← sum + c
6:     EMIT(term t, count sum)
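The word-count job above can be simulated in a few lines of Python. This is a hypothetical in-memory sketch, not Hadoop code: `run_mapreduce` stands in for the framework (map phase, implicit "group by", reduce phase).

```python
from collections import defaultdict

def map_fn(offset, line):
    for term in line.split():          # tokenize the line
        yield term, 1                  # EMIT(term t, count 1)

def reduce_fn(term, counts):
    yield term, sum(counts)            # EMIT(term t, count sum)

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for k, v in records:               # map phase
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)      # implicit "group by" (shuffle)
    out = []
    for k2 in sorted(groups):          # each reducer sees keys in sorted order
        out.extend(reduce_fn(k2, groups[k2]))
    return out

lines = [(0, "the quick fox"), (14, "the lazy dog the fox")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# [('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```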
“Hello World” in MapReduce
Input:
- Key-value pairs: (offset, line) of a file stored on the distributed filesystem
- a: the offset, a unique identifier of the line
- l: the text of the line itself
Mapper:
- Takes an input key-value pair and tokenizes the line
- Emits intermediate key-value pairs: the word is the key and the integer is the value

The framework:
- Guarantees that all values associated with the same key (the word) are brought to the same reducer
The reducer:
- Receives all values associated with some keys
- Sums the values and writes output key-value pairs: the key is the word and the value is the number of occurrences
Combiners
Combiners are a general mechanism to reduce the amount of intermediate data
- They can be thought of as "mini-reducers"
Back to our running example: word count
- Combiners aggregate term counts across documents processed by each map task
- If combiners take advantage of all opportunities for local aggregation, we have at most m × V intermediate key-value pairs
  - m: number of mappers
  - V: number of unique terms in the collection
- Note: due to the Zipfian nature of term distributions, not all mappers will see all terms
A word of caution
The use of combiners must be thought through carefully
- In Hadoop, they are optional: the correctness of the algorithm cannot depend on the computation (or even execution) of the combiners
Combiner I/O types:
- Input: (k2, [v2]) [same input as for reducers]
- Output: [(k2, v2)] [same output as for mappers]
Commutative and associative computations:
- Reducer and combiner code may be interchangeable (e.g., word count)
- This is not true in the general case
Algorithmic Correctness: an Example

Problem statement:
- We have a large dataset where input keys are strings and input values are integers
- We wish to compute the mean of all integers associated with the same key
  - In practice: the dataset can be a log from a website, where the keys are user IDs and the values are some measure of activity
Next, a baseline approach:
- We use an identity mapper, which groups and sorts input key-value pairs appropriately
- Reducers keep track of the running sum and the number of integers encountered
- The mean is emitted as the output of the reducer, with the input string as the key
Inefficiency problems arise in the shuffle phase
Example: Computing the mean
1: class MAPPER
2:   method MAP(string t, integer r)
3:     EMIT(string t, integer r)

1: class REDUCER
2:   method REDUCE(string t, integers [r1, r2, ...])
3:     sum ← 0
4:     cnt ← 0
5:     for all integer r ∈ integers [r1, r2, ...] do
6:       sum ← sum + r
7:       cnt ← cnt + 1
8:     r_avg ← sum/cnt
9:     EMIT(string t, integer r_avg)
Algorithmic Correctness
Note: the mean is not a distributive operation
- Mean(1, 2, 3, 4, 5) ≠ Mean(Mean(1, 2), Mean(3, 4, 5))
- Hence: a combiner cannot output partial means and hope that the reducer will compute the correct final mean

Rule of thumb:
- Combiners are optimizations; the algorithm should work even when "removing" them
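The inequality is quick to verify; a combiner that emitted partial means would feed the reducer the wrong inputs:

```python
from statistics import mean

data = [1, 2, 3, 4, 5]
true_mean = mean(data)                          # mean over all values
naive = mean([mean([1, 2]), mean([3, 4, 5])])   # mean of partial means
print(true_mean, naive)  # 3 2.75
```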
Example: Computing the mean with combiners
1: class MAPPER
2:   method MAP(string t, integer r)
3:     EMIT(string t, pair (r, 1))

1: class COMBINER
2:   method COMBINE(string t, pairs [(s1, c1), (s2, c2), ...])
3:     sum ← 0
4:     cnt ← 0
5:     for all pair (s, c) ∈ pairs [(s1, c1), (s2, c2), ...] do
6:       sum ← sum + s
7:       cnt ← cnt + c
8:     EMIT(string t, pair (sum, cnt))

1: class REDUCER
2:   method REDUCE(string t, pairs [(s1, c1), (s2, c2), ...])
3:     sum ← 0
4:     cnt ← 0
5:     for all pair (s, c) ∈ pairs [(s1, c1), (s2, c2), ...] do
6:       sum ← sum + s
7:       cnt ← cnt + c
8:     r_avg ← sum/cnt
9:     EMIT(string t, integer r_avg)
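The trick above — emitting (sum, count) pairs instead of partial means — makes the aggregation associative, so combiner and reducer can share the same summing logic. A hypothetical in-memory sketch with two map tasks, each running a local combiner:

```python
from collections import defaultdict

def combine(pairs):
    # Sum the partial sums and the partial counts; order does not matter.
    return (sum(s for s, c in pairs), sum(c for s, c in pairs))

# Two map tasks, each emitting (value, 1) pairs for key "u1".
task1 = {"u1": [(1, 1), (2, 1)]}            # values 1, 2
task2 = {"u1": [(3, 1), (4, 1), (5, 1)]}    # values 3, 4, 5

shuffled = defaultdict(list)
for task in (task1, task2):
    for key, pairs in task.items():
        shuffled[key].append(combine(pairs))  # combiner: partial (sum, cnt)

for key, pairs in shuffled.items():
    s, c = combine(pairs)                     # reducer: final aggregation
    print(key, s / c)                         # u1 3.0
```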
Basic Design Patterns
Algorithm Design
Developing algorithms involves:
- Preparing the input data
- Implementing the mapper and the reducer
- Optionally, designing the combiner and the partitioner
How to recast existing algorithms in MapReduce?
- It is not always obvious how to express algorithms
- Data structures play an important role
- Optimization is hard
→ The designer needs to "bend" the framework
Learn by examples:
- "Design patterns"
- "Shuffle" is perhaps the most tricky aspect
Algorithm Design
Aspects that are not under the control of the designer:
- Where a mapper or reducer will run
- When a mapper or reducer begins or finishes
- Which input key-value pairs are processed by a specific mapper
- Which intermediate key-value pairs are processed by a specific reducer
Aspects that can be controlled:
- Construct data structures as keys and values
- Execute user-specified initialization and termination code for mappers and reducers
- Preserve state across multiple input and intermediate keys in mappers and reducers
- Control the sort order of intermediate keys, and therefore the order in which a reducer will encounter particular keys
- Control the partitioning of the key space, and therefore the set of keys that will be encountered by a particular reducer
Algorithm Design
MapReduce algorithms can be complex:
- Many algorithms cannot be easily expressed as a single MapReduce job
- Decompose complex algorithms into a sequence of jobs
  - Requires orchestrating data so that the output of one job becomes the input to the next
- Iterative algorithms require an external driver to check for convergence
Basic design patterns⁶:
- Local aggregation
- Pairs and stripes
- Order inversion
⁶ You will see them in action during the laboratory sessions.
Local Aggregation
In the context of data-intensive distributed processing, the most important aspect of synchronization is the exchange of intermediate results
- This involves copying intermediate results from the processes that produced them to those that consume them
- In general, this involves data transfers over the network
- In Hadoop, disk I/O is also involved, as intermediate results are written to disk
Network and disk latencies are expensive:
- Reducing the amount of intermediate data translates into algorithmic efficiency
Combiners and preserving state across inputs:
- Reduce the number and size of key-value pairs to be shuffled
In-Mapper Combiners
In-mapper combiners, a possible improvement over vanilla combiners:
- Hadoop does not⁷ guarantee combiners to be executed
- Combiners can be costly in terms of CPU and I/O
Use an associative array to accumulate intermediate results:
- The array is used to tally up term counts within a single "document"
- The Emit method is called only after all input records have been processed
Example (see next slide):
- The code emits a key-value pair for each unique term in the document

⁷ Actually, combiners are not called if the number of map output records is less than a small threshold, i.e., 4.
In-Memory Combiners
1: class MAPPER
2:   method MAP(offset a, line l)
3:     H ← new AssociativeArray
4:     for all term t ∈ line l do
5:       H{t} ← H{t} + 1
6:     for all term t ∈ H do
7:       EMIT(term t, count H{t})
In-Memory Combiners
Taking the idea one step further:
- Exploit implementation details in Hadoop
- A Java mapper object is created for each map task
- JVM reuse must be enabled
Preserve state within and across calls to the Map method:
- Initialize method, used to create an across-map, persistent data structure
- Close method, used to emit intermediate key-value pairs only when all map tasks scheduled on one machine are done
In-Memory Combiners
1: class MAPPER
2:   method INITIALIZE
3:     H ← new AssociativeArray
4:   method MAP(offset a, line l)
5:     for all term t ∈ line l do
6:       H{t} ← H{t} + 1
7:   method CLOSE
8:     for all term t ∈ H do
9:       EMIT(term t, count H{t})
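A hypothetical Python sketch of the same structure (the class name and method names mirror the pseudocode, not any real Hadoop API): one mapper object per map task, with the associative array H surviving across calls to map, and emission deferred to close.

```python
from collections import defaultdict

class WordCountMapper:
    def initialize(self):
        self.H = defaultdict(int)   # persists across input records

    def map(self, offset, line):
        for term in line.split():
            self.H[term] += 1       # tally locally, emit nothing yet

    def close(self):
        # Emit once per unique term seen by this task, not once per token.
        return sorted(self.H.items())

m = WordCountMapper()
m.initialize()
m.map(0, "a b a")
m.map(6, "b a")
print(m.close())  # [('a', 3), ('b', 2)]
```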
In-Memory Combiners
Summing up: a first "design pattern", in-memory combining:
- Provides control over when local aggregation occurs
- The designer can determine how exactly aggregation is done
Efficiency vs. combiners:
- There is no additional overhead due to the materialization of key-value pairs
  - Unnecessary object creation and destruction (garbage collection)
  - Serialization and deserialization when memory bounded
- With combiners, mappers still need to emit all key-value pairs; combiners "only" reduce network traffic
In-Memory Combiners
Precautions:
- In-memory combining breaks the functional programming paradigm due to state preservation
- Preserving state across multiple instances implies that algorithm behavior might depend on execution order
  - Works well with commutative / associative operations
  - Otherwise, order-dependent bugs are difficult to find
Memory capacity is limited:
- In-memory combining strictly depends on having sufficient memory to store intermediate results
- A possible solution: "block" and "flush"
Further Remarks
The extent to which efficiency can be increased with local aggregation depends on the size of the intermediate key space
- Opportunities for aggregation arise when multiple values are associated with the same keys
Local aggregation is also effective to deal with reduce stragglers
- Reduce the number of values associated with frequently occurring keys
Computing the average, with in-mapper combiners

Partial sums and counts are held in memory (across inputs)
Intermediate values are emitted only after the entire input split is processed
The output value is a pair
1: class MAPPER
2:   method INITIALIZE
3:     S ← new AssociativeArray
4:     C ← new AssociativeArray
5:   method MAP(term t, integer r)
6:     S{t} ← S{t} + r
7:     C{t} ← C{t} + 1
8:   method CLOSE
9:     for all term t ∈ S do
10:      EMIT(term t, pair (S{t}, C{t}))
Pairs and Stripes
A common approach in MapReduce: build complex keys
- Use the framework to group data together

Two basic techniques:
- Pairs: similar to the example on the average
- Stripes: uses in-mapper memory data structures
Next, we focus on a particular problem that benefits from these two methods
Problem statement
The problem: building word co-occurrence matrices for large corpora
- The co-occurrence matrix of a corpus is a square n × n matrix, M
- n is the number of unique words (i.e., the vocabulary size)
- A cell m_ij contains the number of times the word w_i co-occurs with the word w_j within a specific context
- Context: a sentence, a paragraph, a document, or a window of m words
- NOTE: the matrix may be symmetric in some cases
Motivation:
- This problem is a basic building block for more complex operations
- Estimating the distribution of discrete joint events from a large number of observations
- Similar problems arise in other domains:
  - Customers who buy this tend to also buy that
Observations
Space requirements:
- Clearly, the space requirement is O(n²), where n is the size of the vocabulary
- For real-world (English) corpora, n can be hundreds of thousands of words, or even billions of words in some specific cases
So what's the problem?
- If the matrix can fit in the memory of a single machine, then just use whatever naive implementation
- Instead, if the matrix is bigger than the available memory, then paging would kick in, and any naive implementation would break
Compression:
- Such techniques can help in solving the problem on a single machine
- However, there are scalability problems
Word co-occurrence: the Pairs approach

Input to the problem:
- Key-value pairs in the form of an offset and a line
The mapper:
- Processes each input document
- Emits key-value pairs with:
  - Each co-occurring word pair as the key
  - The integer one (the count) as the value
- This is done with two nested loops:
  - The outer loop iterates over all words
  - The inner loop iterates over all neighbors
The reducer:
- Receives pairs related to co-occurring words
  - This requires modifying the partitioner
- Computes an absolute count of the joint event
- Emits the pair and the count as the final key-value output
  - Basically, reducers emit the cells of the output matrix
Word co-occurrence: the Pairs approach
1: class MAPPER
2:   method MAP(offset a, line l)
3:     for all term w ∈ line l do
4:       for all term u ∈ NEIGHBORS(w) do
5:         EMIT(pair (w, u), count 1)

6: class REDUCER
7:   method REDUCE(pair p, counts [c1, c2, ...])
8:     s ← 0
9:     for all count c ∈ counts [c1, c2, ...] do
10:      s ← s + c
11:    EMIT(pair p, count s)
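A hypothetical in-memory sketch of the pairs approach, where the context is assumed to be a single line and NEIGHBORS(w) is every other word on that line; a dictionary stands in for the shuffle plus reduce-side summation:

```python
from collections import defaultdict

def pairs_cooccurrence(lines):
    counts = defaultdict(int)            # shuffle + reduce-side sum
    for line in lines:
        words = line.split()
        for i, w in enumerate(words):    # outer loop: all words
            for j, u in enumerate(words):  # inner loop: all neighbors
                if i != j:
                    counts[(w, u)] += 1  # EMIT(pair (w, u), count 1)
    return dict(counts)

m = pairs_cooccurrence(["a b c", "a b"])
print(m[("a", "b")])  # 2
```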
Word co-occurrence: the Stripes approach
Input to the problem:
- Key-value pairs in the form of an offset and a line
The mapper:
- Same two nested loops structure as before
- Co-occurrence information is first stored in an associative array
- Emits key-value pairs with words as keys and the corresponding arrays as values
The reducer:
- Receives all associative arrays related to the same word
- Performs an element-wise sum of all associative arrays with the same key
- Emits key-value output in the form of word, associative array
  - Basically, reducers emit rows of the co-occurrence matrix
Word co-occurrence: the Stripes approach
1: class MAPPER
2:   method MAP(offset a, line l)
3:     for all term w ∈ line l do
4:       H ← new AssociativeArray
5:       for all term u ∈ NEIGHBORS(w) do
6:         H{u} ← H{u} + 1
7:       EMIT(term w, Stripe H)

8: class REDUCER
9:   method REDUCE(term w, Stripes [H1, H2, H3, ...])
10:    Hf ← new AssociativeArray
11:    for all Stripe H ∈ Stripes [H1, H2, H3, ...] do
12:      SUM(Hf, H)
13:    EMIT(term w, Stripe Hf)
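The same hypothetical setup sketched with stripes (context = one line, neighbors = the other words on it): each mapper emits (word, stripe), with a `Counter` playing the role of the associative array, and the reducer performs the element-wise sum:

```python
from collections import Counter, defaultdict

def stripes_cooccurrence(lines):
    shuffled = defaultdict(list)
    for line in lines:                   # map phase
        words = line.split()
        for i, w in enumerate(words):
            stripe = Counter(u for j, u in enumerate(words) if j != i)
            shuffled[w].append(stripe)   # EMIT(term w, Stripe H)
    rows = {}
    for w, stripes in shuffled.items():  # reduce: element-wise sum
        rows[w] = sum(stripes, Counter())
    return rows

rows = stripes_cooccurrence(["a b c", "a b"])
print(rows["a"])  # Counter({'b': 2, 'c': 1})
```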
Pairs and Stripes, a comparison
The pairs approach:
- Generates a large number of key-value pairs
  - In particular, intermediate ones that fly over the network
- The benefit from combiners is limited, as it is less likely for a mapper to process multiple occurrences of a word
- Does not suffer from memory paging problems
The stripes approach:
- More compact
- Generates fewer and shorter intermediate keys
  - The framework has less sorting to do
- The values are more complex and have serialization / deserialization overhead
- Greatly benefits from combiners, as the key space is the vocabulary
- Suffers from memory paging problems, if not properly engineered
Order Inversion
Computing relative frequencies
"Relative" co-occurrence matrix construction:
- Similar problem as before, same matrix
- Instead of absolute counts, we take into consideration the fact that some words appear more frequently than others
  - Word w_i may co-occur frequently with word w_j simply because one of the two is very common
- We need to convert absolute counts to relative frequencies f(w_j | w_i)
  - What proportion of the time does w_j appear in the context of w_i?
Formally, we compute:

f(w_j | w_i) = N(w_i, w_j) / Σ_w′ N(w_i, w′)

- N(·, ·) is the number of times a co-occurring word pair is observed
- The denominator is called the marginal
Computing relative frequencies
The stripes approach:
- In the reducer, the counts of all words that co-occur with the conditioning variable (w_i) are available in the associative array
- Hence, the sum of all those counts gives the marginal
- Then we divide the joint counts by the marginal and we're done
The pairs approach:
- The reducer receives the pair (w_i, w_j) and the count
- From this information alone it is not possible to compute f(w_j | w_i)
- Fortunately, as for the mapper, the reducer can also preserve state across multiple keys
  - We can buffer in memory all the words that co-occur with w_i and their counts
  - This is basically building the associative array in the stripes method
Computing relative frequencies: a basic approach

We must define the sort order of the pair:
- In this way, the keys are first sorted by the left word, and then by the right word (in the pair)
- Hence, we can detect if all pairs associated with the word we are conditioning on (w_i) have been seen
- At this point, we can use the in-memory buffer, compute the relative frequencies, and emit
We must define an appropriate partitioner:
- The default partitioner is based on the hash value of the intermediate key, modulo the number of reducers
- For a complex key, the raw byte representation is used to compute the hash value
  - Hence, there is no guarantee that the pairs (dog, aardvark) and (dog, zebra) are sent to the same reducer
- What we want is that all pairs with the same left word are sent to the same reducer
Limitations of this approach:
- Essentially, we reproduce the stripes method in the reducer, and we need to use a custom partitioner
- This algorithm would work, but it presents the same memory-bottleneck problem as the stripes method
Computing relative frequencies: order inversion
The key is to properly sequence the data presented to reducers
- If it were possible to compute the marginal in the reducer before processing the joint counts, the reducer could simply divide the joint counts received from mappers by the marginal
- The notion of "before" and "after" can be captured in the ordering of key-value pairs
- The programmer can define the sort order of keys so that data needed earlier is presented to the reducer before data that is needed later
Computing relative frequencies: order inversion
Recall that mappers emit pairs of co-occurring words as keys
The mapper:
- Additionally emits a "special" key of the form (w_i, ∗)
- The value associated with the special key is one; it represents the contribution of the word pair to the marginal
- Using combiners, these partial marginal counts will be aggregated before being sent to the reducers
The reducer:
- We must make sure that the special key-value pairs are processed before any other key-value pairs where the left word is w_i
- We also need to modify the partitioner as before, i.e., it should take into account only the first word
Computing relative frequencies: order inversion
Memory requirements:
- Minimal, because only the marginal (an integer) needs to be stored
- No buffering of individual co-occurring words
- No scalability bottleneck
Key ingredients for order inversion:
- Emit a special key-value pair to capture the marginal
- Control the sort order of the intermediate key, so that the special key-value pair is processed first
- Define a custom partitioner for routing intermediate key-value pairs
- Preserve state across multiple keys in the reducer
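The ingredients above can be sketched in memory. This hypothetical example exploits the fact that `"*"` sorts before any lowercase word, so iterating the emitted keys in sorted order within a left-word partition delivers the marginal (w_i, *) before any joint count (w_i, w_j), exactly as the custom sort order would in the real job:

```python
from collections import defaultdict

emitted = defaultdict(int)
for line in ["a b c", "a b"]:               # map phase (context = one line)
    words = line.split()
    for i, w in enumerate(words):
        for j, u in enumerate(words):
            if i != j:
                emitted[(w, u)] += 1        # joint count
                emitted[(w, "*")] += 1      # contribution to the marginal

marginal = {}
freqs = {}
# Partitioned by left word; within a partition, ("w", "*") sorts first.
for (w, u) in sorted(emitted):
    if u == "*":
        marginal[w] = emitted[(w, u)]       # marginal arrives before joints
    else:
        freqs[(w, u)] = emitted[(w, u)] / marginal[w]

print(freqs[("a", "b")])  # 2/3
```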
References
References I
[1] Luiz André Barroso and Urs Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan & Claypool Publishers, 2009.
[2] James Hamilton. Cooperative Expendable Micro-Slice Servers (CEMS): Low Cost, Low Power Servers for Internet-Scale Services. In Proc. of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), 2009.