Map-Reduce: Win -or- Epic Win
![Page 1: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/1.jpg)
MAP-REDUCE: WIN -OR- EPIC WIN
CSC313: Advanced Programming Topics
![Page 3: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/3.jpg)
Brief History of Google
BackRub: 1996
- 4 disk drives
- 24 GB total storage
![Page 5: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/5.jpg)
Brief History of Google
Google: 1998
- 44 disk drives
- 366 GB total storage
![Page 6: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/6.jpg)
Traditional Design Principles
- If big enough, supercomputer processes work
  - Use desktop CPUs, just a lot more of them
  - But it also provides huge bandwidth to memory
  - Equivalent to many machines' bandwidth at once
- But supercomputers are VERY, VERY expensive
  - Maintenance also expensive once machine bought
  - But do get something: high-quality == low downtime
- Safe, expensive solution to very large problems
![Page 7: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/7.jpg)
Why Trade Money for Safety?
![Page 13: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/13.jpg)
How Was Search Performed?
- Request: http://www.yahoo.com/search?p=pager
- DNS resolves www.yahoo.com to 209.191.122.70
- Request sent to: http://209.191.122.70/search?p=pager
![Page 17: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/17.jpg)
Google’s Big Insight
- Performing search is “embarrassingly parallel”
  - No need for supercomputer and all that expense
  - Can instead do this using lots & lots of desktops
  - Identical effective bandwidth & performance
- But problem is desktop machines unreliable
  - Budget for 2 replacements, since machines cheap
  - Just expect failure; software provides quality
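
A minimal sketch of what "embarrassingly parallel" means for search, assuming the documents are split into independent shards. The shard contents and the `search_shard` and `parallel_search` helpers are hypothetical illustrations, not Google's implementation; threads stand in for separate desktop machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "machine" holds its own slice of the documents.
SHARDS = [
    {"doc1": "pager repair manual", "doc2": "android phones"},
    {"doc3": "buy a pager today", "doc4": "map reduce tutorial"},
    {"doc5": "history of the pager"},
]

def search_shard(shard, term):
    """Return the ids of documents in one shard that contain the term."""
    return [doc_id for doc_id, text in shard.items() if term in text.split()]

def parallel_search(shards, term):
    """Search every shard independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), shards)
        return sorted(doc_id for partial in partials for doc_id in partial)

print(parallel_search(SHARDS, "pager"))
```

No shard needs anything from another shard, so adding machines adds capacity without any coordination beyond the final merge.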
![Page 18: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/18.jpg)
A brief history of Google
Google: 2012
- ?0,000 total servers
- ??? PB total storage
![Page 23: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/23.jpg)
How Is Search Performed Now?
http://209.85.148.100/search?q=android
- Spell Checker
- Ad Server
- Document Servers (TB)
- Index Servers (TB)
![Page 24: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/24.jpg)
Google’s Processing Model
- Buy cheap machines & prepare for worst
  - Machines going to fail, but still cheaper approach
- Important steps keep whole system reliable
  - Replicate data so that information losses limited
  - Move data freely so can always rebalance loads
- These decisions lead to many other benefits
  - Scalability helped by focus on balancing
  - Search speed improved; performance much better
  - Utilize resources fully, since search demand varies
![Page 26: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/26.jpg)
Heterogeneous processing
- By buying cheapest computers, variances are high
  - Programs must handle homogeneous & heterogeneous systems
  - Centralized workqueue helps with different speeds
- This process also leads to a few small downsides
  - Space
  - Power consumption
  - Cooling costs
![Page 28: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/28.jpg)
Complexity at Google
Avoid this nightmare using abstractions
![Page 30: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/30.jpg)
Google Abstractions
- Google File System
  - Handles replication to provide scalability & durability
- BigTable
  - Manages large structured data sets (not a full relational database)
- Chubby
  - Gonna skip past that joke; distributed locking service
- MapReduce
  - If job fits, easy parallelism possible without much work
![Page 31: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/31.jpg)
Remember Google’s Problem
![Page 32: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/32.jpg)
MapReduce Overview
- Programming model makes details simple
  - Automatic parallelization & load balancing
  - Network and disk I/O optimization
  - Robust performance even if machines fail
![Page 35: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/35.jpg)
MapReduce Overview
- Programming model provides good Façade
  - Automatic parallelization & load balancing
  - Network and disk I/O optimization
  - Robust performance even if machines fail
- Idea came from 2 Lisp (functional) primitives
  - Map: process each entry in list using some function
  - Reduce: recombines data using given function
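
As a small illustration of the two functional primitives, here is a sketch using Python's built-in `map` and `functools.reduce` rather than Lisp. The list of numbers and the two lambdas are arbitrary examples chosen for this sketch.

```python
from functools import reduce

entries = [1, 2, 3, 4]

# Map: process each entry in the list using some function (here, squaring).
squares = list(map(lambda x: x * x, entries))

# Reduce: recombine the mapped data using a given function (here, addition).
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Each call to the mapped function is independent of the others, which is what lets a framework spread the Map step across many machines.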
![Page 37: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/37.jpg)
Typical MapReduce problem
1. Read lots and lots of data (e.g., TBs)
2. Map
   - Extract important data from each entry in input
3. Combine Maps and sort entries by key
4. Reduce
   - Process each key’s entries to get result for that key
5. Output final result & watch money roll in

Outline (algorithm) always same; just map & reduce functions change
![Page 39: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/39.jpg)
Typical MapReduce problem
Template method (the 5-step outline) always same; just the hook methods (map & reduce) change
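
The Template Method reading of the outline can be sketched as follows. This is a single-process illustration under stated assumptions: the `MapReduceJob` class, its `run` method, and the `LongestLine` example job are hypothetical names invented here, not part of any real MapReduce library.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class MapReduceJob(ABC):
    """Template: run() fixes the outline; subclasses supply the two hooks."""

    def run(self, inputs):
        # Steps 1-2: read the input and Map each (key, value) entry.
        intermediate = []
        for key, value in inputs:
            intermediate.extend(self.map(key, value))
        # Step 3: combine the Map outputs and sort entries by key.
        groups = defaultdict(list)
        for k, v in sorted(intermediate, key=lambda pair: pair[0]):
            groups[k].append(v)
        # Steps 4-5: Reduce each key's entries and output the final result.
        return {k: self.reduce(k, vs) for k, vs in groups.items()}

    @abstractmethod
    def map(self, key, value):
        """Hook: return a list of (key', value') pairs for one input entry."""

    @abstractmethod
    def reduce(self, key, values):
        """Hook: combine all values collected for one key."""

class LongestLine(MapReduceJob):
    """Hypothetical job: length of the longest line in each file."""

    def map(self, key, value):
        return [(key, len(line)) for line in value.splitlines()]

    def reduce(self, key, values):
        return max(values)

result = LongestLine().run([("a.txt", "hi\nhello"), ("b.txt", "x")])
print(result)  # {'a.txt': 5, 'b.txt': 1}
```

A different job only overrides `map` and `reduce`; `run` never changes.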
![Page 40: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/40.jpg)
Pictorial View of MapReduce
![Page 42: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/42.jpg)
Ex: Count Word Frequencies
- Processes files separately & count word freq. in each

Map
- Input: Key=URL, Value=text on page
- Output, one pair per word: Key’=word, Value’=count
![Page 44: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/44.jpg)
Ex: Count Word Frequencies
- In shuffle step, Maps combined & entries sorted by key
- Reduce combines key’s results to compute final output

Input to Reduce
- Key’=“to”, Value’=“1”
- Key’=“be”, Value’=“1”
- Key’=“or”, Value’=“1”
- Key’=“not”, Value’=“1”
- Key’=“to”, Value’=“1”
- Key’=“be”, Value’=“1”

Output of Reduce
- Key’’=“to”, Value’’=“2”
- Key’’=“be”, Value’’=“2”
- Key’’=“or”, Value’’=“1”
- Key’’=“not”, Value’’=“1”
![Page 45: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/45.jpg)
Word Frequency Pseudo-code

    // input_key: URL; input_values: text on page
    Map(String input_key, String input_values) {
      String[] words = input_values.split(" ");
      foreach w in words {
        EmitIntermediate(w, "1");
      }
    }

    // key: a word; intermediate_values: the "1"s emitted for that word
    Reduce(String key, Iterator intermediate_values) {
      int result = 0;
      foreach v in intermediate_values {
        result += ParseInt(v);
      }
      Emit(result);
    }
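
A runnable single-process sketch of the word-frequency pseudo-code, in Python. The names `map_fn`, `reduce_fn`, and `word_frequencies` and the example.com URL are assumptions made for this illustration; the shuffle step that a real framework performs across machines is simulated here with a dictionary. Note that `split()` splits on any whitespace, which differs slightly from the pseudo-code's split on a single space.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """Emit (word, "1") for every word on the page, as in the pseudo-code."""
    return [(w, "1") for w in input_value.split()]

def reduce_fn(key, intermediate_values):
    """Sum the counts emitted for one word."""
    return sum(int(v) for v in intermediate_values)

def word_frequencies(pages):
    # Shuffle: combine all Map outputs and group them by key.
    grouped = defaultdict(list)
    for url, text in pages.items():
        for word, count in map_fn(url, text):
            grouped[word].append(count)
    # Reduce each key's entries, visiting the keys in sorted order.
    return {word: reduce_fn(word, grouped[word]) for word in sorted(grouped)}

pages = {"http://example.com/hamlet": "to be or not to be"}
print(word_frequencies(pages))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```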
![Page 47: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/47.jpg)
Ex: Build Search Index
- Processes files separately & record words found on each
- To get search Map, combine key’s results in Reduce

Map
- Input: Key=URL, Value=text on page
- Output, one pair per word: Key’=word, Value’=URL

Reduce
- Input, all pairs for one word: Key’=word, Value’=URL
- Output: Key=word, Value=URLs with word
![Page 48: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/48.jpg)
Search Index Pseudo-code

    // input_key: URL; input_values: text on page
    Map(String input_key, String input_values) {
      String[] words = input_values.split(" ");
      foreach w in words {
        EmitIntermediate(w, input_key);
      }
    }

    // key: a word; intermediate_values: URLs of pages containing it
    Reduce(String key, Iterator intermediate_values) {
      List result = new ArrayList();
      foreach v in intermediate_values {
        result.add(v);
      }
      Emit(result);
    }
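
A runnable single-process sketch of the search-index pseudo-code, in Python. The names `map_fn`, `reduce_fn`, and `build_index` and the sample pages are hypothetical. One deliberate departure from the pseudo-code: the Reduce here removes duplicate URLs and sorts them, whereas the pseudo-code appends every value it receives.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """Emit (word, URL) for every word found on the page."""
    return [(w, input_key) for w in input_value.split()]

def reduce_fn(key, intermediate_values):
    """Collect the URLs for one word (deduplicated and sorted here)."""
    return sorted(set(intermediate_values))

def build_index(pages):
    # Shuffle: group every emitted URL under its word.
    grouped = defaultdict(list)
    for url, text in pages.items():
        for word, source in map_fn(url, text):
            grouped[word].append(source)
    # Reduce: one list of URLs per word.
    return {word: reduce_fn(word, urls) for word, urls in grouped.items()}

pages = {"a.html": "map reduce", "b.html": "reduce reduce win"}
index = build_index(pages)
print(index["reduce"])  # ['a.html', 'b.html']
```

Only the two hook functions differ from the word-frequency example; the grouping step is unchanged.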
![Page 49: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/49.jpg)
Ex: Page Rank Computation
Google’s algorithm ranking pages’ relevance
![Page 51: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/51.jpg)
Ex: Page Rank Computation

Map
- Input: Key=<URL, rank>, Value=links on page
- Output, one pair per link on page: Key’=link on page, Value’=<URL, rank/N> (N = number of links on page)

Reduce
- Input, all pairs for one URL: Key’=link to URL, Value’=<src, rank/N>
- Output: Key=<URL, rank>, Value=links on page, where the new rank adds (+) the rank/N values received

Repeat entire process (e.g., input Reduce results back into Map) until page ranks stabilize (sum of changes to the ranks drops below some threshold)
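
The iteration can be sketched in Python as below. This follows the simplified scheme on the slide (new rank = sum of incoming rank/N shares) and omits the damping factor used in the published PageRank algorithm. The function names, the three-page graph, and the threshold are assumptions made for this sketch, which also assumes every page has at least one link and links only to pages in the graph.

```python
from collections import defaultdict

def map_fn(url, rank, links):
    """Send rank/N to each of the N links on the page, plus the link list itself."""
    out = [(url, ("links", links))]
    for link in links:
        out.append((link, ("share", rank / len(links))))
    return out

def reduce_fn(url, values):
    """New rank is the sum of received shares; the link list is passed through."""
    links, rank = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            rank += payload
    return rank, links

def page_rank(graph, threshold=1e-6, max_rounds=200):
    ranks = {url: 1.0 / len(graph) for url in graph}
    for _ in range(max_rounds):
        # Map every page, then shuffle the outputs by key.
        grouped = defaultdict(list)
        for url, links in graph.items():
            for key, value in map_fn(url, ranks[url], links):
                grouped[key].append(value)
        # Reduce, then feed the results back into the next round.
        new_ranks = {url: reduce_fn(url, vals)[0] for url, vals in grouped.items()}
        change = sum(abs(new_ranks[u] - ranks[u]) for u in graph)
        ranks = new_ranks
        if change < threshold:  # ranks have stabilized
            break
    return ranks

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = page_rank(graph)
print({url: round(r, 3) for url, r in sorted(ranks.items())})
```

For this small graph the ranks settle near 0.4 for A, 0.2 for B, and 0.4 for C.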
![Page 53: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/53.jpg)
Advanced MapReduce Ideas
- How to implement? One master, many workers
  - Split input data into tasks where each task size fixed
  - Will also be partitioning reduce phase into tasks
- Dynamically assign tasks to workers during each step
  - Tasks assigned as needed & placed in in-process list
  - Once worker completes task, save result & retire task
  - Assume that a worker crashed, if not complete in time
  - Move incomplete tasks back into pool for reassignment
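
The bookkeeping described above can be sketched as a small Python class. This is a simulation under stated assumptions, not a real scheduler: the `Master` class, its method names, the task and worker names, and the explicit `now` clock values are all hypothetical, and no actual processes or network calls are involved.

```python
from collections import deque

class Master:
    """Tracks tasks in three states: pending pool, in-process list, completed."""

    def __init__(self, tasks, timeout):
        self.pending = deque(tasks)  # pool of unassigned tasks
        self.in_process = {}         # task -> (worker, deadline)
        self.results = {}            # task -> saved result
        self.timeout = timeout

    def assign(self, worker, now):
        """Hand the next pending task to a worker that asks for one."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_process[task] = (worker, now + self.timeout)
        return task

    def complete(self, worker, task, result):
        """Save the result and retire the task; ignore stale reports."""
        if self.in_process.get(task, (None, 0))[0] == worker:
            del self.in_process[task]
            self.results[task] = result

    def reap(self, now):
        """Assume workers past their deadline crashed; requeue their tasks."""
        late = [t for t, (_, deadline) in self.in_process.items() if now > deadline]
        for task in late:
            del self.in_process[task]
            self.pending.append(task)
        return late

master = Master(["split-0", "split-1"], timeout=5)
t0 = master.assign("w1", now=0)          # w1 takes split-0
t1 = master.assign("w2", now=0)          # w2 takes split-1
master.complete("w1", t0, "ok")          # w1 finishes in time
late = master.reap(now=10)               # w2 never reported back
t2 = master.assign("w1", now=10)         # split-1 is reassigned to w1
master.complete("w2", "split-1", "old")  # stale report from w2 is ignored
master.complete("w1", t2, "ok")
print(late, master.results)
```

Because a timed-out task simply returns to the pool, the master never needs to know why a worker went silent.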
![Page 55: Map-Reduce: Win -or- Epic Win](https://reader034.vdocuments.site/reader034/viewer/2022050809/56816781550346895ddc8e10/html5/thumbnails/55.jpg)
Advanced MapReduce Ideas
- How to implement? One invoker, many commands
  - Remaining steps unchanged: split input into fixed-size tasks, assign tasks dynamically, retire tasks on completion, reassign tasks that do not complete in time