Introducing MapReduce Programming Model
Samuel Yee

Posted on 15-Apr-2017

TRANSCRIPT

Page 1: Introducing MapReduce Programming Framework

Introducing MapReduce Programming Model

Samuel Yee

Page 2

Multi-threaded Programming

Page 3

MapReduce Programming Model

For parallel and distributed computing, programmers do not have to deal with multi-threading, system failures, file I/O, networking, data loss, etc. Hadoop takes care of all of these complex low-level activities.

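Programmers can focus on two key functions instead: the Mapper and the Reducer.

Mapper function
- Ingests large input files, which Hadoop splits into many smaller blocks (64 MB per block by default in Hadoop 1.x; 128 MB in Hadoop 2.x and later)
- Transforms each input record into intermediate key-value pairs, which the framework then shuffles, sorts by key, and routes to the Reduce function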

Reducer function
- Reduces the values grouped under each key by aggregating, summing, eliminating unwanted records, etc.
- Writes the results to output files

The key-value types emitted by the Mapper must match the key-value types the Reducer expects as input.
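To make the key-value contract concrete, here is a minimal single-process sketch in plain Java. It is not the Hadoop API. `MapFunction`, `ReduceFunction` and `MiniMapReduce` are hypothetical names chosen for illustration, and the "year,temperature" records are made-up sample data. Because the Mapper's output types and the Reducer's input types are the same generic parameters, the compiler rejects a Reducer that does not match its Mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical interfaces for illustration only; this is not Hadoop's API.
// (K1, V1) = input record, (K2, V2) = intermediate pair, (K3, V3) = final output pair.
interface MapFunction<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
}

interface ReduceFunction<K2, V2, K3, V3> {
    void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> emit);
}

class MiniMapReduce {
    // K2 and V2 appear in both the mapper and reducer parameter types, so a reducer
    // whose input types differ from the mapper's output types will not compile.
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> Map<K3, V3> run(
            Map<K1, V1> input,
            MapFunction<K1, V1, K2, V2> mapper,
            ReduceFunction<K2, V2, K3, V3> reducer) {
        // Group intermediate values by key, sorted by key (a stand-in for shuffle/sort).
        TreeMap<K2, List<V2>> groups = new TreeMap<>();
        input.forEach((k, v) -> mapper.map(k, v,
                (k2, v2) -> groups.computeIfAbsent(k2, x -> new ArrayList<>()).add(v2)));
        // Call reduce once per key; LinkedHashMap keeps the sorted key order.
        Map<K3, V3> output = new LinkedHashMap<>();
        groups.forEach((k2, values) -> reducer.reduce(k2, values, output::put));
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample records: line offset -> "year,temperature"
        Map<Long, String> lines = Map.of(
                0L, "1950,22", 8L, "1950,31", 16L, "1951,18", 24L, "1951,25");

        // Mapper emits (year, temperature), so its output types are <String, Integer>.
        MapFunction<Long, String, String, Integer> mapper = (offset, line, emit) -> {
            String[] parts = line.split(",");
            emit.accept(parts[0], Integer.parseInt(parts[1]));
        };
        // Reducer consumes <String, Integer>, matching the mapper, and keeps the maximum.
        ReduceFunction<String, Integer, String, Integer> reducer = (year, temps, emit) ->
                emit.accept(year, temps.stream().max(Integer::compare).orElseThrow());

        System.out.println(run(lines, mapper, reducer)); // {1950=31, 1951=25}
    }
}
```

In an actual Hadoop job the same rule applies to the type parameters of the Mapper and Reducer classes and to the map output key/value classes set on the job. There, a mismatch typically surfaces at runtime rather than at compile time.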

Page 4

Data Processing (MapReduce)

Figure: Input Data → Split [k1, v1] → Map() (three tasks) → Sort by k1 → Merge [k1, [v1, v2, v3…]] → Reduce() (two tasks) → Output Data
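The framework performs the Sort and Merge steps between Map() and Reduce(), not user code. The sketch below imitates them in memory under simplifying assumptions. The hypothetical `ShuffleSort.shuffle` first routes each key to one of `numReducers` partitions using the formula of Hadoop's default HashPartitioner. Within each partition it then sorts the keys and merges their values into [k, [v1, v2, …]].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the shuffle: partition, sort by key, merge values.
// ShuffleSort is a hypothetical name; Hadoop does this across the network.
class ShuffleSort {
    static <K extends Comparable<K>, V> List<TreeMap<K, List<V>>> shuffle(
            List<Map.Entry<K, V>> pairs, int numReducers) {
        List<TreeMap<K, List<V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps each partition sorted by key
        }
        for (Map.Entry<K, V> pair : pairs) {
            // Same formula as Hadoop's default HashPartitioner
            int p = (pair.getKey().hashCode() & Integer.MAX_VALUE) % numReducers;
            // Merge: values sharing a key are appended to one list, in arrival order
            partitions.get(p).computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3), Map.entry("a", 4));
        // "a".hashCode() is 97 and "b".hashCode() is 98, so with 2 reducers "b" -> 0, "a" -> 1
        System.out.println(shuffle(mapOutput, 2)); // [{b=[1, 3]}, {a=[2, 4]}]
    }
}
```

Hadoop hashes its own Writable key types, so real partition assignments can differ from those produced by `String.hashCode()`. The point is only that each key lands in exactly one reducer, and that each reducer sees its keys in sorted order.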

Page 5

Hadoop’s Approach

Figure: Big Data is split into smaller data blocks (six blocks shown).
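Splitting a file into blocks is simple arithmetic. A file occupies ceil(size / blockSize) blocks, and only the last block can be partial. The helper below is a hypothetical illustration, not part of Hadoop. It assumes the 128 MB default block size of Hadoop 2.x and later; Hadoop 1.x defaulted to 64 MB, and the size is configurable.

```java
// Hypothetical helper for illustration; not part of Hadoop.
class BlockMath {
    static final long MB = 1024L * 1024L;

    // Ceiling division: every block is full except possibly the last one.
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("file size must be >= 0 and block size > 0");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB; // assumed Hadoop 2.x+ default
        System.out.println(blockCount(1024 * MB, blockSize)); // 8
        System.out.println(blockCount(200 * MB, blockSize));  // 2 (128 MB + 72 MB)
        System.out.println(blockCount(200 * MB, 64 * MB));    // 4 (3 x 64 MB + 8 MB), Hadoop 1.x default
    }
}
```

In HDFS a partial final block only uses as much disk space as the data it holds. A 200 MB file therefore needs two 128 MB blocks without consuming 256 MB.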

Page 6

Hadoop’s Approach

Figure: six data blocks, each paired with its own Computing process that produces an Output.

Map the computing process to the data blocks: each block is processed on or near the node that stores it, in parallel with the other blocks.

Reduce the outputs by aggregating them into a result.
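The pattern of one computing process per block followed by aggregation can be imitated on a single machine with a thread pool. This sketch is an analogy only, and the class and method names are hypothetical. Each block is an array of numbers, and the per-block task computes a partial sum. The partial outputs are then added into one result. Hadoop instead tries to schedule such tasks on the nodes that store the blocks, so the threads here merely stand in for those tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-machine analogy: threads stand in for Hadoop tasks on separate nodes.
class BlockParallelSum {
    static long sumBlocks(List<long[]> blocks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One "computing" task per block, each producing a partial output
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] block : blocks) {
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (long x : block) sum += x;
                    return sum;
                }));
            }
            // Aggregate the per-block outputs into one result
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a block", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("block task failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> blocks = List.of(new long[] {1, 2, 3}, new long[] {4, 5}, new long[] {6});
        System.out.println(sumBlocks(blocks, 3)); // 21
    }
}
```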

Page 7

Consider Two Input Files

File01.txt: Hello World Bye World

File02.txt: Hello Hadoop Goodbye Hadoop

Page 8

Outputs of Mappers

Process 1 (File02.txt): [Hello, 1] [Hadoop, 1] [Goodbye, 1] [Hadoop, 1]

Process 2 (File01.txt): [Hello, 1] [World, 1] [Bye, 1] [World, 1]
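Each mapper output comes from tokenizing a line and emitting the pair [word, 1] for every token. A minimal plain-Java sketch of that logic follows. It is not Hadoop's `Mapper` class, and `WordCountMapper.map` is a hypothetical name. Splitting on whitespace with `StringTokenizer` is an assumption, though it is also what Hadoop's own WordCount tutorial uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of the word-count map logic (hypothetical, not Hadoop's Mapper):
// split a line on whitespace and emit [word, 1] for every token, in order.
class WordCountMapper {
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("Hello Hadoop Goodbye Hadoop")); // [Hello=1, Hadoop=1, Goodbye=1, Hadoop=1]
        System.out.println(map("Hello World Bye World"));       // [Hello=1, World=1, Bye=1, World=1]
    }
}
```

The mapper deliberately does not combine the two `Hadoop` tokens. Counting is left to the reduce side, which keeps each map call independent of the others.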

Page 9

Consolidated Result of Reducers

[Bye, 1] [Goodbye, 1] [Hadoop, 2] [Hello, 2] [World, 2]
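The consolidated result can be reproduced by summing the 1s for each word across both mapper outputs. This plain-Java sketch uses a hypothetical `WordCountReducer.reduce`. A `TreeMap` plays the role of the framework's sort by key, and `Map.merge` folds the grouping and the per-key sum into a single step. A real Reducer instead receives each key together with the full list of its values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word-count reduce logic (hypothetical, not Hadoop's Reducer):
// sum the counts for each word; TreeMap keeps the words in sorted key order.
class WordCountReducer {
    static TreeMap<String, Integer> reduce(List<Map.Entry<String, Integer>> mapperOutputs) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutputs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Mapper outputs from the slide: Process 1 followed by Process 2
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("Hello", 1), Map.entry("Hadoop", 1), Map.entry("Goodbye", 1), Map.entry("Hadoop", 1),
                Map.entry("Hello", 1), Map.entry("World", 1), Map.entry("Bye", 1), Map.entry("World", 1));
        System.out.println(reduce(pairs)); // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```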

Page 10

MapReduce Template in Java

Page 11

Demo

MapReduce programming using IntelliJ IDEA and Java. Read my LinkedIn articles on how to set up a development environment for MapReduce and Spark on Windows: http://tinyurl.com/px9rwwk