mapreduce introduction | overview | online training | basics

Post on 12-Feb-2017

Category:

Education

Introduction to Hadoop Map Reduce

Email: sales@kerneltraining.com

Call us: +91 8099776681

www.kerneltraining.com

MapReduce

Prerequisites for learning MapReduce

1. Hadoop Framework
2. Distributed storage system such as HDFS
3. Parallel programming concepts

Overview of MapReduce workflow

Mapping Phase: Map Input List -> Mapper -> Map Output List

Reducing Phase: Reduce Input List -> Reducer -> Reduce Output List

Overview of MapReduce Framework

Input File:
Delhi Mumbai Delhi
Bangalore Delhi Chennai
Mumbai Delhi Chennai

Map Phase (input record -> Map Output):
<1, Delhi Mumbai Delhi> -> <Delhi, 1><Mumbai, 1><Delhi, 1>
<2, Bangalore Delhi Chennai> -> <Bangalore, 1><Delhi, 1><Chennai, 1>
<3, Mumbai Delhi Chennai> -> <Mumbai, 1><Delhi, 1><Chennai, 1>

Shuffle/Sort:
<Delhi, 1><Delhi, 1><Delhi, 1><Delhi, 1><Bangalore, 1>
<Mumbai, 1><Mumbai, 1>
<Chennai, 1><Chennai, 1>

Reduce Phase (input):
<Delhi, (1,1,1,1)><Bangalore, 1>
<Mumbai, (1,1)><Chennai, (1,1)>
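The map and shuffle/sort phases of this word-count example can be imitated in a few lines of plain Python. This is only an in-memory sketch, not Hadoop code: the function names `map_record` and `shuffle` are made up for illustration, and the record keys 1, 2, 3 follow the slide's notation.

```python
from collections import defaultdict

# The three lines of the input file from the slide.
lines = [
    "Delhi Mumbai Delhi",
    "Bangalore Delhi Chennai",
    "Mumbai Delhi Chennai",
]


def map_record(key, value):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in value.split()]


def shuffle(pairs):
    """Shuffle/sort step: collect all values per key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


# Records are numbered from 1, as in the <1, Delhi Mumbai Delhi> notation.
map_output = []
for record_number, line in enumerate(lines, start=1):
    map_output.extend(map_record(record_number, line))

reduce_input = shuffle(map_output)
for word, ones in reduce_input.items():
    print(word, ones)
```

Each key ends up with one list of values, for example Delhi with (1,1,1,1), which is the form in which the reduce phase receives its input.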

Responsibilities for the various phases

Input: Create ‘Input Splits’ and create individual Records -- Framework
Map: User Defined Logic -- User
Shuffling: Framework
Reduce: User Defined Logic -- User

Input:
Delhi Mumbai Delhi
Bangalore Delhi Chennai
Mumbai Delhi Chennai

Map (input record -> Map Output):
<1, Delhi Mumbai Delhi> -> <Delhi, 1><Mumbai, 1><Delhi, 1>
<2, Bangalore Delhi Chennai> -> <Bangalore, 1><Delhi, 1><Chennai, 1>
<3, Mumbai Delhi Chennai> -> <Mumbai, 1><Delhi, 1><Chennai, 1>

Shuffling:
<Delhi, 1><Delhi, 1><Delhi, 1><Delhi, 1>
<Bangalore, 1>
<Mumbai, 1><Mumbai, 1>
<Chennai, 1><Chennai, 1>

Reduce (input):
<Delhi, (1,1,1,1)><Bangalore, 1>
<Mumbai, (1,1)><Chennai, (1,1)>

Reduce (output):
<Delhi, 4><Bangalore, 1>
<Mumbai, 2><Chennai, 2>
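The split of responsibilities can be made concrete with a toy, single-process stand-in for the framework. In this sketch `run_job` plays the framework's part (feeding records, shuffling, calling the reducer), while `word_count_mapper` and `word_count_reducer` are the only pieces the user writes. All three names are hypothetical, and nothing here is Hadoop's actual API.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Framework side: drive the map, shuffle/sort and reduce phases."""
    # Map: call the user's mapper once per input record.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Shuffle/sort: order the pairs by key so equal keys sit together.
    intermediate.sort(key=itemgetter(0))

    # Reduce: call the user's reducer once per distinct key.
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [value for _, value in group]
        output.extend(reducer(key, values))
    return output


# User side: only these two functions carry job-specific logic.
def word_count_mapper(key, line):
    for word in line.split():
        yield (word, 1)


def word_count_reducer(word, counts):
    yield (word, sum(counts))


records = [
    (1, "Delhi Mumbai Delhi"),
    (2, "Bangalore Delhi Chennai"),
    (3, "Mumbai Delhi Chennai"),
]
result = run_job(records, word_count_mapper, word_count_reducer)
print(result)  # [('Bangalore', 1), ('Chennai', 2), ('Delhi', 4), ('Mumbai', 2)]
```

Swapping in a different mapper and reducer changes the job without touching `run_job`, which is the point of the User/Framework labels on the slide.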

Components of MapReduce

[Diagram; recoverable labels and arrows:]

Driver: defines the InputFormat, Mapper, Reducer and OutputFormat.
Block A, Block B, Block C: the stored blocks of the input data.
InputFormat: reads the blocks, calculates Input Splits 1 to 4 and defines the Record Reader.
Record Reader: reads an Input Split and passes <K, V> pairs to the Mapper.
Mapper (one per Mapper Process): passes <K, V> pairs to the Shuffle.
Shuffle: passes <K, V> pairs to the Reducers.
Reducer (one per Reduce Process): passes <K, V> pairs to the Writer.
Writer: writes the Output Data.
OutputFormat: defines the Writer.
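What a record reader hands to the mapper can be sketched as follows. The sketch assumes a line-oriented input in which the key is the byte offset at which each line starts (the convention of Hadoop's TextInputFormat) and the value is the line itself. The function `read_records` is a hypothetical helper; it ignores real-world details such as lines that straddle two input splits.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line_text) pairs, one per line of a binary stream."""
    offset = 0
    for raw in stream:
        # Key: where the line starts. Value: the line without its terminator.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# A small in-memory stand-in for one input split.
data = b"Delhi Mumbai Delhi\nBangalore Delhi Chennai\nMumbai Delhi Chennai\n"
records = list(read_records(io.BytesIO(data)))
for key, value in records:
    print(key, value)
```

Here the keys come out as 0, 19 and 43, because each key is the running total of the bytes consumed by the earlier lines.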

Deciding whether to use MapReduce

Questions to ask before choosing MapReduce:

• Can the input files be processed independently of each other?

• Can the problem be broken into smaller tasks such that each task can be processed independently?

• Can the partial results of the smaller tasks be aggregated or consolidated into the final result?
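A problem that passes all three checks looks like the following sketch: computing a mean over data that arrives in separate splits. Each split is processed on its own into a partial result, and the partial results are then combined. The data values and the names `process_split` and `combine` are invented for illustration.

```python
def process_split(split):
    """Independent task: reduce one split to a partial result (sum, count)."""
    return sum(split), len(split)


def combine(partials):
    """Aggregate the partial results of all splits into one (sum, count)."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


# Three splits that could be processed on three different machines.
splits = [[4, 8], [15, 16, 23], [42]]

partials = [process_split(split) for split in splits]
total, count = combine(partials)
mean = total / count
print(partials, mean)  # [(12, 2), (54, 3), (42, 1)] 18.0
```

Note that the partial result is a (sum, count) pair rather than a per-split mean: averaging the per-split means would give a wrong answer when splits differ in size, so the choice of partial result decides whether the third question can be answered with yes.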

Design Patterns

A design pattern is a template for solving a common, general data manipulation problem with MapReduce.

• Summarization Patterns

• Filtering Patterns

• Join Patterns

• Job Chaining Patterns

Case Study – Summarization Pattern

• Goal: find the subscribers and their corresponding downloaded bytes from the sample airmobile logs provided. Each line holds the subscriber (substring 15,26) and the bytes downloaded (substring 87,97).

• The sample log files are line-delimited and follow the format described above. From each line, the Customer ID and the Downloaded Bytes have to be extracted for analysis.

(K1, V1) -- Input to the user-defined map function

• (0, subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17)

• (121, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17)

• (242, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17)

list(K2, V2) -- Output from the user-defined map function

• (28052627, 8.4621702216543)

• (28052639, 9.672631216721858)

• (28052627, 8.64072609693471)

(K2, list(V2)) -- Input to the user-defined reduce function

• (“28052627”, (8.4621702216543, 8.64072609693471))

• (“28052639”, (9.672631216721858))
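The (K1, V1) -> list(K2, V2) -> (K2, list(V2)) flow of this case study can be sketched in plain Python. Several things here are assumptions rather than facts from the slides: the reducer is assumed to sum the bytes per subscriber, the log lines are synthetic fixed-width lines built by the hypothetical helper `make_line` (not the real airmobile layout), and only the two substring ranges, (15,26) and (87,97), are taken from the case study.

```python
from collections import defaultdict

SUBSCRIBER = slice(15, 26)  # substring(15, 26): the subscriber
DOWNLOADED = slice(87, 97)  # substring(87, 97): the bytes downloaded


def make_line(subscriber, downloaded):
    """Build a synthetic 97-character line with both fields at those offsets."""
    head = ("-" * 15 + subscriber).ljust(87, "-")
    return head + f"{downloaded:010d}"


def map_line(offset, line):
    """Map: (K1, V1) = (offset, line) -> (K2, V2) = (subscriber, bytes)."""
    yield line[SUBSCRIBER], float(line[DOWNLOADED])


def reduce_subscriber(subscriber, values):
    """Reduce: (K2, list(V2)) -> total bytes (summing is an assumption)."""
    yield subscriber, sum(values)


log = [
    make_line("SUB00000001", 100),
    make_line("SUB00000002", 250),
    make_line("SUB00000001", 50),
]

# Map every line, then group the emitted values by subscriber (the shuffle).
grouped = defaultdict(list)
offset = 0
for line in log:
    for key, value in map_line(offset, line):
        grouped[key].append(value)
    offset += len(line) + 1  # +1 for the newline that would follow each line

totals = dict(
    pair for key in sorted(grouped) for pair in reduce_subscriber(key, grouped[key])
)
print(totals)  # {'SUB00000001': 150.0, 'SUB00000002': 250.0}
```

The mapper never sees more than one line and the reducer never sees more than one subscriber, which is what lets the framework run many copies of each in parallel.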

[Slides: Mapper Class, Reducer Class, Driver Class, MapReduce Output]

Questions?

THANK YOU for attending the demo of Hadoop MapReduce
