mapreduce introduction | overview | online training | basics

Introduction to Hadoop Map Reduce Email: [email protected] Call us: +91 8099776681

Upload: kernel-training

Post on 12-Feb-2017


Category:

Education



Page 1: Mapreduce Introduction | Overview | Online Training | Basics

Introduction to Hadoop Map Reduce


Page 2: Mapreduce Introduction | Overview | Online Training | Basics

www.kerneltraining.com

MapReduce

Pre-requisites for learning MapReduce

1. Hadoop Framework
2. Distributed storage system such as HDFS
3. Parallel programming concepts

Page 3: Mapreduce Introduction | Overview | Online Training | Basics


Overview of MapReduce workflow

Mapping Phase: Map Input List -> Mapper -> Map Output List

Reducing Phase: Reduce Input List -> Reducer -> Reduce Output List

Page 4: Mapreduce Introduction | Overview | Online Training | Basics


Overview of MapReduce Framework

Input File

Delhi Mumbai Delhi
Bangalore Delhi Chennai
Mumbai Delhi Chennai

Map Phase (input record -> Map Output)

<1, Delhi Mumbai Delhi> -> <Delhi, 1><Mumbai, 1><Delhi, 1>
<2, Bangalore Delhi Chennai> -> <Bangalore, 1><Delhi, 1><Chennai, 1>
<3, Mumbai Delhi Chennai> -> <Mumbai, 1><Delhi, 1><Chennai, 1>

Shuffle/Sort

<Delhi, 1><Delhi, 1><Delhi, 1><Delhi, 1><Bangalore, 1>
<Mumbai, 1><Mumbai, 1>
<Chennai, 1><Chennai, 1>

Reduce Phase (input)

<Delhi, (1,1,1,1)><Bangalore, 1>
<Mumbai, (1,1)><Chennai, (1,1)>
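The word-count flow on this slide can be traced in a few lines of plain Python. This is only a sketch that imitates the three phases in a single process; it does not use Hadoop, and the function names (`map_phase`, `shuffle_sort`, `reduce_phase`) are made up for illustration.

```python
from collections import defaultdict

# Input file from the slide, one record per line.
INPUT_LINES = [
    "Delhi Mumbai Delhi",
    "Bangalore Delhi Chennai",
    "Mumbai Delhi Chennai",
]


def map_phase(records):
    """Emit a <word, 1> pair for every word in every <line_no, line> record."""
    pairs = []
    for _line_no, line in records:
        for word in line.split():
            pairs.append((word, 1))
    return pairs


def shuffle_sort(pairs):
    """Group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Sum the list of ones collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Records are keyed by line number, as in <1, Delhi Mumbai Delhi>.
records = list(enumerate(INPUT_LINES, start=1))
map_output = map_phase(records)
reduce_input = shuffle_sort(map_output)
word_counts = reduce_phase(reduce_input)

print(reduce_input)
print(word_counts)
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows how the key/value pairs change shape from one phase to the next.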

Page 5: Mapreduce Introduction | Overview | Online Training | Basics

Responsibilities to tackle various phases

Phases: Input -> Map -> Map Output -> Shuffling -> Reduce

• Input: create 'Input Splits' and individual records -- Framework
• Map: user-defined logic -- User
• Shuffling: Framework
• Reduce: user-defined logic -- User

Input

Delhi Mumbai Delhi
Bangalore Delhi Chennai
Mumbai Delhi Chennai

Map (input record -> Map Output)

<1, Delhi Mumbai Delhi> -> <Delhi, 1><Mumbai, 1><Delhi, 1>
<2, Bangalore Delhi Chennai> -> <Bangalore, 1><Delhi, 1><Chennai, 1>
<3, Mumbai Delhi Chennai> -> <Mumbai, 1><Delhi, 1><Chennai, 1>

Shuffling

<Delhi, 1><Delhi, 1><Delhi, 1><Delhi, 1><Bangalore, 1> -> <Delhi, (1,1,1,1)><Bangalore, 1>
<Mumbai, 1><Mumbai, 1><Chennai, 1><Chennai, 1> -> <Mumbai, (1,1)><Chennai, (1,1)>

Reduce

<Delhi, (1,1,1,1)><Bangalore, 1> -> <Delhi, 4><Bangalore, 1>
<Mumbai, (1,1)><Chennai, (1,1)> -> <Mumbai, 2><Chennai, 2>

Page 6: Mapreduce Introduction | Overview | Online Training | Basics


Components of MapReduce

[Diagram, first build: input side]

• InputFormat: calculates the Input Splits (1 to 4) over Blocks A, B and C, and defines the Record Reader.
• Mapper Process (two shown): the Record Reader reads an Input Split and passes <K,V> pairs to the Mapper.
• Also labelled: Driver, Reduce Process.

Page 7: Mapreduce Introduction | Overview | Online Training | Basics

Components of MapReduce

[Diagram, second build: the input-side components (Driver, Blocks A to C, Input Splits 1 to 4, Record Readers, Mappers) plus the Reduce Processes that receive the Mappers' <K,V> pairs.]

Page 8: Mapreduce Introduction | Overview | Online Training | Basics

Components of MapReduce

[Diagram, third build: adds the Shuffle between the Mapper Processes and the Reduce Processes. Each Reduce Process contains a Reducer, which passes <K,V> pairs onward.]

Page 9: Mapreduce Introduction | Overview | Online Training | Basics

Components of MapReduce

[Diagram, final build: complete data flow]

• Driver: defines the InputFormat, Mapper, Reducer and OutputFormat for the job.
• InputFormat: calculates the Input Splits (1 to 4) over Blocks A, B and C, and defines the Record Reader.
• Mapper Process (two shown): the Record Reader reads an Input Split and passes <K,V> pairs to the Mapper.
• Shuffle: the Mappers pass <K,V> pairs to the Reduce Processes.
• Reduce Process (two shown): the Reducer passes <K,V> pairs to the Writer, which writes the Output Data.
• OutputFormat: defines the Writer.
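To make the roles in the diagram concrete, here is a minimal single-process sketch of how the components hand data to each other. It is not the Hadoop API: the classes `LineInputFormat` and `ListOutputFormat` and the function `driver` are hypothetical stand-ins, and treating each block as one input split is a simplification.

```python
from collections import defaultdict


class LineInputFormat:
    """Calculates the input splits and defines the record reader."""

    def get_splits(self, blocks):
        # Simplification for this sketch: one input split per block.
        return list(blocks)

    def record_reader(self, split):
        # Reads a split and passes <K, V> pairs: (line index, line text).
        return enumerate(split.splitlines())


class ListOutputFormat:
    """Defines the writer; the 'output data' here is an in-memory list."""

    def __init__(self):
        self.output_data = []

    def write(self, key, value):
        self.output_data.append((key, value))


def driver(blocks, input_format, mapper, reducer, output_format):
    """Wires the components together and runs the job."""
    # Mapper processes: one pass per input split.
    map_output = []
    for split in input_format.get_splits(blocks):
        for key, value in input_format.record_reader(split):
            map_output.extend(mapper(key, value))

    # Shuffle: route every <K, V> pair to the list kept for its key.
    shuffled = defaultdict(list)
    for key, value in map_output:
        shuffled[key].append(value)

    # Reduce processes: each reducer result is handed to the writer.
    for key in sorted(shuffled):
        out_key, out_value = reducer(key, shuffled[key])
        output_format.write(out_key, out_value)
    return output_format.output_data


blocks = ["Delhi Mumbai Delhi\nBangalore Delhi Chennai", "Mumbai Delhi Chennai"]
output = driver(
    blocks,
    LineInputFormat(),
    mapper=lambda _key, line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: (word, sum(ones)),
    output_format=ListOutputFormat(),
)
print(output)
```

The user supplies only the `mapper` and `reducer` callables; splitting, record reading, shuffling and writing are done by the surrounding code, which mirrors the split between user and framework responsibilities.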

Page 10: Mapreduce Introduction | Overview | Online Training | Basics

Deciding factors for using MapReduce

Questions to ask before choosing MapReduce:

• Are the input files independent of each other for processing?

• Can the problem be broken into smaller tasks such that each task can be processed independently?

• Can the partial results of executing processing on small tasks be aggregated or consolidated?
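As a small illustration of these three questions, the sketch below computes a mean by splitting the data into independent chunks, producing a partial result per chunk, and consolidating the partials. The data values and the function names are made up for the example.

```python
def process_task(chunk):
    """One independent task: a partial result computed from its chunk only."""
    return sum(chunk), len(chunk)


def consolidate(partials):
    """Aggregate the partial results into the final answer (here, a mean)."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


data = [4, 8, 15, 16, 23, 42]
chunks = [data[0:2], data[2:4], data[4:6]]  # smaller, independent tasks
partials = [process_task(chunk) for chunk in chunks]
mean = consolidate(partials)

print(partials, mean)
```

Each task returns a (sum, count) pair rather than a per-chunk mean, because sums and counts can be added together safely, whereas averaging per-chunk means would be wrong whenever the chunks differ in size.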

Page 11: Mapreduce Introduction | Overview | Online Training | Basics

Design Patterns

A design pattern is a template for solving a common, general data manipulation problem with MapReduce.

• Summarization Patterns

• Filtering Patterns

• Join Patterns

• Job Chaining Patterns

Page 12: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

• Goal: find the subscribers and their corresponding downloaded bytes from the sample airmobile logs provided. Each line holds the subscriber (substring 15,26) and the bytes downloaded (substring 87,97).

• The sample log files are line-delimited. The Customer ID and Downloaded Bytes have to be extracted from each line for analysis.
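Extracting the two fields by position could look like the sketch below. It assumes the offsets are zero-based with an exclusive end, as in Java's `String.substring`, which maps directly onto Python slices. The helper `make_line` builds a made-up fixed-width line purely so the example can run; it is not the real airmobile log layout.

```python
SUBSCRIBER_SLICE = slice(15, 26)  # "substring 15,26" from the slide
BYTES_SLICE = slice(87, 97)       # "substring 87,97" from the slide


def extract_fields(line):
    """Return (subscriber_id, downloaded_bytes) from one log line."""
    subscriber_id = line[SUBSCRIBER_SLICE].strip()
    downloaded_bytes = float(line[BYTES_SLICE])
    return subscriber_id, downloaded_bytes


def make_line(subscriber_id, downloaded_bytes):
    """Build a hypothetical 100-character line with both fields in place."""
    chars = ["-"] * 100
    chars[15:26] = subscriber_id.rjust(11)        # 11 characters wide
    chars[87:97] = f"{downloaded_bytes:10.4f}"    # 10 characters wide
    return "".join(chars)


sample = make_line("28052627", 8.4622)
fields = extract_fields(sample)
print(fields)
```

In the map function, the pair returned by `extract_fields` would be emitted as the (K2, V2) output for that line.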

Page 13: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

(K1, V1) -- Input to user-defined map function

•(0, subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17)

•(121, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17)

•(242, subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17)
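The keys 0, 121 and 242 are consistent with K1 being the byte offset at which each line starts, which is what Hadoop's TextInputFormat supplies as the key; the slide does not name the input format, so treat that as an assumption. The sketch below reproduces such keys for made-up 120-character lines, each followed by a newline.

```python
def records_with_offsets(data: bytes):
    """Yield <byte offset of line start, line text> for each line."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\n").decode("ascii")
        offset += len(raw_line)


# Hypothetical file: three lines of 120 characters plus a newline each.
line = "x" * 120
data = ((line + "\n") * 3).encode("ascii")
keys = [key for key, _ in records_with_offsets(data)]
print(keys)
```

The map function in this case study ignores K1 and works only on the line text in V1.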

Page 14: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

list(K2, V2) -- Output from user-defined map function

•(28052627, 8.4621702216543)
•(28052639, 9.672631216721858)
•(28052627, 8.64072609693471)

Page 15: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

(K2, list(V2)) -- Input to user-defined reduce function

•(“28052627”, (8.4621702216543, 8.64072609693471))
•(“28052639”, (9.672631216721858))
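A reduce function for this pattern might simply add up the values for each subscriber. The slides do not show the aggregation used, so the sum below is an assumption, and `group_by_key` and `reduce_fn` are illustrative names rather than Hadoop calls.

```python
from collections import defaultdict
import math

# Map output pairs from the case study.
map_output = [
    ("28052627", 8.4621702216543),
    ("28052639", 9.672631216721858),
    ("28052627", 8.64072609693471),
]


def group_by_key(pairs):
    """Build (K2, list(V2)) from list(K2, V2), as the shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_fn(subscriber_id, values):
    """Assumed summarization: total downloaded bytes per subscriber."""
    return subscriber_id, sum(values)


reduce_input = group_by_key(map_output)
totals = dict(reduce_fn(key, values) for key, values in reduce_input.items())
print(reduce_input)
print(totals)
```

Swapping `sum` for `max`, `min` or a count would give other summarizations over the same grouped input.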

Page 16: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

Mapper Class

Page 17: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

Reducer Class

Page 18: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

Driver Class

Page 19: Mapreduce Introduction | Overview | Online Training | Basics

Case Study – Summarization Pattern

MapReduce Output

Page 20: Mapreduce Introduction | Overview | Online Training | Basics

Questions?


Page 21: Mapreduce Introduction | Overview | Online Training | Basics

THANK YOU for attending the demo of Hadoop Map Reduce