developing a mapreduce application€¦ · mapreduce paradigm job tracker example word count job...

29
MapReduce Paradigm Job Tracker Example Developing a MapReduce Application Oguzhan Gencoglu Oguzhan Gencoglu Developing a MapReduce Application

Upload: others

Post on 23-Sep-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Developing a MapReduce Application

Oguzhan Gencoglu

TIE 12206 - Apache HadoopTampere University of Technology, Finland

November, 2014

Oguzhan Gencoglu Developing a MapReduce Application

Page 2: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 3: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 4: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) datasets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs.

Keys and values may be of any type.

Oguzhan Gencoglu Developing a MapReduce Application

Page 5: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) datasets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs.

Keys and values may be of any type.

Oguzhan Gencoglu Developing a MapReduce Application

Page 6: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) datasets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs.

Keys and values may be of any type.

Oguzhan Gencoglu Developing a MapReduce Application

Page 7: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 8: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 9: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 10: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 11: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 12: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 13: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 14: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

What is MapReduceMapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which itfailed.

Do profiling to tune the performance

Oguzhan Gencoglu Developing a MapReduce Application

Page 15: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

ExampleHadoop Default Ports

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 16: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

ExampleHadoop Default Ports

Hadoop Default Ports

Handful of ports over TCP.Some used by Hadoop itself (to schedule jobs, replicateblocks, etc.).Some are directly for users (either via an interposed Javaclient or via plain old HTTP)

Oguzhan Gencoglu Developing a MapReduce Application

Page 17: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 18: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Word Count

Task: Counting the word occurances (frequencies) in a text file (orset of files).

< word, count > as < key, value > pair

Mapper: Emits < word, 1 > for each word (no counting at thispart).

Shuffle in between: pairs with same keys grouped together andpassed to a single machine.

Reducer: Sums up the values (1s) with the same key value.

Oguzhan Gencoglu Developing a MapReduce Application

Page 19: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 20: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 21: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Job Tracker

Oguzhan Gencoglu Developing a MapReduce Application

Page 22: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Tasks

Oguzhan Gencoglu Developing a MapReduce Application

Page 23: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Name Node

Oguzhan Gencoglu Developing a MapReduce Application

Page 24: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Outline

1 MapReduce ParadigmWhat is MapReduceMapReduce Workflow

2 Job TrackerHadoop Default Ports

3 ExampleWord CountJob TrackerKey Points

Oguzhan Gencoglu Developing a MapReduce Application

Page 25: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion ofthe data.

Track the jobs, debug, do profiling

Oguzhan Gencoglu Developing a MapReduce Application

Page 26: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion ofthe data.

Track the jobs, debug, do profiling

Oguzhan Gencoglu Developing a MapReduce Application

Page 27: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion ofthe data.

Track the jobs, debug, do profiling

Oguzhan Gencoglu Developing a MapReduce Application

Page 28: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion ofthe data.

Track the jobs, debug, do profiling

Oguzhan Gencoglu Developing a MapReduce Application

Page 29: Developing a MapReduce Application€¦ · MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points Key Points Test mapper and reducer outside hadoop. Copy your MapReduce

MapReduce ParadigmJob Tracker

Example

Word CountJob TrackerKey Points

Questions/Comments

Oguzhan Gencoglu Developing a MapReduce Application