An Introduction to Apache Hadoop MapReduce

Upload: semtech-solutions-ltd

Post on 16-Apr-2017

Apache Hadoop MapReduce

What is it?

Why use it?

How does it work?

Some examples

Big users

MapReduce - What is it?

Processing engine of Hadoop

Developers create Map and Reduce jobs

Used for big data batch processing

Parallel processing of huge data volumes

Fault tolerant

Scalable

MapReduce - Why use it?

Your data is in the terabyte or petabyte range

You have heavy I/O loads

The Hadoop framework takes care of:

Job and task management

Failures

Storage

Replication

You just write Map and Reduce jobs
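As a rough illustration of what "just writing Map and Reduce" can look like, here is a minimal sketch in the style of Hadoop Streaming, where the mapper emits tab-separated key/value records and the reducer receives them sorted by key. It uses word counting, the example this deck works through. The function names `mapper`, `reducer` and `run_locally` are hypothetical, and `run_locally` is only a single-process stand-in for the framework, not how Hadoop itself runs a job.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts for each word.

    Assumes the records arrive sorted by key, which is what the
    framework's shuffle/sort phase provides.
    """
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical stand-in for the framework: map, sort, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

Everything outside `mapper` and `reducer` (splitting the input, moving records between machines, retrying failed tasks, storing the output) is the part the framework is meant to handle.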

MapReduce - How does it work?

Take word counting as an example, something that Google does all of the time.

Input data is split into shards

Each shard is mapped to key,value pairs, e.g. Bear,1

Mapped data is shuffled and sorted by key, e.g. Bear

Sorted data is reduced, e.g. Bear,2

Final data is stored on HDFS

There might be an extra map layer before the shuffle

The JobTracker controls all tasks in a job

TaskTrackers control the map and reduce tasks
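The phases listed above can be sketched as a small in-memory simulation. This is a single-process illustration, not Hadoop code: the function names are hypothetical, the round-robin split by line is a simplification chosen for readability, and the result is returned as a dictionary rather than written to HDFS.

```python
from collections import defaultdict


def split_input(lines, num_shards):
    """Split: deal the input lines into num_shards shards (round-robin)."""
    shards = [[] for _ in range(num_shards)]
    for i, line in enumerate(lines):
        shards[i % num_shards].append(line)
    return shards


def map_shard(shard):
    """Map: turn every word in a shard into a (word, 1) pair."""
    return [(word, 1) for line in shard for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_shards=3):
    """Run split -> map -> shuffle -> reduce over the given lines."""
    shards = split_input(lines, num_shards)
    mapped = [map_shard(shard) for shard in shards]
    return reduce_counts(shuffle(mapped))
```

For the input lines "Deer Bear River", "Car Car River" and "Deer Car Bear", the shuffle step gathers the two Bear,1 pairs under the single key Bear, and the reduce step turns them into Bear,2.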

MapReduce - Some examples

A visual example with colours to show you the cycle

Split -> Map -> Shuffle -> Reduce

A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
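To give a feel for the coordination role of the job and task trackers, the toy model below hands each task to a tracker and reassigns it when an attempt fails. It is only a sketch under stated assumptions: `run_job`, the tracker names and the `max_attempts` limit are all hypothetical, tasks run one after another in a single process, and none of this is Hadoop's actual scheduling code.

```python
def run_job(tasks, trackers, max_attempts=3):
    """Toy JobTracker loop: assign each task to a tracker, retry on failure.

    tasks    -- dict of task id -> zero-argument callable (hypothetical)
    trackers -- list of tracker names, used in round-robin order
    Returns (results, log), where log records every attempt.
    """
    results, log = {}, []
    next_tracker = 0
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            # Pick the next tracker so a retry lands on a different one.
            tracker = trackers[next_tracker % len(trackers)]
            next_tracker += 1
            try:
                results[task_id] = task()
                log.append((task_id, tracker, attempt, "ok"))
                break
            except Exception:
                log.append((task_id, tracker, attempt, "failed"))
        else:
            # Every attempt failed, so the whole job is abandoned.
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results, log
```

The log makes the retry visible: a task that fails on one tracker shows up again with a higher attempt number on the next tracker in the list.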

Hadoop MapReduce - Big users

Users

Facebook

Yahoo

Amazon

eBay

Providers

Amazon

Cloudera

Hortonworks

MapR

Contact Us

Feel free to contact us at

www.semtech-solutions.co.nz

[email protected]

We offer IT project consultancy

We are happy to hear about your problems

You pay only for the hours you need to solve your problems