Hadoop and MapReduce

Presentation by:

Asst. Prof. Amresh Kumar

Department of Computer Science & Engineering

GH Raisoni College of Engineering, Nagpur

12/05/14

• Introduction

• Hadoop Architecture

• MapReduce Program Model

• References


• Hadoop is an open-source software framework: scalable, fault-tolerant, simple, and accessible. It supports distributed applications.

• Hadoop is built on (provides) two main parts: a shared storage layer (HDFS) and a processing framework (MapReduce, MR).

• The Hadoop project includes HDFS, MR, and YARN.

• Other Hadoop-related projects are hosted at Apache.

Hadoop was created by Doug Cutting.


http://en.wikipedia.org/wiki/Doug_Cutting

• MapReduce is a software framework.

• The name derives from the application of map() and reduce() functions.

• It splits the input data set into independent chunks, which are processed by map() and then by reduce().

• Typically, the compute nodes and the storage nodes are the same machines (MapReduce framework, MRF, plus distributed file system, DFS).

• HDFS:
  o A shared storage.
  o Stores data on the compute nodes (after processing).
  o HDFS has a master (NameNode) / slave (DataNodes) architecture.
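The map() and reduce() steps described above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop's API: the function names (`split_into_chunks`, `map_fn`, `reduce_fn`, `run_word_count`) and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input data set into independent chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_fn(chunk):
    """map(): emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_fn(word, counts):
    """reduce(): combine all values emitted for one key."""
    return word, sum(counts)


def run_word_count(lines, chunk_size=2):
    """Run map() on each chunk, group the pairs by key, then run reduce()."""
    grouped = defaultdict(list)
    for chunk in split_into_chunks(lines, chunk_size):
        for word, one in map_fn(chunk):
            grouped[word].append(one)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    data = ["hadoop stores data", "mapreduce processes data", "hadoop scales"]
    print(run_word_count(data))
```

Because each chunk is mapped independently, the map calls could run on different nodes; only the grouping step needs to see output from all of them.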


[Figure: Hadoop architecture. Master node: Name Node (DFS) and Job Tracker (MRF). Three slave nodes, each running a Data Node (DFS) and a Task Tracker (MRF).]

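[Figure: MapReduce data flow. Map Phase: the Input Dataset is read by a MAPPER, whose output goes through Partitioning and Merge on Disk. Copy Phase: each REDUCER fetches its partition from this mapper and from other mappers; the remaining partitions go to other reducers. Sort Phase: the fetched data is merged. Reduce Phase: the REDUCER writes the OUTPUT.]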
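The phases named in the figure (partitioning, copy, sort, reduce) can be sketched as follows. This is an illustrative in-memory model only: `partition`, `map_side`, and `reduce_side` are hypothetical names, and `zlib.crc32` is a stand-in for whatever hash the real partitioner uses.

```python
import heapq
import zlib


def partition(key, num_reducers):
    """Pick the reducer for a key (stable hash, so runs are repeatable)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(pairs, num_reducers):
    """Partition one mapper's output and sort each partition by key."""
    parts = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        parts[partition(key, num_reducers)].append((key, value))
    return [sorted(p) for p in parts]


def reduce_side(reducer_id, all_mapper_outputs):
    """Fetch this reducer's partition from every mapper, merge, and sum."""
    fetched = [out[reducer_id] for out in all_mapper_outputs]  # copy phase
    merged = heapq.merge(*fetched)                             # sort phase
    totals = {}
    for key, value in merged:                                  # reduce phase
        totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    m1 = map_side([("a", 1), ("b", 1), ("a", 1)], num_reducers=2)
    m2 = map_side([("b", 1), ("c", 1)], num_reducers=2)
    for r in range(2):
        print(r, reduce_side(r, [m1, m2]))
```

Since every mapper applies the same partition function, all values for a given key end up at the same reducer, which is what lets each reducer work independently.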

Working on Hadoop

[Figure: cluster with a NameNode and DataNodes. Steps shown: Start NameNode, Start JobTracker, Start DataNode, Start TaskTracker.]

Confirm: Cluster Working

[Figure: cluster with a NameNode and DataNodes, checked at http://localhost:50070]

Running Algorithm on Cluster

[Figure: cluster with a NameNode and DataNodes. Steps shown: put the input on HDFS, run the algorithm. Map and Reduce running.]

Reliability: Replication of Data

[Figure: NameNode]
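How replication gives reliability can be shown with a small simulation. This is a sketch under stated assumptions: the round-robin placement and the names `place_replicas` and `readable_blocks` are hypothetical and much simpler than HDFS's actual placement policy; the default of 3 replicas matches the usual HDFS default.

```python
def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes (round-robin)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one live replica."""
    return [
        block
        for block, nodes in placement.items()
        if any(node not in failed for node in nodes)
    ]


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    placement = place_replicas(5, nodes)
    print(readable_blocks(placement, failed={"dn1", "dn2"}))
```

With three replicas on distinct nodes, a block becomes unreadable only if all three of its nodes fail at once; with a single replica, losing one node loses every block stored on it.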



Analyzing Result


[1] http://ieeexplore.ieee.org

[2] http://hadoop.apache.org/

[3] https://wiki.cloudera.com

[4] http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html

[5] Books:

• Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, University of Maryland, College Park.

• Hadoop: The Definitive Guide, Tom White.


Thank You
