Hadoop and MapReduce

Presentation by: Asst. Prof. Amresh Kumar, Department of Computer Science & Engineering, GH Raisoni College of Engineering, Nagpur. 12/05/14

Upload: amreshkr19

Post on 05-Jul-2015

DESCRIPTION

Hadoop, MapReduce

TRANSCRIPT

Page 1: Hadoop and MapReduce

Presentation by:

Asst. Prof. Amresh Kumar

Department of Computer Science & Engineering

GH Raisoni College of Engineering, Nagpur

12/05/14

Page 2: Hadoop and MapReduce


• Introduction

• Hadoop Architecture

• MapReduce Program Model

• References


Page 3: Hadoop and MapReduce


Page 4: Hadoop and MapReduce

• Hadoop is:

o An open-source software framework.

o A scalable, fault-tolerant system.

o Simple and accessible.

o Built to support distributed applications.

• Hadoop is built on (provides) two main parts: a shared storage layer, HDFS, and a processing framework, MapReduce (MR).

• The Hadoop project includes: HDFS, MR, YARN.

• There are other Hadoop-related projects at Apache.

• Hadoop was created by Doug Cutting.

Page 5: Hadoop and MapReduce

• MapReduce is a software framework.

• The name derives from the application of the map() and reduce() functions.

• It splits the input data set into independent chunks, which are processed by map() and then by reduce().

• Typically, the compute nodes and the storage nodes are the same (MRF + DFS: the MapReduce framework and the distributed file system run on the same machines).

• HDFS:

o A shared storage.

o Stores data on the compute nodes (after processing).

o HDFS has a master (NameNode) / slave (DataNodes) architecture.
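The map() and reduce() idea above can be sketched in plain Python as a word count. This is only an illustration that runs on one machine, not Hadoop's API: the names `map_fn`, `reduce_fn` and `run_job` are hypothetical, and the grouping step stands in for what the framework does between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_fn(chunk):
    """map(): emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_fn(word, counts):
    """reduce(): combine all values emitted for one key into one result."""
    return word, sum(counts)


def run_job(chunks):
    # Map phase: chunks are independent, so a cluster could process them in parallel.
    mapped = chain.from_iterable(map_fn(chunk) for chunk in chunks)

    # Group intermediate pairs by key (done by the framework between the phases).
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce phase: one reduce() call per distinct key.
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Two hypothetical input chunks.
    print(run_job(["Hadoop stores data", "Hadoop processes data"]))
```

Because each chunk is mapped without reference to any other chunk, the same two functions could be applied on many nodes at once, which is the property the slide points to.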

Page 6: Hadoop and MapReduce

[Figure: Hadoop architecture. The master node runs the Name Node (DFS layer) and the Job Tracker (MRF layer). Each of the three slave nodes runs a Data Node (DFS layer) and a Task Tracker (MRF layer).]

Page 7: Hadoop and MapReduce

[Figure: MapReduce data flow. Input Dataset → Mapper (Map Phase: partitioning, merge on disk) → fetch by the Reducer, with data also going to other reducers and arriving from other mappers (Copy Phase) → merge (Sort Phase) → Reducer (Reduce Phase) → Output.]
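The four phases in the diagram can be imitated in a few lines of Python. This is a single-process sketch under stated assumptions, not how Hadoop implements them: `partition`, `map_side` and `reduce_side` are hypothetical names, and a CRC32 hash stands in for whatever partitioning function a real job uses.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from zlib import crc32


def partition(key, num_reducers):
    """Partitioning: choose the reducer for a key (stable hash, an assumption)."""
    return crc32(key.encode("utf-8")) % num_reducers


def map_side(pairs, num_reducers):
    """Map phase output: one run per reducer, each sorted by key."""
    runs = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        runs[partition(key, num_reducers)].append((key, value))
    return [sorted(run, key=itemgetter(0)) for run in runs]


def reduce_side(reducer_id, mapper_outputs, reduce_fn):
    # Copy phase: fetch this reducer's run from every mapper.
    fetched = [output[reducer_id] for output in mapper_outputs]
    # Sort phase: merge the already-sorted runs into one sorted stream.
    merged = heapq.merge(*fetched, key=itemgetter(0))
    # Reduce phase: call reduce_fn once per distinct key.
    return [
        reduce_fn(key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Output of two hypothetical mappers, split for two reducers.
    mapper_outputs = [
        map_side([("hadoop", 1), ("data", 1)], 2),
        map_side([("hadoop", 1), ("hdfs", 1)], 2),
    ]
    for reducer_id in range(2):
        result = reduce_side(reducer_id, mapper_outputs, lambda k, vs: (k, sum(vs)))
        print(reducer_id, result)
```

Since the partition function depends only on the key, every pair with the same key reaches the same reducer, no matter which mapper produced it.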

Page 8: Hadoop and MapReduce

Working on Hadoop

[Figure: one NameNode and two groups of DataNodes, with these labels:]

• Start NameNode

• Start JobTracker

• Start DataNode

• Start TaskTracker

Page 9: Hadoop and MapReduce

Confirm: Cluster Working

[Figure: one NameNode and two groups of DataNodes.]

• http://localhost:50070 (NameNode web interface)

• http://localhost:50030 (JobTracker web interface)
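The check on this slide can also be scripted. The sketch below requests the two URLs from the slide and reports whether each one answers; the function name `check_cluster` and the `open_url` parameter (there so the check can be tried without a running cluster) are hypothetical, and an HTTP 200 reply is only a rough sign that a daemon is up.

```python
from urllib.error import URLError
from urllib.request import urlopen

# URLs taken from the slide; on an older single-node setup they are the
# NameNode and JobTracker web interfaces.
WEB_UIS = {
    "NameNode": "http://localhost:50070",
    "JobTracker": "http://localhost:50030",
}


def check_cluster(open_url=urlopen, timeout=5):
    """Return {daemon name: True/False} depending on whether its web UI answers."""
    status = {}
    for name, url in WEB_UIS.items():
        try:
            with open_url(url, timeout=timeout) as response:
                status[name] = response.status == 200
        except (URLError, OSError):
            # Connection refused or timed out: treat the daemon as not reachable.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_cluster().items():
        print(name, "reachable" if ok else "not reachable")
```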

Page 10: Hadoop and MapReduce

Running Algorithm on Cluster

[Figure: one NameNode and two groups of DataNodes.]

• Put the input (I/P) on HDFS.

• Run the algorithm.

• Map & Reduce running...

• Reliability: replication of data.

• http://localhost:50070

• http://localhost:50030
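The two steps "put the input on HDFS" and "run the algorithm" are usually done with the `hadoop fs -put` and `hadoop jar` commands. The sketch below only builds those command lines in Python and prints them. Every path, the jar name and the class name are hypothetical placeholders, and the helper names are illustrative rather than part of any Hadoop library.

```python
import subprocess


def put_command(local_path, hdfs_dir):
    """Command line that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def job_command(jar_path, main_class, hdfs_input, hdfs_output):
    """Command line that submits a MapReduce job packaged as a jar."""
    return ["hadoop", "jar", jar_path, main_class, hdfs_input, hdfs_output]


def run_steps(commands, runner=subprocess.run):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    # Hypothetical file, jar and class names, shown for illustration only.
    steps = [
        put_command("input.txt", "/user/demo/input"),
        job_command("wordcount.jar", "WordCount", "/user/demo/input", "/user/demo/output"),
    ]
    for step in steps:
        print(" ".join(step))
    # On a machine with a running cluster, run_steps(steps) would execute them.
```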

Page 11: Hadoop and MapReduce

Analyzing Result

[Figure: NameNode.]

• http://localhost:50070

• http://localhost:50030

Page 12: Hadoop and MapReduce

[1] http://ieeexplore.ieee.org

[2] http://hadoop.apache.org/

[3] https://wiki.cloudera.com

[4] http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html

[5] Books:

• Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, University of Maryland, College Park.

• Hadoop: The Definitive Guide, Tom White.

Page 13: Hadoop and MapReduce

Thank You
