Hadoop and MapReduce
Presentation by:
Asst. Prof. Amresh Kumar
Department of Computer Science & Engineering
GH Raisoni College of Engineering, Nagpur
12/05/14
• Introduction
• Hadoop Architecture
• MapReduce Program Model
• References
• Hadoop is an open-source software framework: a scalable, fault-tolerant, simple, and accessible system that supports distributed applications.
• Hadoop is built on two main parts: shared storage (HDFS) and a processing framework (MapReduce, MR).
• The Hadoop project includes HDFS, MR, and YARN.
• Other Hadoop-related projects are hosted at Apache.
• Hadoop was created by Doug Cutting.
• MapReduce is a software framework.
• The name derives from the application of the map() and reduce() functions.
• It splits the input data set into independent chunks, which are processed in parallel by map() tasks; the map outputs are then combined by reduce().
• Typically, the compute nodes and the storage nodes are the same: the MapReduce framework (MRF) and the distributed file system (DFS) run on the same set of nodes.
• HDFS:
  o Shared storage.
  o Stores data on the compute nodes (after processing).
  o Has a master/slave architecture: a NameNode (master) and DataNodes (slaves).
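The model described above can be illustrated with a small, single-process sketch. It is not Hadoop code: the function names (map_fn, shuffle, reduce_fn, run_job) and the word-count task are assumptions chosen for illustration, and the chunks are processed in a plain loop rather than on separate nodes.

```python
from collections import defaultdict


def map_fn(chunk):
    """Emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(chunks):
    """Run map over each independent chunk, then reduce per key."""
    mapped = []
    for chunk in chunks:  # chunks are independent, so this loop could run in parallel
        mapped.extend(map_fn(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


# Hypothetical input: two chunks of one small data set.
chunks = ["hadoop stores data", "hadoop processes data"]
result = run_job(chunks)
print(result)
```

Because map_fn only ever sees one chunk, and reduce_fn only ever sees one key, neither needs to know how many chunks or nodes exist; that independence is what lets a real framework distribute the work.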
[Figure: Hadoop architecture. The master node runs the NameNode (DFS layer) and the JobTracker (MRF layer). Each of the three slave nodes runs a DataNode (DFS layer) and a TaskTracker (MRF layer).]
[Figure: MapReduce data flow. Map Phase: Input Dataset, Mapper, Partitioning, Merge on Disk (with partitions also going to other reducers). Copy Phase: Fetch (also from other mappers). Sort Phase: Merge. Reduce Phase: Reducer, Output.]
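The partitioning, fetch, and merge steps named in the data-flow figure can be sketched in a few lines. This is a toy model under stated assumptions, not Hadoop's implementation: the helper names (partition, map_side, reduce_side) are hypothetical, zlib.crc32 stands in for whatever hash a real partitioner uses, and everything happens in memory rather than on disk.

```python
import heapq
import zlib


def partition(key, num_reducers):
    """Pick a reducer for a key; crc32 is an assumed stand-in for a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(pairs, num_reducers):
    """Partition one mapper's output and sort each partition by key."""
    parts = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        parts[partition(key, num_reducers)].append((key, value))
    return [sorted(part) for part in parts]


def reduce_side(sorted_runs):
    """Merge the sorted runs fetched from every mapper, then sum per key."""
    totals = {}
    for key, value in heapq.merge(*sorted_runs):
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical output of two mappers.
mapper_outputs = [
    [("hadoop", 1), ("data", 1)],
    [("data", 1), ("hadoop", 1), ("hdfs", 1)],
]
num_reducers = 2

# Map phase: each mapper partitions and sorts its own output.
spills = [map_side(out, num_reducers) for out in mapper_outputs]

# Copy and sort phases: reducer r fetches partition r from every mapper and merges.
results = [
    reduce_side([spill[r] for spill in spills]) for r in range(num_reducers)
]
print(results)
```

Since every mapper uses the same partition function, all pairs for a given key reach the same reducer, so each key is counted in exactly one reducer's output.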
Working on Hadoop
• On the NameNode: start the NameNode and start the JobTracker.
• On the DataNodes: start the DataNode and start the TaskTracker.
Confirm: Cluster Working
• Check that the NameNode and the DataNodes are up.
• http://localhost:50070
• http://localhost:50030
Running Algorithm on Cluster
• Put the input on HDFS.
• Run the algorithm.
• http://localhost:50070
• http://localhost:50030
Analyzing Result
• Map and reduce running.
• Reliability: replication of data.
• http://localhost:50070
• http://localhost:50030
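Why replication gives reliability can be seen in a toy placement model. The sketch below is an assumption-laden illustration, not HDFS's actual placement policy: the round-robin scheme, the function names (place_blocks, readable_blocks), and the node names are all hypothetical. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, nodes in placement.items()
        if any(node not in failed for node in nodes)
    ]


# Hypothetical cluster of four DataNodes holding a five-block file.
datanodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(5, datanodes, replication=3)

# Two nodes fail; with three replicas per block, every block is still readable.
survivors = readable_blocks(placement, failed={"dn1", "dn2"})
print(len(survivors))
```

With three replicas on distinct nodes, a block becomes unreadable only if all three of its nodes fail, so any two simultaneous failures leave the whole file intact in this model.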
[1] http://ieeexplore.ieee.org
[2] http://hadoop.apache.org/
[3] https://wiki.cloudera.com
[4] http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html
[5] Books:
• Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, University of Maryland, College Park.
• Hadoop: The Definitive Guide, Tom White.
Thank You