Understanding Hadoop framework

Upload: prashant-sharma

Post on 13-Apr-2018

Page 1: Understanding Hadoop framework

7/27/2019 Understanding Hadoop framework

http://slidepdf.com/reader/full/understanding-hadoop-framework 1/31

Course Topics

Week 1 – Understanding Big Data
 – Introduction to HDFS
Week 2 – Playing around with Cluster
 – Data loading Techniques
Week 3 – Map-Reduce Basics, types and formats
 – Use-cases for Map-Reduce
Week 4 – Analytics using Pig
 – Understanding Pig Latin
Week 5 – Analytics using Hive
 – Understanding Hive QL
Week 6 – NoSQL Databases
 – Understanding HBase
Week 7 – Real world Datasets and ...
 – Hadoop Project Environment
Week 8 – Project Reviews
 – Planning a career in Big Data

How it works

• Live classes
• Class recordings
• Module-wise Quizzes, Coding Assignments
• 24x7 on-demand technical support
• Project work on large Datasets
• Online certification exam
• Lifetime access to the Learning Management System
• Complimentary Java Classes

What is Big Data?

Facebook Example

• Facebook users spend 10.5 b... (almost 20,000 years) online ... network
• Facebook has an average of ... comments are posted every ...

Twitter Example

• Twitter has over 500 million re... users.
• The USA, whose 141.8 million represents 27.4 percent of all T... good enough to finish well ahe... Japan, the UK and Indonesia.
• 79% of US Twitter users are mo... recommend brands they follow
• 67% of US Twitter users are mo... buy from brands they follow
• 57% of all companies that use ... for business use Twitter

Other Industrial Use Cases

• Insurance

• Healthcare

• Retail

 – Recommendations

 – Groupings

• Genome Sequencing

• Utilities

Hadoop Users

http://wiki.apache.org/hadoop/PoweredBy 

Data volume is growing exponentially

• Estimated Global Data Volume:
 – 2011: 1.8 ZB
 – 2015: 7.9 ZB
• The world's information doubl...
• Over the next 10 years:
 – The number of servers world...
 – Amount of information mana... data centers will grow by 50x
 – Number of “files” enterprise... will grow by 75x

Source: http://www.emc.com/leaders... universe.htm, which was based on the ... Universe Study
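As a rough check on the two volume estimates on this slide (1.8 ZB in 2011, 7.9 ZB in 2015), the implied yearly growth can be worked out directly. This is only a sketch: the function names are illustrative, and it assumes smooth compound growth between the two years, which the slide does not state.

```python
import math

def annual_growth_rate(start_zb, end_zb, years):
    """Compound annual growth rate implied by two volume estimates."""
    return (end_zb / start_zb) ** (1 / years) - 1

def doubling_time_years(rate):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)

# Figures taken from the slide; the smooth-growth model is an assumption.
rate = annual_growth_rate(1.8, 7.9, 2015 - 2011)
print(f"Implied growth: {rate:.1%} per year")
print(f"Implied doubling time: {doubling_time_years(rate):.1f} years")
```

Under that assumption the data volume grows by roughly 45% a year and doubles in a little under two years.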

Un-Structured Data is exploding

Why DFS?

Read 1 TB Data

                 1 Machine     10 Machines
I/O Channels     4             4
Each Channel     100 MB/s      100 MB/s
Read time        45 Minutes    4.5 Minutes
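The read times quoted for one machine versus ten can be reproduced with simple arithmetic. The sketch below assumes that each machine has 4 channels at 100 MB/s, that all channels read in parallel, that the data is spread evenly across machines, and that 1 TB is 1,048,576 MB; none of this is spelled out on the slide, and the function name is illustrative.

```python
def read_time_minutes(data_mb, machines, channels_per_machine, mb_per_sec_per_channel):
    """Minutes needed to scan data_mb when every channel reads in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60

ONE_TB_IN_MB = 1024 * 1024  # assumption: binary terabyte

single = read_time_minutes(ONE_TB_IN_MB, machines=1,
                           channels_per_machine=4, mb_per_sec_per_channel=100)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=10,
                            channels_per_machine=4, mb_per_sec_per_channel=100)
print(f"1 machine:   {single:.1f} minutes")
print(f"10 machines: {cluster:.1f} minutes")
```

The result is a little under 44 minutes for one machine and a tenth of that for ten, which the slide rounds to 45 and 4.5 minutes. The point is that read time shrinks in proportion to the number of machines reading in parallel.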

What Is a Distributed File System (DFS)?

What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

Companies using Hadoop:
- Yahoo
- Google
- Facebook
- Amazon
- AOL
- IBM
- And many more at http://wiki.apache.org/hadoop/PoweredBy

Hadoop Eco-System

Hadoop Core Components:

• HDFS – Hadoop Distributed File System (storage)
• MapReduce (processing)
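To give a feel for the processing half, here is a single-process word count written in the map, shuffle and reduce style. It is only an illustration of the programming model in plain Python: it does not use the Hadoop API, and every function name is made up for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in order over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster the map and reduce calls would run on many machines, with the input lines coming from files stored in HDFS.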

What is HDFS?

HDFS - Hadoop Distributed File System
• Highly fault-tolerant
• High throughput
• Suitable for applications with large data sets
• Streaming access to file system data
• Can be built out of commodity hardware

Main Components Of HDFS:

NameNode:
• master of the system
• maintains and manages the blocks which are present on the DataNodes

DataNodes:
• slaves which are deployed on each machine and provide the actual storage
• responsible for serving read and write requests for the clients
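The division of labour between the two components can be sketched with a toy in-memory model: the NameNode keeps only the map of which DataNodes hold which block, while the DataNodes store and serve the bytes. Everything here is hypothetical and simplified for illustration, including the class and method names, the 4-character block size, the replication factor of 2, and the round-robin placement. It is not the HDFS API.

```python
class DataNode:
    """Worker that stores the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    """Master that tracks block locations; it never stores file data."""
    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size    # toy value, in characters
        self.replication = replication  # toy value
        self.block_map = {}             # filename -> [(block_id, [DataNode, ...])]

    def allocate(self, filename, length):
        """Decide which DataNodes receive each block (round-robin, an assumption)."""
        n_blocks = -(-length // self.block_size)  # ceiling division
        plan = []
        for i in range(n_blocks):
            targets = [self.datanodes[(i + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            plan.append((f"{filename}#{i}", targets))
        self.block_map[filename] = plan
        return plan

    def locate(self, filename):
        return self.block_map[filename]

def write_file(namenode, filename, data):
    """Client asks the NameNode where to write, then sends data to DataNodes."""
    for i, (block_id, targets) in enumerate(namenode.allocate(filename, len(data))):
        chunk = data[i * namenode.block_size:(i + 1) * namenode.block_size]
        for node in targets:
            node.write_block(block_id, chunk)

def read_file(namenode, filename):
    """Client asks the NameNode for locations, then reads from the first replica."""
    return "".join(targets[0].read_block(block_id)
                   for block_id, targets in namenode.locate(filename))

nodes = [DataNode(f"dn{i}") for i in range(3)]
nn = NameNode(nodes)
write_file(nn, "demo.txt", "hello hdfs")
print(read_file(nn, "demo.txt"))
```

Note that file contents pass only between the client and the DataNodes. The NameNode is consulted for metadata alone, which is why it is described as the master that manages the blocks rather than the place where they live.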

Secondary NameNode:

• Not a hot standby for the NameNode
• Connects to NameNode every hour*
• Housekeeping, backup of NameNode metadata
• Saved metadata can be used to rebuild a failed NameNode
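The housekeeping role can be pictured as a periodic checkpoint: the Secondary NameNode takes the NameNode's last metadata snapshot plus the log of changes made since, merges them, and keeps the merged copy. The sketch below is a simplified model of that idea only. The class names, the dictionary-based snapshot and the tuple-based edit log are assumptions for illustration, not Hadoop's on-disk formats or API.

```python
def apply_edits(fsimage, edit_log):
    """Replay logged namespace operations on a snapshot and return a new snapshot."""
    merged = dict(fsimage)
    for op, path, value in edit_log:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
    return merged

class NameNode:
    def __init__(self):
        self.fsimage = {}   # last checkpointed snapshot: path -> block ids
        self.edit_log = []  # operations recorded since that snapshot

    def create(self, path, blocks):
        self.edit_log.append(("create", path, blocks))

    def delete(self, path):
        self.edit_log.append(("delete", path, None))

class SecondaryNameNode:
    def __init__(self):
        self.saved_fsimage = {}

    def checkpoint(self, namenode):
        """Merge snapshot and edits, keep a copy, and hand the result back."""
        merged = apply_edits(namenode.fsimage, namenode.edit_log)
        self.saved_fsimage = merged
        namenode.fsimage = dict(merged)
        namenode.edit_log = []  # the log restarts after a checkpoint

nn = NameNode()
nn.create("/a.txt", ["blk_1"])
nn.create("/b.txt", ["blk_2"])
nn.delete("/a.txt")

snn = SecondaryNameNode()
snn.checkpoint(nn)
print(snn.saved_fsimage)
```

Because the saved copy is only as fresh as the last checkpoint, it can help rebuild a failed NameNode but cannot take over instantly, which is why the Secondary NameNode is not a hot standby.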

[Diagram: NameNode and Secondary NameNode, each holding metadata]

JobTracker and TaskTracker:

HDFS Architecture

Job Tracker

Job Tracker Contd

HDFS Client Creates a New File

Rack Awareness

Anatomy of a File Write:

Anatomy of a File Read:

Thank You

See You in Class Next Week