hadoop administrationfiles.meetup.com/11583652/hadoop_presentation.pdf · why hadoop ? we are...

Hadoop Administration


Post on 14-Jun-2020


Page 1: Hadoop Administrationfiles.meetup.com/11583652/Hadoop_Presentation.pdf · Why Hadoop ? We are generating more data then ever before Financial transactions Sensor networks Server logs

Hadoop Administration


Case for Hadoop

Why Hadoop is needed

How Hadoop originated

What problems Hadoop solves


Why Hadoop ?

We are generating more data than ever before

Financial transactions

Sensor networks

Server logs

Analytics

Social Media

It’s not just about the size of the data, but also how frequently it arrives. We are generating data faster than ever before.

We need to make sense out of data.


The 3 V's

Web logs

Images

Videos

Audio

Sensor Data

Volume, Velocity, Variety


Two Big problems at hand

Large scale data storage

Large scale data analysis

- The traditional approach of moving data to the compute node does not scale well at this size.

- More time is spent copying data than actually processing it.


What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

It is open-source data management software with scale-out storage & distributed processing.


Hadoop Eco-System


Hadoop Components

It has two main components:

HDFS – Hadoop Distributed File System (Storage)

Distributed across “nodes”

Natively redundant

NameNode tracks locations.

MapReduce (Processing)

Splits a task across processors “near” the data & assembles results
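The split-then-assemble idea can be illustrated with a small single-process word count. This is only a sketch of the programming model in plain Python, not Hadoop's actual Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Assemble the result: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    pairs = []
    # On a cluster each split would be mapped on a node near its data;
    # here the splits are simply processed one after another.
    for split in splits:
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: two splits of one log file.
splits = ["server logs sensor logs", "sensor data server logs"]
counts = word_count(splits)
print(counts)
```

Each call to `map_phase` only needs its own split, which is why the work can be sent to wherever that split is stored.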


Main Components of HDFS

NameNode

- Master Node

- Stores MetaData

DataNode

- Stores the Actual Data Blocks

- Serves Read/Write Requests


NameNode Metadata

Meta-data in Memory

- The entire metadata is in main memory

- No demand paging of FS meta-data

Types of Metadata

- List of files

- List of Blocks for each file

- List of DataNodes for each block

- File attributes, e.g. access time, replication factor

A Transaction Log

- Records file creations, file deletions, etc.
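The metadata listed above can be pictured as a few in-memory dictionaries plus an append-only log. The sketch below is a toy model for illustration only, not the NameNode's real data structures; the class `ToyNameNode`, its methods, and the block and node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    replication: int                             # file attribute: replication factor
    blocks: list = field(default_factory=list)   # ordered block ids for the file


class ToyNameNode:
    """Toy model: everything lives in memory, changes go to a log."""

    def __init__(self):
        self.files = {}            # path -> FileEntry (list of files, blocks per file)
        self.block_locations = {}  # block id -> list of DataNode names
        self.edit_log = []         # transaction log of creations and deletions

    def create(self, path, replication=3):
        self.files[path] = FileEntry(replication=replication)
        self.edit_log.append(("create", path))

    def add_block(self, path, block_id, datanodes):
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def delete(self, path):
        entry = self.files.pop(path)
        for block_id in entry.blocks:
            self.block_locations.pop(block_id, None)
        self.edit_log.append(("delete", path))

    def locate(self, path):
        """Return (block id, DataNodes) pairs a client would read from."""
        return [(b, self.block_locations[b]) for b in self.files[path].blocks]


# Hypothetical usage with made-up names.
nn = ToyNameNode()
nn.create("/logs/a.log")
nn.add_block("/logs/a.log", "blk_1", ["dn1", "dn2", "dn3"])
print(nn.locate("/logs/a.log"))
```

Because every lookup is a dictionary access in memory, answering "where is this block?" requires no disk read, which matches the "no demand paging" point above.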


HDFS Architecture


File Split


File Split
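The slide's diagram is not part of this transcript, so the following is only a rough sketch of the idea: a file is cut into fixed-size blocks, and the last block may be shorter. The 128 MB block size is an assumption (a common default in recent Hadoop releases; older ones used 64 MB), and `split_into_blocks` is a made-up helper, not a Hadoop function.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# Hypothetical example: a 300 MB file with the assumed 128 MB block size.
blocks = split_into_blocks(300 * MB)
for offset, length in blocks:
    print(offset // MB, length // MB)
```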


Replication


Write Operation


Write Operation


Rack Awareness
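The slide's diagram is missing from this transcript. As a hedged sketch, the commonly described default HDFS policy for three replicas puts the first copy on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The function `place_replicas`, the topology dictionary, and the node names below are hypothetical, and the real policy also weighs factors such as node load and free space that this sketch ignores.

```python
def place_replicas(writer, topology):
    """Pick three nodes for a block.

    topology maps a rack name to the list of DataNode names in that rack.
    """
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer in nodes), None
    )
    if local_rack is None:
        raise ValueError("writer is not in the topology")
    # Choose the first other rack that can hold two replicas.
    remote_rack = next(
        (rack for rack, nodes in topology.items()
         if rack != local_rack and len(nodes) >= 2),
        None,
    )
    if remote_rack is None:
        raise ValueError("no remote rack with at least two nodes")
    second, third = topology[remote_rack][:2]
    return [writer, second, third]


# Hypothetical two-rack cluster.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
placement = place_replicas("dn1", topology)
print(placement)
```

Spreading copies over two racks means the loss of a whole rack still leaves at least one replica, while keeping two copies in one rack limits cross-rack traffic during the write.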


Pipelined Write


Thank You!