Hadoop: A Highly Available and Secure Enterprise Data Warehousing Solution

Upload: edureka

Post on 28-Jan-2018

Category: Technology


TRANSCRIPT

www.edureka.co/hadoop-admin

Hadoop : A Highly Available and Secure Enterprise Data Warehousing Solution

What will you learn today?

What is Big Data

Hadoop: A synonym for Big Data

Hadoop High Availability

Hadoop as a Data Warehouse

Hands-On: Achieving NameNode and YARN high availability

Hands-On: Securing through ACL

What is Big Data?

Big Data

How is Big Data stored?

How do you store Big Data? I guess RDBMS?

RDBMS – Not the right choice for Big Data

Considering the type and volume of the data, an RDBMS is not the right choice for storing Big Data.

Hadoop : The Savior

Hadoop

What is Hadoop?

Apache Hadoop is an open-source, scalable, and reliable solution that stores large data sets and allows distributed processing of them across clusters of computers using a simple programming model.

A closer look at Apache Hadoop

Apache Hadoop includes the following modules:

Hadoop Distributed File System (HDFS): A distributed file system

Hadoop Common: The common utilities that support the other Hadoop modules

Hadoop YARN: A framework for job scheduling and cluster resource management

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
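To make the "simple programming model" a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop stores big data"]))
```

On a real cluster the map and reduce calls would run in parallel on many nodes; here they run sequentially only to show the data flow.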

High Availability

Maintaining High Availability

In distributed computing, failure is the norm, which means YARN should maintain an acceptable level of availability.

[Figure: HDFS with a single NameNode. The NameNode handles the namespace (NS) and block management, with no horizontal scaling and no high availability. The client gets block locations from the NameNode and then reads data from the DataNodes.]
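The read path in the diagram can be sketched as a toy model: the client first asks the NameNode for block locations, then reads the blocks from the DataNodes. The classes and names below (`NameNode`, `DataNode`, `read_file`, the `/demo.txt` path) are hypothetical and only mimic the flow; they are not Hadoop classes. The sketch also shows why a single NameNode is a problem: once it is down, the data is still on the DataNodes but can no longer be located.

```python
class NameNode:
    """Toy NameNode: maps a file path to (block id, DataNode name) pairs."""

    def __init__(self):
        self.block_map = {}
        self.alive = True

    def get_block_locations(self, path):
        if not self.alive:
            raise ConnectionError("NameNode is down: no block locations available")
        return self.block_map[path]


class DataNode:
    """Toy DataNode: stores block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


def read_file(namenode, datanodes, path):
    """Step 1: get block locations. Step 2: read each block from its DataNode."""
    locations = namenode.get_block_locations(path)
    return b"".join(datanodes[dn].blocks[block_id] for block_id, dn in locations)


def demo():
    dn1, dn2 = DataNode("dn1"), DataNode("dn2")
    dn1.blocks["blk_1"] = b"hello "
    dn2.blocks["blk_2"] = b"hadoop"
    nn = NameNode()
    nn.block_map["/demo.txt"] = [("blk_1", "dn1"), ("blk_2", "dn2")]
    datanodes = {"dn1": dn1, "dn2": dn2}

    data = read_file(nn, datanodes, "/demo.txt")

    nn.alive = False  # simulate a NameNode crash; DataNodes still hold the blocks
    try:
        read_file(nn, datanodes, "/demo.txt")
        failed = False
    except ConnectionError:
        failed = True
    return data, failed
```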

NameNode: Single Point of Failure

Secondary NameNode:

"Not a hot standby" for the NameNode

Connects to the NameNode every hour*

Housekeeping and backup of NameNode metadata

Saved metadata can be used to rebuild a failed NameNode

[Figure: the NameNode, a single point of failure, sends its metadata to the Secondary NameNode. Speech bubble: "You give me metadata every hour, I will make it secure."]
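The housekeeping done by the Secondary NameNode is usually described as a checkpoint: the accumulated edits are merged into the saved namespace image so the edit log can start fresh. Below is a rough sketch of that idea, assuming a namespace modelled as a set of paths and edits modelled as `(operation, path)` tuples. These data structures, the function names, and the constant are illustrative assumptions, not Hadoop's on-disk formats.

```python
CHECKPOINT_PERIOD_SECONDS = 3600  # "every hour" from the slide; illustrative constant


def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    new_image = set(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


def demo():
    fsimage = {"/a", "/b"}
    edits = [("create", "/c"), ("delete", "/a")]
    return checkpoint(fsimage, edits)
```

Because the merged image is only as fresh as the last checkpoint, edits made after it can be lost if the NameNode fails, which is one reason the Secondary NameNode is "not a hot standby".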

Hadoop 2.0 Cluster Architecture: High Availability

[Figure: Hadoop 2.0 cluster. A client talks to the master services; each worker node runs a DataNode (HDFS) and a Node Manager (YARN) that hosts Containers and App Masters.]

HDFS (NameNode High Availability): Active NameNode, Standby NameNode, Secondary NameNode, DataNodes

YARN (Next Generation MapReduce): Resource Manager, Node Managers, Containers, App Masters

Shared edit logs:

Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing)

Standby NameNode: reads the edit logs and applies them to its own namespace

HDFS High Availability: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
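The shared edit log idea can be sketched as a toy model: the active NameNode is the only writer, the standby replays what was written, and on failover the old active is fenced off. Everything below (`SharedEditLog`, `fence_and_claim`, the node names `nn1` and `nn2`) is hypothetical and stands in for the shared NFS directory and the fencing method; it is not how Hadoop implements either.

```python
class SharedEditLog:
    """Toy shared storage: only the current writer may append (fencing)."""

    def __init__(self):
        self.entries = []
        self.writer = None

    def fence_and_claim(self, node_name):
        """Revoke the previous writer and make node_name the single writer."""
        self.writer = node_name

    def append(self, node_name, edit):
        if node_name != self.writer:
            raise PermissionError(f"{node_name} is fenced off from the edit log")
        self.entries.append(edit)


class NameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.namespace = set()
        self.applied = 0  # number of log entries already applied locally

    def create(self, path):
        """Active role: log the edit to shared storage, then apply it locally."""
        self.log.append(self.name, ("create", path))
        self.catch_up()

    def catch_up(self):
        """Standby role: read new edits and apply them to the local namespace."""
        for op, path in self.log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(self.log.entries)


def demo():
    log = SharedEditLog()
    nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
    log.fence_and_claim("nn1")  # nn1 starts as the active NameNode
    nn1.create("/a")
    nn2.catch_up()              # standby replays the shared edits
    log.fence_and_claim("nn2")  # failover: nn2 becomes active, nn1 is fenced
    nn2.create("/b")
    try:
        nn1.create("/c")        # the old active must not be able to write
        old_active_blocked = False
    except PermissionError:
        old_active_blocked = True
    return nn2.namespace, old_active_blocked
```

The point of the single-writer rule is to avoid a "split brain", where two NameNodes both believe they are active and write conflicting edits.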

Hands-On

Achieving HDFS and YARN High Availability

Hands-On

Securing through ACL
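The transcript does not record the hands-on steps. Assuming the session refers to HDFS file ACLs (the POSIX-style entries managed with `hdfs dfs -setfacl` and `hdfs dfs -getfacl`), the sketch below shows roughly how such entries are evaluated. The parser, the `is_allowed` function, and the sample users and groups are illustrative assumptions, and the check is a simplified model rather than Hadoop's own implementation.

```python
def parse_acl(spec):
    """Parse 'user::rwx,user:alice:rw-,mask::r-x,...' into (kind, name, perms)."""
    entries = []
    for item in spec.split(","):
        kind, name, perms = item.split(":")
        entries.append((kind, name, set(perms.replace("-", ""))))
    return entries


def is_allowed(entries, owner, owning_group, user, groups, wanted):
    """Simplified POSIX-style check; wanted is one of 'r', 'w', 'x'."""

    def find(kind, name):
        for k, n, p in entries:
            if k == kind and n == name:
                return p
        return None

    mask = find("mask", "")
    if mask is None:
        mask = set("rwx")  # no mask entry: nothing is filtered

    if user == owner:  # owner entry, not filtered by the mask
        return wanted in (find("user", "") or set())

    named_user = find("user", user)
    if named_user is not None:  # named user entry, filtered by the mask
        return wanted in (named_user & mask)

    matched = []  # owning-group and named-group entries that apply to this user
    if owning_group in groups:
        matched.append(find("group", "") or set())
    for group in groups:
        perms = find("group", group)
        if perms is not None:
            matched.append(perms)
    if matched:
        return any(wanted in (perms & mask) for perms in matched)

    return wanted in (find("other", "") or set())


ACL = parse_acl("user::rwx,user:alice:rw-,group::r-x,group:etl:rwx,mask::r-x,other::---")
```

For example, with the file owned by `hdfs:hadoop`, `is_allowed(ACL, "hdfs", "hadoop", "alice", [], "w")` is false even though the entry for `alice` says `rw-`, because the mask `r-x` removes the write bit.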

What to do with Big Data?

Hadoop: The Perfect Data Warehouse

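[Figure: Hadoop as a data warehouse, layered from data sources up to BI tools]

Data sources: Free Text, Images/Videos, Logs, Transactions, Sensors

HDFS Files

Metadata: HCatalog

Query Engines: Hive SQL, Impala SQL, Others …

BI Tools: Tableau, Cognos, QlikView, Pentaho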

What is a Data Warehouse good at?

Among others, a data warehouse is the foundation for a successful business intelligence program

The Data Warehousing Institute

www.tdwi.org

Survey

Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!

Please spare a few minutes to take the survey after the webinar.

Thank You …

Questions/Queries/Feedback

Recording and presentation will be made available to you within 24 hours