Hadoop: A Highly Available and Secure Enterprise Data Warehousing Solution

www.edureka.co/r-for-analytics
www.edureka.co/hadoop-admin

Upload: edureka

Post on 23-Jan-2018

Category:

Technology


Agenda

At the end of this webinar, we will know about:

– What Big Data is
– Why enterprises care about Big Data
– Why your DWH needs Hadoop
– Security in Hadoop
– How Hadoop maintains high availability
– Data warehousing tools in Hadoop

What is Big Data

What Is Wrong with Our Traditional DWH Solutions?

When an RDBMS Makes No Sense

– Storing unstructured data such as images and video
– Processing images and video
– Storing and processing other large files (PDFs, Excel files)
– Processing large blocks of natural-language text (blog posts, job ads, product descriptions)
– Processing semi-structured data (CSV, JSON, XML, log files)
– Sensor data

When an RDBMS Makes No Sense

– Ad-hoc, exploratory analytics
– Integrating data from external sources
– Data cleanup tasks
– Very advanced analytics (machine learning)

Big Problems with Big Data

It is:

– Unstructured
– Unprocessed
– Un-aggregated
– Un-filtered
– Repetitive
– Low quality
– And generally messy.

Oh, and there is a lot of it.

Technical Challenges

– Storage capacity
– Storage throughput
– Pipeline throughput
– Processing power
– Parallel processing
– System integration
– Data analysis
– Scalable storage
– Massively parallel processing
– Ready-to-use tools

Technical Challenges

Too many channels for data

Why Do Enterprises Care About Big Data?

You said an RDBMS does not have a solution for Big Data. Then who does?

Hadoop: The Savior

Hadoop: "I have the solution for the Big Data problem."

How Hadoop Differs from an RDBMS

Hadoop can store all types of data, which gives you the flexibility to analyze all of it.

You can drill down into big data to find even rare insights, which was not possible earlier.

Hadoop Is the New DWH Solution

ETL (the traditional DWH approach):

• Before loading, you must transform the data into a particular format.
• This restricts the types of data that can be stored.

ELT (the Hadoop approach):

• There is no need to transform the data beforehand.
• You can have all kinds of data on board.
• You are free to work with all of the data.

First load the data, then do whatever you want with it. This is possible because of cheap storage and the distributed HDFS.

Hadoop Does ELT, Not ETL

Hadoop is the new data warehouse for all kinds of BI requirements.
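
The load-first, transform-later idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record formats and the names `raw_store`, `load_raw`, and `read_with_schema` are hypothetical, and the in-memory list merely stands in for raw files on HDFS. It is not how Hive or HDFS implement schema-on-read.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for a directory of raw files on HDFS.
raw_store = []


def load_raw(payload, fmt):
    """Load step: keep the record exactly as it arrived, plus a format tag."""
    raw_store.append({"fmt": fmt, "payload": payload})


def read_with_schema(fields):
    """Transform step, run at read time: project every record onto `fields`."""
    rows = []
    for item in raw_store:
        if item["fmt"] == "json":
            record = json.loads(item["payload"])
        elif item["fmt"] == "csv":
            # Assumes the first CSV line is a header row.
            record = next(csv.DictReader(io.StringIO(item["payload"])))
        else:
            continue  # Unknown formats stay stored but are skipped by this query.
        rows.append({name: record.get(name) for name in fields})
    return rows


# Load first, with no upfront transformation ...
load_raw('{"user": "alice", "clicks": 3}', "json")
load_raw("user,clicks\nbob,5", "csv")
load_raw("free-form log line", "text")

# ... then decide on a schema when the question is asked.
rows = read_with_schema(["user", "clicks"])
```

Note that the free-form text record is still stored even though this particular query cannot use it. An ETL pipeline would have had to reject or reshape it before loading.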

Core Features of Hadoop

Hadoop Is Fault Tolerant And Super Consistent

Maintaining High Availability (HA)

In distributed computing, failure is the norm, which means YARN should have an acceptable level of availability.

– NameNode: no horizontal scaling
– NameNode: no high availability

[Diagram: the client gets block locations from the NameNode (namespace and block management) and reads the data from the DataNodes.]

NameNode – Single Point of Failure

Secondary NameNode:

– "Not a hot standby" for the NameNode
– Connects to the NameNode every hour*
– Housekeeping, backup of NameNode metadata
– Saved metadata can be used to rebuild a failed NameNode

[Diagram: the NameNode, a single point of failure, passes its metadata to the Secondary NameNode. Secondary NameNode: "You give me metadata every hour, I will make it secure."]
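
The housekeeping role of the Secondary NameNode can be pictured with a toy model. The sketch below assumes the NameNode metadata consists of a snapshot (the fsimage) plus a log of edits made since that snapshot, which is how HDFS describes it. The dictionary layout and the function names `apply_edit` and `checkpoint` are made up for illustration and do not mirror the real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dictionary (path -> metadata)."""
    op, path = edit["op"], edit["path"]
    if op == "create":
        namespace[path] = {"replication": edit.get("replication", 3)}
    elif op == "delete":
        namespace.pop(path, None)
    return namespace


def checkpoint(fsimage, edit_log):
    """Merge the last snapshot with the pending edits into a new snapshot."""
    merged = dict(fsimage)  # Shallow copy so the old snapshot stays untouched.
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []  # New snapshot, and an empty edit log to start over.


fsimage = {"/data/a.csv": {"replication": 3}}
edit_log = [
    {"op": "create", "path": "/data/b.csv"},
    {"op": "delete", "path": "/data/a.csv"},
]

new_fsimage, edit_log = checkpoint(fsimage, edit_log)
```

Because the merged snapshot is only as fresh as the last checkpoint, a NameNode rebuilt from it loses any edits made afterwards, which is one reason the Secondary NameNode is not a hot standby.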

Hadoop 2.0 Cluster Architecture - HA

HDFS (NameNode High Availability):

– Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing)
– Standby NameNode: reads the shared edit logs and applies them to its own namespace
– Secondary NameNode
– DataNodes

YARN (Next Generation MapReduce):

– Resource Manager
– Node Managers, each shown alongside a DataNode and running Containers and App Masters

[Diagram: a Client, the Active, Standby and Secondary NameNodes connected through shared edit logs, the Resource Manager, and four worker nodes, each with a DataNode, a Node Manager, a Container and an App Master.]

HDFS High Availability: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
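
The interplay of the shared edit log, the single-writer rule and the standby can be sketched as a small simulation. Everything here is a simplification under stated assumptions: the classes `SharedEditLog` and `NameNode` are hypothetical, and the integer `writer_epoch` is only a stand-in for fencing. Real clusters fence the old active NameNode through configured fencing methods rather than a counter like this.

```python
class SharedEditLog:
    """Toy stand-in for the shared storage that holds namespace edits."""

    def __init__(self):
        self.edits = []
        self.writer_epoch = 0  # Hypothetical fencing token, not a real HDFS field.

    def become_writer(self):
        """Called on failover: bump the epoch so the old writer is fenced off."""
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch, edit):
        if epoch != self.writer_epoch:
            raise PermissionError("fenced: stale writer")
        self.edits.append(edit)


class NameNode:
    def __init__(self, log):
        self.log = log
        self.namespace = set()
        self.applied = 0   # Number of shared edits already replayed.
        self.epoch = None  # None means this node is a standby.

    def activate(self):
        self.catch_up()
        self.epoch = self.log.become_writer()

    def create(self, path):
        """Active only: log the edit first, then apply it locally."""
        self.log.append(self.epoch, path)
        self.namespace.add(path)
        self.applied += 1

    def catch_up(self):
        """Standby: read new edits and apply them to its own namespace."""
        for path in self.log.edits[self.applied:]:
            self.namespace.add(path)
        self.applied = len(self.log.edits)


log = SharedEditLog()
nn1, nn2 = NameNode(log), NameNode(log)
nn1.activate()
nn1.create("/warehouse/sales")
nn2.activate()  # Failover: nn2 replays the log and takes over as writer.
try:
    nn1.create("/warehouse/late")  # The old active is now rejected.
    fenced = False
except PermissionError:
    fenced = True
```

The point of the single-writer rule is visible in the last step: once the standby has taken over, a write from the former active fails instead of silently diverging the two namespaces.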

Demo

Achieving HDFS and YARN High Availability

Hadoop is Secure

Security

– Service-level authorization and web proxy capabilities in YARN.
– Access Control Lists (ACLs): the Hadoop Distributed File System (HDFS) implements a permissions model for files and directories that shares much of the POSIX model.
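
A POSIX-style permission check with ACL entries can be sketched as follows. This is a simplified model, not the HDFS implementation: the dictionary layout, the function names, and the users and groups are all hypothetical. The order of checks (owner, named user, groups filtered by the mask, then other) follows the general POSIX ACL approach that HDFS shares.

```python
def allowed(perms, wanted):
    """True if every requested letter (r, w, x) is present in `perms`."""
    return all(ch in perms for ch in wanted)


def masked(perms, mask):
    """Keep only the permissions that the ACL mask also grants."""
    return "".join(ch for ch in perms if ch in mask)


def check_access(inode, user, groups, wanted):
    """Simplified POSIX-style ACL check for one file or directory."""
    if user == inode["owner"]:
        return allowed(inode["owner_perms"], wanted)
    mask = inode.get("mask", "rwx")
    if user in inode["named_users"]:
        return allowed(masked(inode["named_users"][user], mask), wanted)
    # Collect the owning-group entry and any named-group entries that apply.
    group_entries = []
    if inode["group"] in groups:
        group_entries.append(inode["group_perms"])
    group_entries += [p for g, p in inode["named_groups"].items() if g in groups]
    if group_entries:
        return any(allowed(masked(p, mask), wanted) for p in group_entries)
    return allowed(inode["other_perms"], wanted)


# Hypothetical directory: rwxr-x--- plus one named-user ACL entry for "bob".
sales_dir = {
    "owner": "alice", "owner_perms": "rwx",
    "group": "analysts", "group_perms": "r-x",
    "other_perms": "---",
    "named_users": {"bob": "rwx"},
    "named_groups": {},
    "mask": "r-x",
}

alice_can_write = check_access(sales_dir, "alice", ["analysts"], "w")
bob_can_write = check_access(sales_dir, "bob", ["etl"], "w")
bob_can_read = check_access(sales_dir, "bob", ["etl"], "r")
carol_can_read = check_access(sales_dir, "carol", ["guests"], "r")
```

In this example the mask caps the named-user entry for bob, so he can read but not write even though his entry says `rwx`. On a real cluster, ACL entries are managed with `hdfs dfs -setfacl` and inspected with `hdfs dfs -getfacl`.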

Security – Simple Flow

[Diagram: a Client, the Job Tracker, and two nodes, each with HDFS, a Task Tracker and a Task.]

Security Risks

– Insufficient authentication: users and services are not authenticated.
– No privacy and no integrity: insecure network transport; no message-level security.
– Arbitrary code execution: no user verification for MapReduce code execution, so malicious users could submit a job.

Checking Resource Usage and User Permissions

Managing users, permissions, quotas, etc.

Demo

Demo on ACL

Hadoop provides a traditional SQL interface as well as a NoSQL interface for data storage.

Hive ??

Hive Architecture

HBase and Its Architecture?

Hive and HBase Integration

Questions

Survey

Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!

Please spare a few minutes to take the survey after the webinar.
