www.edureka.co/hadoop-admin
Hadoop : A Highly Available and Secure Enterprise Data Warehousing Solution
What will you learn today?
What is Big Data
Hadoop: A synonym for Big Data
Hadoop High Availability
Hadoop as a Data Warehouse
Hands-On: Achieving NameNode and YARN high availability
Hands-On: Securing Hadoop with ACLs
RDBMS – Not the right choice for Big Data
Given the variety and volume of Big Data, an RDBMS is not the right choice for storing it.
What is Hadoop?
Apache Hadoop is an open-source, scalable and reliable framework that stores large data sets and allows their distributed processing across clusters of computers using simple programming models.
A closer look at Apache Hadoop
Apache Hadoop includes the following modules:
Hadoop Distributed File System (HDFS): A distributed file system
Hadoop Common: The common utilities that support the other Hadoop modules
Hadoop YARN: A framework for job scheduling and cluster resource management
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
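To make "simple programming model" concrete, here is a minimal single-process Python sketch of the MapReduce model: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The function names are hypothetical and this is not Hadoop's Java API; on a real cluster the framework distributes the map and reduce tasks across nodes and performs the shuffle itself.

```python
# Hypothetical illustration of the map / shuffle / reduce model; not Hadoop's Java API.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data on clusters"]))
    # {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```

The programmer writes only the mapper and reducer; grouping, distribution and retries are the framework's job, which is what keeps the model simple.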
Maintaining High Availability
In distributed computing, failure is the norm, so YARN must provide an acceptable level of availability.
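Figure: HDFS without high availability. A single NameNode holds the namespace (NS) and handles block management; it cannot scale horizontally and has no high availability. Clients get block locations from the NameNode and read data directly from the DataNodes.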
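The read path in the figure can be modeled in a few lines. This Python sketch uses hypothetical `NameNode`, `DataNode` and `read_file` names and is not the HDFS client API; it only shows that clients fetch block locations (metadata) from the NameNode and pull bytes from DataNodes, so when the single NameNode is down no file can be read, even though every DataNode still holds its blocks.

```python
# Hypothetical model of the HDFS client read path; not the real HDFS client API.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataNode:
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)  # block id -> block contents

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    # Namespace (NS): file path -> ordered list of block ids
    namespace: Dict[str, List[str]] = field(default_factory=dict)
    # Block management: block id -> DataNodes holding a replica of that block
    block_map: Dict[str, List[DataNode]] = field(default_factory=dict)
    alive: bool = True

    def get_block_locations(self, path: str) -> List[Tuple[str, List[DataNode]]]:
        if not self.alive:
            # Only one NameNode exists, so its failure blocks all metadata lookups.
            raise ConnectionError("NameNode unavailable")
        return [(block_id, self.block_map[block_id]) for block_id in self.namespace[path]]


def read_file(namenode: NameNode, path: str) -> bytes:
    """Ask the NameNode where the blocks are, then read each block from a DataNode."""
    chunks = []
    for block_id, replicas in namenode.get_block_locations(path):
        # Simplification: take the first replica (real HDFS prefers the closest one).
        chunks.append(replicas[0].read_block(block_id))
    return b"".join(chunks)


if __name__ == "__main__":
    dn1 = DataNode("dn1", {"blk_1": b"hello "})
    dn2 = DataNode("dn2", {"blk_2": b"world"})
    nn = NameNode({"/logs/app.log": ["blk_1", "blk_2"]},
                  {"blk_1": [dn1], "blk_2": [dn2]})
    print(read_file(nn, "/logs/app.log"))  # b'hello world'
    nn.alive = False
    try:
        read_file(nn, "/logs/app.log")
    except ConnectionError as err:
        print("read failed:", err)  # the DataNodes still hold the data
```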
NameNode: Single Point of Failure
Secondary NameNode:
"Not a hot standby" for the NameNode
Connects to the NameNode every hour (by default; the interval is configurable)
Housekeeping: backs up the NameNode metadata
The saved metadata can be used to rebuild a failed NameNode
Figure: the NameNode, a single point of failure, hands its metadata to the Secondary NameNode, which says: "You give me metadata every hour, I will make it secure."
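The Secondary NameNode's job is a checkpoint: merge the edit log into the last saved namespace image (fsimage) so the log does not grow without bound and a recent copy of the metadata exists on another machine. In real HDFS it downloads the fsimage and edit files from the NameNode, merges them, and sends the new fsimage back. The Python sketch below assumes a toy metadata model (a dict of path to size) and hypothetical names. It also shows why this is "not a hot standby": edits made after the last checkpoint are missing from the saved copy.

```python
# Hypothetical model of a Secondary NameNode checkpoint; not the HDFS implementation.
from typing import Dict, List, Tuple

FsImage = Dict[str, int]        # toy namespace image: file path -> size in bytes
EditOp = Tuple[str, str, int]   # (operation, path, size), one logged namespace change


def apply_edits(image: FsImage, edits: List[EditOp]) -> FsImage:
    """Replay logged edits, oldest first, on a copy of the image."""
    merged = dict(image)
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


class SecondaryNameNode:
    """Periodically merges the NameNode's edit log into a saved checkpoint."""

    def __init__(self) -> None:
        self.checkpoint: FsImage = {}

    def do_checkpoint(self, fsimage: FsImage, edits: List[EditOp]) -> FsImage:
        self.checkpoint = apply_edits(fsimage, edits)
        # The merged image goes back to the NameNode, which can then start a fresh edit log.
        return dict(self.checkpoint)


if __name__ == "__main__":
    snn = SecondaryNameNode()
    fsimage: FsImage = {"/data/a.csv": 128}
    edits: List[EditOp] = [("create", "/data/b.csv", 64), ("delete", "/data/a.csv", 0)]

    fsimage = snn.do_checkpoint(fsimage, edits)   # periodic checkpoint
    edits = [("create", "/data/c.csv", 32)]       # new edits after the checkpoint

    # The NameNode now fails: rebuilding from the checkpoint loses /data/c.csv.
    rebuilt = dict(snn.checkpoint)
    print(fsimage)                   # {'/data/b.csv': 64}
    print("/data/c.csv" in rebuilt)  # False
```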
Hadoop 2.0 Cluster Architecture: High Availability
Figure: Hadoop 2.0 high-availability architecture.
HDFS (NameNode High Availability): an Active NameNode and a Standby NameNode share edit logs. All namespace edits are logged to shared NFS storage with a single writer, enforced by fencing; the Standby NameNode reads the edit logs and applies them to its own namespace. DataNodes store the data blocks and serve clients.
YARN (Next Generation MapReduce): a ResourceManager allocates containers on the NodeManagers, which run the application masters and application containers.
Reference: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
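The HA rule in the figure, one writer to the shared edit log plus a standby that replays it, can be sketched as follows. This is a hypothetical Python model, not the HDFS implementation: fencing is modeled as an epoch number on the shared log, which is closer in spirit to the Quorum Journal Manager than to the NFS setup on the referenced page, where fencing methods such as `sshfence` or a shell script cut off the old active NameNode.

```python
# Hypothetical model of HDFS HA with a shared edit log; not the HDFS implementation.
from typing import List, Optional, Set, Tuple

EditOp = Tuple[str, str]  # (operation, path)


class FencedError(Exception):
    """Raised when a NameNode that is not the current writer tries to log an edit."""


class SharedEditLog:
    """Shared edits storage that accepts appends from a single writer epoch."""

    def __init__(self) -> None:
        self.entries: List[EditOp] = []
        self.writer_epoch = 0

    def new_writer(self) -> int:
        # Fencing: bumping the epoch invalidates the previous writer's token.
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch: Optional[int], op: EditOp) -> None:
        if epoch != self.writer_epoch:
            raise FencedError(f"epoch {epoch} is fenced; current writer epoch is {self.writer_epoch}")
        self.entries.append(op)


class NameNode:
    def __init__(self, name: str, log: SharedEditLog) -> None:
        self.name = name
        self.log = log
        self.namespace: Set[str] = set()
        self.applied = 0                   # log entries already applied locally
        self.epoch: Optional[int] = None   # writer token, set only while active

    def catch_up(self) -> None:
        """Standby duty: read new edits from the shared log and apply them locally."""
        for op, path in self.log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
        self.applied = len(self.log.entries)

    def become_active(self) -> None:
        self.catch_up()                    # apply every logged edit before serving
        self.epoch = self.log.new_writer()

    def _write(self, op: EditOp) -> None:
        self.log.append(self.epoch, op)    # log first; raises if this node is fenced
        self.catch_up()                    # then apply it to the local namespace

    def create(self, path: str) -> None:
        self._write(("create", path))

    def delete(self, path: str) -> None:
        self._write(("delete", path))


if __name__ == "__main__":
    log = SharedEditLog()
    nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
    nn1.become_active()
    nn1.create("/user/a")
    nn2.catch_up()                # the standby tails the shared log
    nn2.become_active()           # failover: nn2 takes over and fences nn1
    try:
        nn1.create("/user/b")     # the old active can no longer write
    except FencedError as err:
        print(err)                # epoch 1 is fenced; current writer epoch is 2
    print(sorted(nn2.namespace))  # ['/user/a']
```

In this model, logging each edit before applying it, and having the new active catch up on the log before it takes a writer epoch, is what keeps the two namespaces consistent across a failover.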
Hadoop: The Perfect Data Warehouse
Figure: Hadoop as a data warehouse, shown as layers
Data sources: free text, images/videos, logs, transactions, sensors
Storage: HDFS files
Metadata: HCatalog
Query engines: Hive (SQL), Impala (SQL), others …
BI tools: Tableau, Cognos, QlikView, Pentaho
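In this picture HCatalog is the shared metadata layer: table definitions (where the files live and what their columns are) are stored once, so Hive, Impala and other engines can all query the same HDFS files. The Python sketch below illustrates that idea under stated assumptions: `FILES`, `METASTORE`, `scan` and `sum_by` are hypothetical in-memory stand-ins, not the HCatalog or Hive API, and the sample rows are made up.

```python
# Hypothetical model of a shared metadata layer over warehouse files; not the HCatalog API.
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableMeta:
    location: str        # directory holding the table's files (HDFS-style path)
    columns: List[str]   # column names, in the order they appear in each record


# In-memory stand-ins for files stored in HDFS and for the shared metastore.
FILES: Dict[str, str] = {
    "/warehouse/sales/part-00000": "2024-01-01,books,120\n2024-01-01,music,80\n",
    "/warehouse/sales/part-00001": "2024-01-02,books,40\n",
}
METASTORE: Dict[str, TableMeta] = {
    "sales": TableMeta("/warehouse/sales", ["day", "category", "amount"]),
}


def scan(table: str) -> List[Dict[str, str]]:
    """Read every file under the table's location, applying the stored schema."""
    meta = METASTORE[table]
    rows: List[Dict[str, str]] = []
    for path, text in sorted(FILES.items()):
        if path.startswith(meta.location + "/"):
            for record in csv.reader(io.StringIO(text)):
                rows.append(dict(zip(meta.columns, record)))
    return rows


def sum_by(table: str, key: str, value: str) -> Dict[str, int]:
    """Toy engine for: SELECT key, SUM(value) FROM table GROUP BY key."""
    totals: Dict[str, int] = {}
    for row in scan(table):
        totals[row[key]] = totals.get(row[key], 0) + int(row[value])
    return totals


if __name__ == "__main__":
    print(sum_by("sales", "category", "amount"))  # {'books': 160, 'music': 80}
```

Because every engine resolves the table through the same metadata entry, adding a second engine means writing a new `sum_by`-style reader, not copying or redefining the data.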
What is a data warehouse good at?
Among other things, a data warehouse is the foundation for a successful business intelligence program.
The Data Warehouse Institute
www.tdwi.org