[OSDC 2013] Hadoop Cluster HA: Lessons from Experience
DESCRIPTION
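Hadoop HA is a popular and important topic. Most current designs center on Namenode HA and extend from there to the Job Tracker and HMaster. When implementing Hadoop Cluster HA, however, considering only the Namenode, Job Tracker, and HMaster is not rigorous enough. In a production environment, a Hadoop Cluster usually also depends on services outside the Hadoop ecosystem, such as PostgreSQL, Kerberos, Puppet, and NTP. These services must be planned and designed together so that the Hadoop Cluster still operates correctly after HA is triggered. This session introduces the Hadoop HA solutions from Apache Hadoop, Cloudera, and Hortonworks, along with their latest developments, and shares the Etu Appliance HA approach first-hand.
Speaker: 韓祖棻, Technical Manager at Etu
TRANSCRIPT
Hadoop Cluster HA: Lessons from Experience
Etu 韓祖棻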
2
Who am I
韓祖棻 (Jerry) – Technical Manager, Etu
• Database Management
• Windows/Linux Application Developer
• Web Developer
• Developer of Etu
3
Agenda
• Background
• Facebook Namenode High Availability
• Hadoop 1.0 Namenode High Availability
• Hortonworks High Availability
• Cloudera High Availability
• Etu Appliance High Availability
• Conclusion
4
Background
5
The Hadoop Ecosystem
[Diagram: the Hadoop ecosystem stack]
Layers: Data Store, Data Processing Layer
Components: HDFS (Hadoop Distributed File System), HBase, MapReduce, Pig, Mahout, HiveQL, Hive Meta Store, Zookeeper, Avro (Serialization)
External systems: RDBMS, ETL Tools, BI Reporting
6
HDFS Architecture (Master/Slave)
An HDFS cluster consists of a single Namenode.
[Diagram labels: Namenode; Metadata (Name, replicas..) (/var/disk/data, 1..; Client; Metadata ops; Block ops; Read; Write; Datanodes; Blocks; replication; Rack1; Rack2]
The Namenode was a single point of failure (SPOF) in an HDFS cluster.
7
Facebook Namenode High Availability
8
AvatarNode
9
Hadoop 1.0 Namenode High Availability
10
Backup Namenode Approach
• Use case 3f:
  – Active running, Standby down for maintenance. Active dies and cannot start. Standby is started and takes over as active.
11
Hortonworks High Availability
12
HDP's Full-Stack HA Architecture
13
HA for HDFS NameNode Using VMware
Do not use the NameNode VM for running any other master daemon.
14
HA for Hadoop Using RHEL (v5.x, v6.x)
15
Cloudera High Availability
16
Shared Storage Using NFS (After CDH 4.0)
[Diagram: NN Active and NN Standby share NN state with a single writer (fencing). Each Namenode has a FailoverController (Active or Standby) that monitors the health of the NN, OS, and HW and exchanges heartbeats with the ZK nodes. The cluster's DNs are also shown. The diagram marks a SPOF.]
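The failover flow in this diagram can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not Cloudera's or Hadoop's implementation: `LockService` stands in for ZooKeeper, and `FailoverController`, `check_health`, and `fence` are hypothetical names introduced only for this example.

```python
class LockService:
    """Hypothetical stand-in for the ZooKeeper lock that marks the active NameNode."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # Only one controller can hold the lock at a time.
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class FailoverController:
    """Hypothetical controller that monitors one NameNode."""

    def __init__(self, name, lock, check_health, fence):
        self.name = name
        self.lock = lock
        self.check_health = check_health  # returns True while the local NN is healthy
        self.fence = fence                # cuts the previous active off from the shared state
        self.state = "standby"

    def tick(self):
        """Run one monitoring cycle and return the resulting state."""
        if not self.check_health():
            # An unhealthy node gives up the lock so its peer can take over.
            self.lock.release(self.name)
            self.state = "standby"
            return self.state
        if self.state == "standby" and self.lock.try_acquire(self.name):
            # Fence before becoming active so the shared state keeps a single writer.
            self.fence()
            self.state = "active"
        return self.state
```

Calling `tick()` on both controllers in a loop gives the Active/Standby behavior: the first healthy controller to get the lock becomes active, and the other takes over only after the lock is released.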
17
Quorum-based Storage (After CDH 4.1)
[Diagram: NN Active and NN Standby each use a QJM (Quorum Journal Manager) to reach a group of Journal Nodes (JN). Each Namenode has a FailoverController (Active or Standby) that monitors the health of the NN, OS, and HW and exchanges heartbeats with the ZK nodes. The cluster's DNs are also shown.]
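The quorum idea behind the Journal Nodes comes down to simple arithmetic: an edit counts as written once a majority of Journal Nodes have acknowledged it. The helper names below are hypothetical and only illustrate that arithmetic for the three JNs shown in the diagram.

```python
def quorum_size(num_journal_nodes):
    """Smallest number of Journal Nodes that forms a majority."""
    return num_journal_nodes // 2 + 1


def edit_is_committed(acks, num_journal_nodes):
    """An edit counts as committed once a majority of Journal Nodes acknowledged it."""
    return acks >= quorum_size(num_journal_nodes)


def tolerated_failures(num_journal_nodes):
    """How many Journal Nodes can be down while a majority is still reachable."""
    return (num_journal_nodes - 1) // 2
```

With three Journal Nodes, two acknowledgements are enough and one node can be lost, which is why this layout avoids a single shared-storage device.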
18
Etu Appliance High Availability
19
Summarize previous solutions
Solution                        Auto Failover   HA Type         External Storage
Facebook
  Avatar Node                   No              Namenode        Yes
Apache Hadoop 1.0
  Backup Namenode               No              Namenode        Yes
Hortonworks
  VMware (*1)                   Yes             Namenode        Yes
  RHEL (*2)                     Yes             System-wide     Yes
Cloudera (Apache Hadoop 2.X)
  Shared Storage                Yes             Namenode (*3)   Optional
  Quorum-based Storage          Yes             Namenode (*3)   Optional

*1. 2 ESX Servers + SAN Arch. (vSphere HA Cluster)
*2. RHEL Cluster HA and Power Fencing Device
*3. Implementing the Fencing Method for System-wide HA.
20
Two Roles
[Diagram: cluster nodes labeled Master node and Worker]
21
Services on Master and Workers
                            Master                Worker
Hadoop Ecosystem Services   Name Node             Data Node
                            Job Tracker           Task Tracker
                            HBase Master          Region Server
                            Zookeeper (Leader)    Zookeeper
                            Hive
System Services             MySQL/PostgreSQL      Syslog
                            Kerberos
                            NTP Server
                            Syslog
22
HA Architecture (Active/Standby)
23
HA based on CDH4.0.1
[Diagram: NN Active and NN Standby share state through a Synchronized File System. Each Namenode has a FailoverController (Active or Standby) that monitors the health of the NN, OS, and HW and exchanges heartbeats with the ZK nodes. The cluster's DNs are also shown.]
24
Data Synchronization
• Hadoop ecosystem
  – Configurations are stored in Zookeeper
  – Hive meta data is stored in PostgreSQL
• PostgreSQL
  – Using PostgreSQL Replication
• User data
• System configurations or data
  – PostgreSQL, Kerberos, NTP server, Syslog
25
Requirements
[Diagram: Active Master, Standby Master, and Workers, with three ZK nodes, one of them the ZK Leader]
- HDFS Service is running on the Active Master
- Zookeeper Cluster is ready
- Standby Master is ready to activate the High Availability service
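The three requirements above could be checked before enabling HA with something like the following sketch. The `ClusterStatus` fields and the function name are hypothetical, and treating "Zookeeper Cluster is ready" as "a majority of ZK nodes are up" is an assumption of this example, not something the slides state.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    """Hypothetical snapshot of the cluster taken before enabling HA."""
    hdfs_running_on_active: bool
    zookeeper_nodes_up: int
    zookeeper_nodes_total: int
    standby_master_ready: bool


def unmet_requirements(status):
    """Return a list of the requirements that are not satisfied yet."""
    unmet = []
    if not status.hdfs_running_on_active:
        unmet.append("HDFS service is not running on the Active Master")
    # Assumption: "ready" means a strict majority of ZK nodes are up.
    if status.zookeeper_nodes_up * 2 <= status.zookeeper_nodes_total:
        unmet.append("Zookeeper cluster is not ready")
    if not status.standby_master_ready:
        unmet.append("Standby Master is not ready to activate the HA service")
    return unmet
```

An empty list means all three preconditions hold and HA activation can proceed.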
26
Failover Scenario
[Diagram: Active Master, Standby Master, and three Workers]
- Active Namenode service failure
- Active Namenode JVM failure
- Active ZKFC service failure
- Etu Active Master OS failure
- Etu Active Master machine power failure
- Failure of NICs on the Etu Active Master machine
- Network failure for the Etu Active Master machine
27
Design Details – Enabling HA
[Diagram: Active Master (Namenode, FC, JT, HMaster, …; Kerberos, NTP, Syslog, …) and Standby Master (Namenode, FC; Kerberos, NTP, …), linked by shared edit logs and DB Replication]
1. Stopping services dependent on HDFS. (JobTracker, HMaster, …)
2. Stopping Namenode and Datanode services.
3. Configuring HDFS and FC service.
4. Creating Synchronized File System.
5. Initializing Synchronized File System for sharing edit logs.
6. Starting Active FC service.
7. Initializing Standby Master.
8. Starting Standby FC service.
9. Synchronizing system configurations and data.
10. Starting Active Namenode and Datanode services.
11. Starting Standby Namenode and Datanode services.
12. Checking Services Status.
13. Starting services dependent on HDFS. (JobTracker, HMaster, …)
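The ordering of the thirteen steps is the important part: services that depend on HDFS go down first and come back last. A minimal sketch of a runner that enforces this order is shown below. The step descriptions come from the slide, but `run_steps`, the `execute` callback, and the stop-at-first-failure behavior are assumptions made for illustration.

```python
# Step descriptions copied from the slide; everything else is hypothetical.
ENABLE_HA_STEPS = [
    "Stopping services dependent on HDFS",
    "Stopping Namenode and Datanode services",
    "Configuring HDFS and FC service",
    "Creating Synchronized File System",
    "Initializing Synchronized File System for sharing edit logs",
    "Starting Active FC service",
    "Initializing Standby Master",
    "Starting Standby FC service",
    "Synchronizing system configurations and data",
    "Starting Active Namenode and Datanode services",
    "Starting Standby Namenode and Datanode services",
    "Checking Services Status",
    "Starting services dependent on HDFS",
]


def run_steps(steps, execute):
    """Run steps in order and stop at the first failure.

    `execute` is a caller-supplied callable (an assumption of this sketch)
    that performs one step and returns True on success.
    Returns the 1-based numbers of the steps that completed.
    """
    completed = []
    for number, description in enumerate(steps, start=1):
        if not execute(number, description):
            break
        completed.append(number)
    return completed
```

Returning the completed step numbers makes it easy to see where an activation attempt stopped, for example `[1, 2, 3]` if creating the Synchronized File System fails.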
28
Design Details - Failover
[Diagram: before failover, Active Master (Namenode, FC, JT, HMaster, …; Kerberos, NTP, Syslog, …) and Standby Master (Namenode, FC; Kerberos, NTP, …) are linked by shared edit logs and DB Replication. After fencing, the former Standby runs as the Active Master with Namenode, JT, HMaster, … and Kerberos, NTP, ….]
1. Fencing Active Master from Standby Master
   a. Stopping network service.
   b. Stopping Hadoop related services.
   c. Stopping system services.
   d. Configuring network environment.
   e. Removing default services.
2. Stopping Standby FC service.
3. Stopping Standby Namenode service.
4. Removing Synchronized File System.
5. Removing DB Replication.
7. Transition Standby Master to Active Master.
   a. Stopping network service.
   b. Stopping system services.
   c. Configuring network environment.
   d. Configuring host information.
   e. Configuring system services.
   f. Starting network service.
   g. Starting system services.
8. Configuring Hadoop related services.
9. Starting Namenode and Datanode services.
10. Starting Hadoop related services.
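The key property of this sequence is that fencing the old Active Master comes before the Standby is promoted. The sketch below illustrates only that ordering. The `Master` class, the `failover` function, and the `fence` callback are hypothetical, and refusing to promote when fencing is not confirmed is an assumption of this example rather than documented Etu behavior.

```python
class Master:
    """Hypothetical record of a master node and its current role."""

    def __init__(self, name, role):
        self.name = name
        self.role = role  # "active", "standby", or "fenced"


def failover(active, standby, fence):
    """Promote the standby only after the old active has been fenced.

    `fence` is a caller-supplied callable that isolates the given master
    and returns True once the isolation is confirmed.
    """
    if standby.role != "standby":
        raise ValueError("failover target must be a standby master")
    if not fence(active):
        # Assumption: without confirmed fencing, promotion is refused
        # so that two masters never act as active at the same time.
        return False
    active.role = "fenced"
    standby.role = "active"
    return True
```

Keeping the fencing result as an explicit gate makes the single-active invariant easy to check in tests.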
29
Use case - Active Namenode maintenance
[Diagram: Active Master, Standby Master, and three Workers]
- Stop NN
- Restart NN
30
Use case - Standby Master failure
[Diagram: Active Master, Standby Master, and three Workers]
- OS failure
- Power failure
- Failure of NICs
- Network failure
31
Use case - Cluster power failure
[Diagram: Active Master, Standby Master, and three Workers]
32
Use case - Cluster network failure
[Diagram: Active Master, Standby Master, and three Workers]
33
Demo – Non-HA (VM002)
Activating HA with One-Click
34
Demo – Activating (VM002 --- VM007)
35
Demo – Activating Done (VM002 – VM007)
36
Demo – Failover (VM002 –> VM007)
37
Demo – Failover Done (VM007)
38
Conclusion
• Leverages a Synchronized File System to share Namenode edit logs and system data between Masters.
• Implements an improved fencing method to handle failover.
• Provides system-wide high availability, not only for the Hadoop Name Node service.
39
Reference
• Hadoop 1.0.4 Documentation
  – http://hadoop.apache.org/docs/stable/index.html
  – https://issues.apache.org/jira/secure/attachment/12480489/NameNode%20HA_v2_1.pdf
• Hadoop 2.0.3-alpha Documentation
  – http://hadoop.apache.org/docs/r2.0.3-alpha/index.html
• Hadoop AvatarNode High Availability
  – http://hadoopblog.blogspot.tw/2010/02/hadoop-namenode-high-availability.html
• Hortonworks Data Platform
  – http://hortonworks.com/products/hortonworksdataplatform/
  – http://www.vmware.com/files/pdf/Apache-Hadoop-VMware-HA-solution.pdf
40
Reference
• CDH4.2.0 Documentation
  – http://www.cloudera.com/content/support/en/documentation/cdh4-documentation/cdh4-documentation-v4-latest.html