Page 1: Protecting your Critical Hadoop Clusters Against Disasters

Protecting Your Critical Hadoop Clusters Against Disasters

DataWorks Summit – Sydney, September 2017

Page 2: Protecting your Critical Hadoop Clusters Against Disasters

Presenters

Jeff Sposetti

Senior Director of Product Management, Hortonworks

Sankar Hariappan

Senior Software Engineer, Hortonworks

Page 3: Protecting your Critical Hadoop Clusters Against Disasters

Agenda

Background

Under the Hood / Deep Dive

Demonstration

Wrap Up

Q & A

Page 4: Protecting your Critical Hadoop Clusters Against Disasters

Background on DR + Backup

Page 5: Protecting your Critical Hadoop Clusters Against Disasters

What Is Disaster Recovery and Backup & Restore?

Disaster Recovery / Replication
– Replication is copying data from the Production Site to the Disaster Recovery Site
– Disaster Recovery includes replication, but also incorporates failover to the Disaster Recovery Site in case of an outage and failback to the original Production Site
– The Disaster Recovery Site can be an on-premise or cloud cluster

Backup & Restore
– While Replication/Disaster Recovery protects against disasters, it can transport logical errors (e.g. accidental deletion or corruption of data) to the DR Site
– To protect against accidental deletion of important HDFS directories or HBase databases, customers need incremental/full backups (generally retained for 30 days) in order to restore to a previous point-in-time version

[Diagram: Replication/Disaster Recovery – offsite replication from the Production Site to the Disaster Recovery Site, with failback in the reverse direction.]

[Diagram: Backup & Restore – a weekly timeline (Sunday through Sunday) with a full backup on Sunday, cumulative incremental backups on the following days, and an accidental deletion later in the week.]
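
The point-in-time restore in the Backup & Restore picture maps naturally onto HDFS snapshots. A minimal sketch, assuming a snapshottable directory /data/critical (the path, snapshot name, and dataset name are hypothetical):

    # Allow snapshots on the directory to protect (one-time, admin operation)
    hdfs dfsadmin -allowSnapshot /data/critical

    # Take a named snapshot, e.g. from a nightly backup job
    hdfs dfs -createSnapshot /data/critical backup-2017-09-20

    # After an accidental deletion, restore from the read-only snapshot
    hdfs dfs -cp /data/critical/.snapshot/backup-2017-09-20/important-dataset /data/critical/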

Page 6: Protecting your Critical Hadoop Clusters Against Disasters

Why Do Enterprise Customers Care?

Disaster Recovery (DR)
– To maintain business continuity, customers want replication, failover & failback capabilities across sites. It is also becoming a compliance requirement
– Early adopter verticals are Financial Services, Insurance, Healthcare, Payment Processing, Telco, etc.

Backup & Restore of business-critical data
– Customers want to back up and restore critical HDFS files, Hive data, and HBase databases

Replication to Cloud
– Customers want to move HDFS files/Hive external tables to S3/WASB/ADLS and spin up a compute cluster
– This enables a hybrid cloud deployment for our Enterprise customers

The Hadoop Data Lake is becoming an integral part of Information Architecture in support of a data-driven organization, and many business-critical applications are hosted on Hadoop infrastructure.

High availability of business-critical data across sites, together with Backup & Restore, is therefore critical.

Page 7: Protecting your Critical Hadoop Clusters Against Disasters

Use Case Flow: Disaster Recovery of Hive/HDFS

[Diagram: two on-premise data centers, Data Center (a) and Data Center (b), sharing Centralized Security and Governance. Data set A is active in (a) and read-only in (b) under Scheduled Policy (A) (2am, 10am, 6pm daily); data set B is active in (b) and read-only in (a) under Scheduled Policy (B) (2am daily). The panels show the states of A, B, and B' as the sequence below unfolds.]

1. Data replication with scheduled policy
2. Disaster takes down Data Center (b)
3. Failover to Data Center (a); data set B made active
4. Active data set B changes to B' in Data Center (a)
5. Data Center (b) is back up
6. Failback to Data Center (b); B' is made passive in Data Center (a) and re-synced to Data Center (b)

Page 8: Protecting your Critical Hadoop Clusters Against Disasters

Use Case Flow: Replication to Cloud Storage

[Diagram: On-premise Data Centers (a) and (b) under Centralized Security and Governance, plus a Public Cloud (C). Data set A is active on-premise with a passive copy in the cloud; data set B' is active on-premise with a passive copy in the cloud. Replication to the cloud is manually triggered.]

7. Trigger replication between Cloud and on-prem cluster (scheduled or manual)

Use Case:
– Move Hive tables/partitions to cloud over time for cloud-native analytics
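
In practice this kind of on-premise-to-cloud copy is often done with DistCp against an object store. A minimal sketch, assuming an S3 bucket my-dr-bucket and a Hive external table path (both hypothetical):

    # Copy a Hive external table's HDFS data to S3; -update transfers only
    # files that changed since the previous run
    hadoop distcp -update \
      hdfs://prod-nn:8020/apps/hive/warehouse/sales.db/orders \
      s3a://my-dr-bucket/warehouse/sales.db/orders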

Page 9: Protecting your Critical Hadoop Clusters Against Disasters

Ideal Solution

Schedule and manage the replication policies

Subsystems supported: HDFS, Hive

Extensible to HBase & Kafka

HDFS Replication
– Based on snapshots
– Restoration to a prior snapshot state if there are errors during replication
– Automatic management of snapshots

Hive Replication
– Supports incremental replication of Hive tables
– A replication policy can be created for each database in the Hive warehouse
– Minimizes HDFS copies and provides a more consistent snapshot of the state of the source warehouse

Orchestration… Built on the Core Capabilities…
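
Snapshot-based HDFS replication typically combines snapshots with DistCp's snapshot-diff mode, so only the delta between two snapshots is transferred. A minimal sketch (paths and the snapshot names s1 and s2 are hypothetical):

    # On the source, take a new snapshot after the last successful replication
    hdfs dfs -createSnapshot /data/critical s2

    # Copy only the changes between snapshots s1 and s2 to the DR cluster;
    # -diff requires -update and an unmodified target since s1 was applied
    hadoop distcp -update -diff s1 s2 \
      hdfs://prod-nn:8020/data/critical \
      hdfs://dr-nn:8020/data/critical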

Page 10: Protecting your Critical Hadoop Clusters Against Disasters

Under the Hood / Deep Dive: Hive Replication

Page 11: Protecting your Critical Hadoop Clusters Against Disasters

Hive Replication: Design Goals

Metadata + Data replication

Point in time consistent replication

Efficient replication – transfer exact changes

Use cases
– Disaster recovery

– Offload data processing to other clusters (perhaps in cloud)

Master – Slave replication (predictable)

Page 12: Protecting your Critical Hadoop Clusters Against Disasters

Event logging

[Diagram: clients run queries against HiveServer2 over JDBC/ODBC; HiveServer2 retrieves and stores metadata via the Hive Metastore, whose backing RDBMS includes an events table that records each metadata change.]
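
In the standard Hive metastore schema this events table is NOTIFICATION_LOG. A minimal sketch of inspecting it, assuming a MySQL-backed metastore database named hive (connection details and exact columns may vary by Hive version):

    # List the most recent metastore events that feed replication
    mysql -u hive -p hive -e "
      SELECT EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME, TBL_NAME
      FROM NOTIFICATION_LOG
      ORDER BY EVENT_ID DESC LIMIT 10;"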

Page 13: Protecting your Critical Hadoop Clusters Against Disasters

Replicated Objects

Database

Table

Partition

Function

View

Constraint

Page 14: Protecting your Critical Hadoop Clusters Against Disasters

Event Based Replication

[Diagram: Event-based replication between a Master Cluster and a Slave Cluster. On the master, REPL DUMP via HiveServer2 reads the new events batch from the events table in the metastore RDBMS and serializes it (metadata plus data file information) into a dump directory on HDFS. On the slave, REPL LOAD via HiveServer2 reads the repl dump directory, writes the objects through the Metastore API, and copies the data files with DistCp.]
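
A minimal sketch of driving this flow from beeline; the HiveServer2 hostnames, the database name salesdb, the event id, and the dump path are all hypothetical:

    # Master: dump events newer than the last replicated event id;
    # the result contains the HDFS dump location and the last event id
    beeline -u jdbc:hive2://master-hs2:10000 -e "REPL DUMP salesdb FROM 1200;"

    # Slave: apply the dump (metadata via the Metastore API, data via DistCp)
    beeline -u jdbc:hive2://slave-hs2:10000 \
      -e "REPL LOAD salesdb FROM '/apps/hive/repl/dump_1234';"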

Page 15: Protecting your Critical Hadoop Clusters Against Disasters

REPL Commands

"repl dump <db name> from <event id>" – get events newer than <event id>.

– Includes data files information.

– "<event id>" is last replicated event id for db from the destination cluster

"repl load <db name> from <hdfs URI>" – apply the events on destination

Page 16: Protecting your Critical Hadoop Clusters Against Disasters

Demonstration

Page 17: Protecting your Critical Hadoop Clusters Against Disasters

Wrap Up

Page 18: Protecting your Critical Hadoop Clusters Against Disasters

Takeaways

A Data Lake is becoming an integral part of Information Architecture in support of a data-driven organization, and many business-critical applications are hosted on Hadoop infrastructure.

The availability of business-critical data across sites is critical.

DR and Backup solutions are powered by the replication capabilities of Hive and HDFS.

Page 19: Protecting your Critical Hadoop Clusters Against Disasters

Learn More

Disaster recovery and cloud migration for your Apache Hive warehouse

Breakout Session – Thursday, September 21 @ 11:00a

https://dataworkssummit.com/sydney-2017/sessions/disaster-recovery-and-cloud-migration-for-your-apache-hive-warehouse/

Apache Hive, Apache HBase and Apache Phoenix

Birds of a Feather – Thursday, September 21 @ 6:00p

https://dataworkssummit.com/sydney-2017/birds-of-a-feather/apache-hive-apache-hbase-apache-phoenix/

Page 20: Protecting your Critical Hadoop Clusters Against Disasters

Thank You. Questions?