Page 1: Protecting your Critical Hadoop Clusters Against Disasters

Protecting Your Critical Hadoop Clusters Against Disasters

DataWorks Summit – Sydney, September 2017

Page 2: Protecting your Critical Hadoop Clusters Against Disasters

Presenters

Jeff Sposetti

Senior Director of Product Management, Hortonworks

Sankar Hariappan

Senior Software Engineer, Hortonworks

Page 3: Protecting your Critical Hadoop Clusters Against Disasters

Agenda

Background

Under the Hood / Deep Dive

Demonstration

Wrap Up

Q & A

Page 4: Protecting your Critical Hadoop Clusters Against Disasters

Background on DR + Backup

Page 5: Protecting your Critical Hadoop Clusters Against Disasters

What Is Disaster Recovery and Backup & Restore?

Disaster Recovery / Replication
– Replication is copying data from the Production Site to the Disaster Recovery Site
– Disaster Recovery includes replication, but also incorporates failover to the Disaster Recovery Site in case of an outage and failback to the original Production Site
– The Disaster Recovery Site can be an on-premise or cloud cluster

Backup & Restore
– While Replication/Disaster Recovery protects against disasters, it can transport logical errors (e.g. accidental deletion or corruption of data) to the DR Site
– To protect against accidental deletion of important HDFS directories or HBase databases, customers need incremental/full backups (generally retained for 30 days) in order to restore to a previous point-in-time version

[Diagram: Replication/Disaster Recovery – offsite replication from the Production Site to the Disaster Recovery Site, with failback in the reverse direction.]

[Diagram: Backup & Restore – a weekly timeline (Sunday through Sunday) with a full backup on Sunday, cumulative incremental backups on the following days, and an accidental deletion later in the week.]
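
The point-in-time restore in the Backup & Restore picture maps naturally onto HDFS snapshots. A minimal sketch, assuming a snapshottable directory /data/critical (the path, snapshot name, and dataset name are hypothetical):

    # Allow snapshots on the directory to protect (one-time, admin operation)
    hdfs dfsadmin -allowSnapshot /data/critical

    # Take a named snapshot, e.g. from a nightly backup job
    hdfs dfs -createSnapshot /data/critical backup-2017-09-20

    # After an accidental deletion, restore from the read-only snapshot
    hdfs dfs -cp /data/critical/.snapshot/backup-2017-09-20/important-dataset /data/critical/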

Page 6: Protecting your Critical Hadoop Clusters Against Disasters

Why Do Enterprise Customers Care?

Disaster Recovery (DR)
– To maintain business continuity, customers want replication, failover & failback capabilities across sites. It is also becoming a compliance requirement
– Early adopter verticals are Financial Services, Insurance, Healthcare, Payment Processing, Telco, etc.

Backup & Restore of business-critical data
– Customers want to back up and restore critical HDFS files, Hive data, and HBase databases

Replication to Cloud
– Customers want to move HDFS files/Hive external tables to S3/WASB/ADLS and spin up a compute cluster
– This enables a hybrid cloud deployment for our Enterprise customers

The Hadoop Data Lake is becoming an integral part of Information Architecture in support of a data-driven organization, and many business-critical applications are hosted on Hadoop infrastructure.

High availability of business-critical data across sites, together with Backup & Restore, is therefore critical.

Page 7: Protecting your Critical Hadoop Clusters Against Disasters

Use Case Flow: Disaster Recovery of Hive/HDFS

[Diagram: two on-premise data centers, Data Center (a) and Data Center (b), sharing Centralized Security and Governance. Data set A is active in (a) and read-only in (b) under Scheduled Policy (A) (2am, 10am, 6pm daily); data set B is active in (b) and read-only in (a) under Scheduled Policy (B) (2am daily). The panels show the states of A, B, and B' as the sequence below unfolds.]

1. Data replication with scheduled policy
2. Disaster takes down Data Center (b)
3. Failover to Data Center (a); data set B made active
4. Active data set B changes to B' in Data Center (a)
5. Data Center (b) is back up
6. Failback to Data Center (b); B' is made passive in Data Center (a) and re-synced to Data Center (b)

Page 8: Protecting your Critical Hadoop Clusters Against Disasters

Use Case Flow: Replication to Cloud Storage

[Diagram: On-premise Data Centers (a) and (b) under Centralized Security and Governance, plus a Public Cloud (C). Data set A is active on-premise with a passive copy in the cloud; data set B' is active on-premise with a passive copy in the cloud. Replication to the cloud is manually triggered.]

7. Trigger replication between Cloud and on-prem cluster (scheduled or manual)

Use Case:
– Move Hive tables/partitions to cloud over time for cloud-native analytics
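
In practice this kind of on-premise-to-cloud copy is often done with DistCp against an object store. A minimal sketch, assuming an S3 bucket my-dr-bucket and a Hive external table path (both hypothetical):

    # Copy a Hive external table's HDFS data to S3; -update transfers only
    # files that changed since the previous run
    hadoop distcp -update \
      hdfs://prod-nn:8020/apps/hive/warehouse/sales.db/orders \
      s3a://my-dr-bucket/warehouse/sales.db/orders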

Page 9: Protecting your Critical Hadoop Clusters Against Disasters

Ideal Solution

Schedule and manage the replication policies

Subsystems supported: HDFS, Hive

Extensible to HBase & Kafka

HDFS Replication
– Based on snapshots
– Restoration to a prior snapshot state if there are errors during replication
– Automatic management of snapshots

Hive Replication
– Supports incremental replication of Hive tables
– A replication policy can be created for each database in the Hive warehouse
– Minimizes HDFS copies and provides a more consistent snapshot of the state of the source warehouse

Orchestration… Built on the Core Capabilities…
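
Snapshot-based HDFS replication typically combines snapshots with DistCp's snapshot-diff mode, so only the delta between two snapshots is transferred. A minimal sketch (paths and the snapshot names s1 and s2 are hypothetical):

    # On the source, take a new snapshot after the last successful replication
    hdfs dfs -createSnapshot /data/critical s2

    # Copy only the changes between snapshots s1 and s2 to the DR cluster;
    # -diff requires -update and an unmodified target since s1 was applied
    hadoop distcp -update -diff s1 s2 \
      hdfs://prod-nn:8020/data/critical \
      hdfs://dr-nn:8020/data/critical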

Page 10: Protecting your Critical Hadoop Clusters Against Disasters

Under the Hood / Deep Dive: Hive Replication

Page 11: Protecting your Critical Hadoop Clusters Against Disasters

Hive Replication: Design Goals

Metadata + Data replication

Point in time consistent replication

Efficient replication – transfer exact changes

Use cases
– Disaster recovery

– Offload data processing to other clusters (perhaps in cloud)

Master – Slave replication (predictable)

Page 12: Protecting your Critical Hadoop Clusters Against Disasters

Event logging

[Diagram: clients run queries against HiveServer2 over JDBC/ODBC; HiveServer2 retrieves and stores metadata via the Hive Metastore, whose backing RDBMS includes an events table that records each metadata change.]
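
In the standard Hive metastore schema this events table is NOTIFICATION_LOG. A minimal sketch of inspecting it, assuming a MySQL-backed metastore database named hive (connection details and exact columns may vary by Hive version):

    # List the most recent metastore events that feed replication
    mysql -u hive -p hive -e "
      SELECT EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME, TBL_NAME
      FROM NOTIFICATION_LOG
      ORDER BY EVENT_ID DESC LIMIT 10;"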

Page 13: Protecting your Critical Hadoop Clusters Against Disasters

Replicated Objects

Database

Table

Partition

Function

View

Constraint

Page 14: Protecting your Critical Hadoop Clusters Against Disasters

Event Based Replication

[Diagram: Event-based replication between a Master Cluster and a Slave Cluster. On the master, REPL DUMP via HiveServer2 reads the new events batch from the events table in the metastore RDBMS and serializes it (metadata plus data file information) into a dump directory on HDFS. On the slave, REPL LOAD via HiveServer2 reads the repl dump directory, writes the objects through the Metastore API, and copies the data files with DistCp.]
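
A minimal sketch of driving this flow from beeline; the HiveServer2 hostnames, the database name salesdb, the event id, and the dump path are all hypothetical:

    # Master: dump events newer than the last replicated event id;
    # the result contains the HDFS dump location and the last event id
    beeline -u jdbc:hive2://master-hs2:10000 -e "REPL DUMP salesdb FROM 1200;"

    # Slave: apply the dump (metadata via the Metastore API, data via DistCp)
    beeline -u jdbc:hive2://slave-hs2:10000 \
      -e "REPL LOAD salesdb FROM '/apps/hive/repl/dump_1234';"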

Page 15: Protecting your Critical Hadoop Clusters Against Disasters

REPL Commands

"repl dump <db name> from <event id>" – get events newer than <event id>.

– Includes data files information.

– "<event id>" is last replicated event id for db from the destination cluster

"repl load <db name> from <hdfs URI>" – apply the events on destination

Page 16: Protecting your Critical Hadoop Clusters Against Disasters

Demonstration

Page 17: Protecting your Critical Hadoop Clusters Against Disasters

Wrap Up

Page 18: Protecting your Critical Hadoop Clusters Against Disasters

Takeaways

A Data Lake is becoming an integral part of Information Architecture in support of a data-driven organization, and many business-critical applications are hosted on Hadoop infrastructure.

The availability of business-critical data across sites is critical.

DR and Backup solutions are powered by the replication capabilities of Hive and HDFS.

Page 19: Protecting your Critical Hadoop Clusters Against Disasters

Learn More

Disaster recovery and cloud migration for your Apache Hive warehouse

Breakout Session – Thursday, September 21 @ 11:00a

https://dataworkssummit.com/sydney-2017/sessions/disaster-recovery-and-cloud-migration-for-your-apache-hive-warehouse/

Apache Hive, Apache HBase and Apache Phoenix

Birds of a Feather – Thursday, September 21 @ 6:00p

https://dataworkssummit.com/sydney-2017/birds-of-a-feather/apache-hive-apache-hbase-apache-phoenix/

Page 20: Protecting your Critical Hadoop Clusters Against Disasters

Thank You. Questions?