Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster

Sanjay Radia, Vinod Kumar Vavilapalli, Hortonworks Inc
© Hortonworks Inc. 2014

Uploaded by hadoop-summit on 15-Jul-2015

TRANSCRIPT

Page 1: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Page 2: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Agenda

• Introduction

•What is Rolling Upgrade?

•Problem – Several key issues to be addressed

–Wire compatibility and side-by-side installs are not sufficient!!

–Must address: data safety, service degradation and disruption

•Enhancements to various components

–Packaging – side-by-side install

–HDFS, Yarn, Hive, Oozie


Page 3: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Hello, my name is Sanjay Radia

•Chief Architect, Founder, Hortonworks

•Part of the Hadoop team at Yahoo! since 2007

–Chief Architect of Hadoop Core at Yahoo!

–Apache Hadoop PMC and Committer

• Prior

–Data center automation, schedulers, virtualization, Java, HA, OSs, file systems

– (Startup, Sun Microsystems, Inria …)

–Ph.D., University of Waterloo


Page 4: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


HDP Upgrade: Two Upgrade Modes

Stop-the-Cluster Upgrade: shut down services and the cluster, then upgrade.
Traditionally this was the only way.

Rolling Upgrade: upgrade the cluster and its services while the cluster is actively running jobs and applications.
Note: upgrade time is proportional to the number of nodes, not data size.

Enterprises run critical services and data on a Hadoop cluster.
They need a live cluster upgrade that maintains SLAs without degradation.

Page 5: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


But you can Revert to Prior State

Rollback

Revert the bits and state of the cluster and its services back to a checkpointed state.

Why? This is an emergency procedure.

Downgrade

Downgrade the service and component to the prior version, but keep any new data and metadata that has been generated.

Why? You are not happy with performance, or app compatibility, …

Page 6: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


But aren’t wire compatibility and side-by-side installs sufficient for rolling upgrades?

Unfortunately, no!! Not if you want to:

• Keep data safe

• Keep running jobs/apps running correctly

• Maintain SLAs

• Allow downgrades/rollbacks in case of problems


Page 7: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Issues that need to be addressed (1)

• Data safety

• HDFS’s upgrade checkpoint does not work for rolling upgrade

• Service degradation – note that every daemon is restarted in rolling fashion

• HDFS write pipeline

• Yarn App masters restart

• Node manager restart

• The Hive server is processing client queries – it cannot restart to the new version without loss

• Clients must not see failures – many components do not have retry logic

But Hadoop deals with failures – it repairs pipelines and restarts tasks – so what is the big deal?!

Service degradation will be high because every daemon is restarted.

Page 8: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Issues that need to be addressed (2)

• Maintaining the job submitter’s context (correctness)

• Yarn tasks get their context from the local node

– In the past the submitters and node’s context were identical

– But with RU, a node’s binaries are being upgraded and hence may be inconsistent with the submitter’s

- Half of the job could execute with the old binaries and the other half with the new ones!!

• Persistent state

• Backward compatibility for upgrade (or convert)

• Forward compatibility for downgrade (or convert)

• Wire compatibility

• With clients (forward and backward)

• Internally (Between Masters and Slaves or Peers)

– Note: the upgrade is in a rolling fashion

Page 9: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Component Enhancements

• Packaging – Side-by-side installs

• HDFS Enhancements

• Yarn Enhancements

• Retaining Job/App Context

• Hive Enhancements

Page 10: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Packaging: Side-by-side Installs (1)

• Need side-by-side installs of multiple versions on the same node

• Some components are at version N, while others are at N+1

• For the same component, some daemons are at version N, others at N+1 on the same node (e.g. NN and DN)

• HDP’s solution: Use OS-distro standard packaging solution

• Rejected a proprietary packaging solution (no lock-in)

• Want to support RU via Ambari and manually

• Standard packaging solutions like RPMs have useful tools and mechanisms

– Tools to install, uninstall, query, etc

– Manage dependencies automatically

– Admins do not need to learn new tools and formats

• Side benefits for “stop-the-world” upgrades:

• Can install the new binaries before the shutdown

Page 11: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Packaging: Side-by-side installs (2)

• Layout: side-by-side

• /usr/hdp/2.2.0.0/hadoop

• /usr/hdp/2.2.0.0/hive

• /usr/hdp/2.3.0.0/hadoop

• /usr/hdp/2.3.0.0/hive

• Define what is current for each component’s daemons and clients

• /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop

• /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop

• /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop

• distro-select helps you manage the version switch

• Our solution: the package name contains the version number:

• E.g. hadoop_2_2_0_0 is the RPM package name itself

– hadoop_2_3_0_0 is a different peer package

• Bin commands point to current:

/usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
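The layout above can be sketched with plain symlinks. A minimal, self-contained sketch built in a throwaway directory (version numbers and component names are the slide’s examples; the real layout under /usr/hdp is managed by the package manager and the distro-select tool, not by hand):

```shell
#!/bin/sh
# Sketch of the side-by-side layout, built in a temp dir.
set -e
ROOT=$(mktemp -d)

# Two stack versions installed side by side.
mkdir -p "$ROOT/hdp/2.2.0.0/hadoop/bin" "$ROOT/hdp/2.3.0.0/hadoop/bin"
mkdir -p "$ROOT/hdp/current"

# Per-daemon "current" pointers: the NN is already on 2.3, while the DN
# and clients are still on 2.2 -- exactly the mixed state a rolling
# upgrade passes through.
ln -s "$ROOT/hdp/2.3.0.0/hadoop" "$ROOT/hdp/current/hdfs-nn"
ln -s "$ROOT/hdp/2.2.0.0/hadoop" "$ROOT/hdp/current/hdfs-dn"
ln -s "$ROOT/hdp/2.2.0.0/hadoop" "$ROOT/hdp/current/hadoop-client"

# Switching one daemon to the new version is one atomic symlink swap.
ln -sfn "$ROOT/hdp/2.3.0.0/hadoop" "$ROOT/hdp/current/hdfs-dn"

readlink "$ROOT/hdp/current/hdfs-dn"
```

Because clients resolve through /usr/hdp/current, no job script or PATH changes are needed when a daemon or client is moved to the new version.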

Page 12: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Packaging: Side-by-side installs (3)

• distro-select tool to select current binary

• Per-component, Per-daemon

• Maintain stack consistency – that is what QE tested

• Each component refers to its siblings of same stack version

• Each component knows the “hadoop home” of the same stack

– Wrapper bin-scripts set this up

• Config updates can be optionally synchronized with binary upgrade

• Configs can sit in their old location

• But what if the new binary version requires slightly different config?

• Each binary version has its own config pointer

– /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
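The per-version config pointer can likewise be sketched with symlinks in a temp directory (paths are illustrative; the real links live under /usr/hdp and /etc):

```shell
#!/bin/sh
# Each binary version owns its conf symlink, so versions can share one
# config dir -- or diverge when the new binaries need a tweaked config.
set -e
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/hadoop/conf" \
         "$ROOT/hdp/2.2.0.0/hadoop" "$ROOT/hdp/2.3.0.0/hadoop"

# Both versions initially point at the existing config location.
ln -s "$ROOT/etc/hadoop/conf" "$ROOT/hdp/2.2.0.0/hadoop/conf"
ln -s "$ROOT/etc/hadoop/conf" "$ROOT/hdp/2.3.0.0/hadoop/conf"

# The new version needs a slightly different config: give it its own dir
# and repoint only its own conf symlink -- 2.2 keeps reading the old
# config untouched.
mkdir -p "$ROOT/etc/hadoop/conf-2.3"
ln -sfn "$ROOT/etc/hadoop/conf-2.3" "$ROOT/hdp/2.3.0.0/hadoop/conf"
```

This is what lets config updates be synchronized with the binary upgrade, or decoupled from it, per version.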

Page 13: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Component Enhancements

• Packaging – Side-by-side installs

• HDFS Enhancements

• Yarn Enhancements

• Retaining Job/App Context

• Hive Enhancements

Page 14: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


HDFS Enhancements (1)

Data safety

• Since 2007, HDFS has supported an upgrade checkpoint

• Backups of HDFS are not practical – too large

• Protects against bugs in the new version of HDFS deleting files

• Standard practice is to use it for ALL upgrades, even patch releases

• But this only works for a “stop-the-world” full upgrade and does not support downgrade

• It would be irresponsible to do a rolling upgrade without such a mechanism

HDP 2.2 has enhanced upgrade-checkpoint (HDFS-5535)

• Markers for rollback

• “Hardlinks” to protect against deletes due to bugs in the new version of HDFS code

• Old scheme had hardlinks but we now delay the deletes

• Added downgrade capability

• Protobuf based fsImage for compatible extensibility
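On the command line, the enhanced checkpoint flow looks roughly like this. The `hdfs dfsadmin -rollingUpgrade` subcommands are the Apache HDFS ones; they are only echoed here (dry run), since actually running them requires a live cluster:

```shell
#!/bin/sh
# Dry-run sketch of the HDFS-5535 rolling-upgrade checkpoint flow.
run() { STEPS="$STEPS$4 "; echo "+ $*"; }  # replace body with "$@" on a real cluster

# 1. Create the rollback markers/checkpoint before touching any bits.
run hdfs dfsadmin -rollingUpgrade prepare

# 2. Poll until the rollback image is ready, then roll NNs and DNs.
run hdfs dfsadmin -rollingUpgrade query

# 3. Only once the new version has proven itself, discard the safety net.
#    Until this point, downgrade (keeping new data) or rollback remain possible.
run hdfs dfsadmin -rollingUpgrade finalize

echo "$STEPS"
```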

Page 15: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


HDFS Enhancements (2)

Minimize service degradation and retain data safety

• Fast datanode restart (HDFS-5498)

• Write pipeline – every DN will be upgraded, hence many write pipelines will break and need repair

• Umbrella JIRA HDFS-5535

– Repair the pipeline to the same DN during RU (avoids replica data copy)

– Retain the same number of replicas in the pipeline

• Upgrade the HA standby, then fail over (NN HA has been available for a long time)

Page 16: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Component Enhancements

• Packaging – Side-by-side installs

• HDFS Enhancements

• Yarn Enhancements

• Retaining Job/App Context

• Hive Enhancements

Page 17: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


YARN Enhancements: Minimize Service Degradation

• YARN RM retains the app/job queues (2013)

• YARN RM HA (2014)

• Note this retains the queues but ALL jobs are restarted

• Yarn RM can restart while retaining jobs (2015)

Page 18: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


YARN Enhancements: Minimize Service Degradation

• A restarted YARN NodeManager retains existing containers (2015)

• Recall that restarting containers would cause serious SLA degradation
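This restart behavior is configuration-driven. A sketch of the yarn-site.xml properties involved (property names as in Apache Hadoop 2.6+; the recovery path and port are illustrative defaults, so verify the exact set against your distro’s documentation):

```xml
<!-- Work-preserving RM restart: running apps survive an RM restart. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>

<!-- NM restart: containers keep running while the NM daemon is replaced. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>
<!-- The NM must listen on a fixed port (not an ephemeral one) so that
     running containers can reconnect to the restarted daemon. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```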

Page 19: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


YARN Enhancement: Compatibility

• Versioning of state-stores of RM and NMs

• Compatible evolution of tokens over time

• Wire compatibility between mixed versions of RM

Page 20: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Component Enhancements

• Packaging – Side-by-side installs

• HDFS Enhancements

• Yarn Enhancements

• Retaining Job/App Context

• Hive Enhancements

Page 21: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Retaining Job/App context

Previously, jobs/apps used libraries from the local node

• This worked because client nodes and compute nodes had the same version

• But during RU, a node has multiple versions of the binaries

• A job must use the same version the client used when submitting it

• Solution:

• Framework libraries are now installed in HDFS

• The client context is sent as a “distro-version” variable in the job config

• Has side benefits

– Frameworks are now installed on a single node and then uploaded to HDFS

• Note that Oozie was also enhanced to maintain a consistent context
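The framework-in-HDFS mechanism can be sketched as a mapred-site.xml fragment. The property name is the standard Apache Hadoop one; the HDFS path and the hdp.version variable follow HDP’s convention and are shown illustratively:

```xml
<!-- The MR framework tarball lives in HDFS under a versioned path. The
     job's config pins the version at submit time, so every task of the
     job unpacks and runs the submitter's version -- even on nodes whose
     local binaries have already been rolled forward. -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
```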

Page 22: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Component Enhancements

• Packaging – Side-by-side installs

• HDFS Enhancements

• Yarn Enhancements

• Retaining Job/App Context

• Hive Enhancements

Page 23: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Hive Enhancements

• Fast restarts + client-side reconnection

• Hive metastore and Hive client

• HiveServer2: a stateful server that submits the client’s queries

• Need to keep it running until the old queries complete

• Solution:

• Allow multiple Hive-servers to run, each registered in Zookeeper

• New client requests go to new servers

• The old server completes old queries but does not receive any new ones

– The old server is removed from Zookeeper

• Side benefits

• HA + Load balancing solution for Hiveserver2
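On the client side, riding through the switch-over means not naming a fixed host:port. The JDBC URL instead names the ZooKeeper ensemble and a namespace, and the driver connects to whichever HiveServer2 instances are currently registered. A sketch (hostnames are placeholders):

```shell
#!/bin/sh
# Build a HiveServer2 connection URL that resolves through ZooKeeper
# service discovery instead of a fixed server address.
ZK_QUORUM="zk1:2181,zk2:2181,zk3:2181"
URL="jdbc:hive2://${ZK_QUORUM}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
echo "$URL"

# On a real cluster a client would run:
#   beeline -u "$URL"
```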

Page 24: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Automated Rolling Upgrade

Via Ambari

Via Your own cluster management scripts

Page 25: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


HDP Rolling Upgrades Runbook

Pre-requisites

• HA

• Configs

Prepare

• Install bits

• DB backups

• HDFS checkpoint

Then: Rolling Upgrade, followed by Finalize.

If needed: Rolling Downgrade, or Rollback (NOT rolling – shut down all services).

Note: Upgrade time is proportional to # nodes, not data size

Page 26: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Both Manual and Automated Rolling Upgrade

• Ambari supports fully automated upgrades

• Verifies prerequisites

• Performs HDFS upgrade-checkpoint, prompts for DB backups

• Performs rolling upgrade

• All the components, in the right order

• Smoke tests at each critical stage

• Opportunities for admin verification at critical stages

• Downgrade if you change your mind

• We have published the runbook for those who do not use Ambari

• You can do it manually or automate your own process

Page 27: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Runbook: Rolling Upgrade

Ambari has an automated process for rolling upgrades. Services are switched over to the new version in rolling fashion. Any components not installed on the cluster are skipped.

Upgrade order: Zookeeper, Ranger, Core Masters (HDFS, YARN, HBase), Core Slaves (HDFS, YARN, HBase), Hive, Oozie, Falcon, Clients (HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout), Kafka, Knox, Storm, Slider, Flume, Hue, then Finalize.

Page 28: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Runbook: Rolling Downgrade

Rolling downgrade walks the components in order – Zookeeper, Ranger, Core Masters, Core Slaves, Hive, Oozie, Falcon, Clients, Kafka, Knox, Storm, Slider, Flume, Hue – ending with the Downgrade and Finalize steps.

Page 29: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Summary

• Enterprises run critical services and data on a Hadoop cluster.

• Need a live cluster upgrade that maintains SLAs without degradation

• We enhanced Hadoop components for enterprise-grade rolling upgrade

• Non-proprietary packaging using OS-standard solutions (RPMs, Debs, …)

• Data safety

– HDFS checkpoints and write-pipelines

• Maintain SLAs – solve a number of service degradation problems

– HDFS write pipelines, Yarn RM, NM state recovery, Hive, …

• Jobs/apps continue to run correctly with the right context

• Allow downgrade/rollbacks in case of problems

• All enhancements truly open source and pushed back to Apache?

• Yes of course – that is how Hortonworks does business …

Page 30: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Backup slides

Page 31: Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster


Why didn’t you use alternatives?

• Alternatives generally keep one version active, not two

• We need to move some services as a pack (clients)

• We need to support managing configs and binaries together and separately

• Maybe we could have done it, but it was getting complex…