Apache Hadoop 3.0: What's new in YARN and MapReduce

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 3.0: What’s new in YARN & MapReduce Tokyo, Oct.26 2016 Junping Du [email protected]

Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Apache Hadoop 3.0 What's new in YARN and MapReduce

Apache Hadoop 3.0: What’s new in YARN & MapReduce
Tokyo, Oct. 26, 2016
Junping Du [email protected]

Page 2: Apache Hadoop 3.0 What's new in YARN and MapReduce

About Speakers

⬢ Junping Du
– Apache Hadoop Committer & PMC member
– Lead Software Engineer @ Hortonworks YARN Core Team
– 10+ years developing enterprise software (5+ years as a “Hadooper”)

Page 3: Apache Hadoop 3.0 What's new in YARN and MapReduce

Agenda

⬢ Evolutions in YARN & MR (done and in progress)

⬢ Timeline estimation for the Apache Hadoop 3.0 release

Page 4: Apache Hadoop 3.0 What's new in YARN and MapReduce

First, A bit of Vision…

⬢ The evolution of Hadoop starts with YARN

⬢ YARN's evolution will continue to drive Hadoop forward

Hadoop 3

Page 5: Apache Hadoop 3.0 What's new in YARN and MapReduce

Several important trends in the age of Hadoop 3.0+

⬢ YARN and other platform services: Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance
⬢ IoT Assembly: Kafka, Storm, HBase, Solr
⬢ MR, Tez, Spark, …
⬢ Innovating frameworks: Flink, DL (TensorFlow), etc.
⬢ Various environments: On Premise, Private Cloud, Public Cloud

Page 6: Apache Hadoop 3.0 What's new in YARN and MapReduce

Evolutions in YARN & MR

⬢ Re-architecture of the YARN Timeline Service - ATS v2

⬢ Native Service Support in YARN

⬢ YARN Scheduling Enhancements

⬢ More Cloud Friendly

⬢ Better User Experiences

⬢ Other Enhancements

Page 7: Apache Hadoop 3.0 What's new in YARN and MapReduce

Timeline Service Revolution – ATS v2

⬢ Why ATS v2?
– Scalability & Performance
v1 limitations to get rid of:
• Single global instance of writer/reader
• Local-disk-based LevelDB storage
– Usability
• Handle flows as first-class concepts and model aggregation
• Add configuration and metrics as first-class members
• Better support for queries
– Reliability
v1 limitations:
• Data is stored on a local disk
• Single point of failure (SPOF) for the timeline server
– Flexibility
• Data model is more descriptive
• Extended to carry more app-specific info

Page 8: Apache Hadoop 3.0 What's new in YARN and MapReduce

Core Design for ATS v2

⬢ Distributed write path
– Logical per-app collector + physical per-node writer
– Collector/writer launched as an auxiliary service in the NM
– Standalone writers will be added later
⬢ Pluggable backend storage
– Built in with a scalable and reliable implementation (HBase)
⬢ Enhanced data model
– Entity (bi-directional relation) with flow, queue, etc.
– Configuration, Metric, Event, etc.
⬢ Separate reader instances
⬢ Aggregation & Accumulation
– Aggregation: rolling up the metric values to the parent
• Online aggregation for apps and flow runs
• Offline aggregation for users, flows and queues
– Accumulation: rolling up the metric values across a time interval
• Accumulated resource consumption for app, flow, etc.
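The difference between aggregation and accumulation can be illustrated with a minimal Python sketch. This is not ATS v2 code: the function names, the metric names (`memory_mb`, `vcores`) and the choice of a simple left Riemann sum for accumulation are assumptions made only for illustration.

```python
from collections import defaultdict

def aggregate(child_metrics):
    """Aggregation: roll child metric values up to the parent by summing
    each metric across children (e.g. containers -> application)."""
    totals = defaultdict(int)
    for metrics in child_metrics:
        for name, value in metrics.items():
            totals[name] += value
    return dict(totals)

def accumulate(samples):
    """Accumulation: roll one metric up across a time interval.
    samples is a list of (timestamp_seconds, value) pairs; each value is
    weighted by the time until the next sample (left Riemann sum)."""
    total = 0
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)
    return total

# Hypothetical container metrics for one application.
containers = [
    {"memory_mb": 1024, "vcores": 1},
    {"memory_mb": 2048, "vcores": 2},
]
app_metrics = aggregate(containers)

# Hypothetical memory usage of the application over 30 seconds.
memory_series = [(0, 1024), (10, 3072), (30, 3072)]
mb_seconds = accumulate(memory_series)
```

Here `app_metrics` is the application-level view at one point in time, while `mb_seconds` approximates the resource consumption accumulated over the interval.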

Page 9: Apache Hadoop 3.0 What's new in YARN and MapReduce

ATS v2 Architecture

[Figure: ATS v2 architecture diagram. Recoverable labels: ResourceManager with RMApp, info of collectors {app_1, app_2, …}, RM app events and a TimelineWriter; NodeManagers NM_1 … NM_n, each with an NM Collector Service (aux service), per-app collectors (app_1 Collector … app_n Collector), a TimelineWriter and a Container Monitor; app_1 AM (AM timeline info) and app_1 container (container metric info; container info to be added); Sync links; HBase; Timeline Reader; user queries.]

Page 10: Apache Hadoop 3.0 What's new in YARN and MapReduce

Data Model in ATS v2

⬢ Entity: ID + Type, Configurations, Metadata (Info), Parent-Child Relationships, Metrics, Events
⬢ Metric: ID, Metadata, Single Value or Time Series (with timestamps)
⬢ Event: ID, Metadata, Timestamp

⬢ First-class citizen entities
– Cluster: Type, Cluster Attributes
– Flow: Type, User, Flow Runs, Flow Attributes
– Flow Run: Type, User, Running apps, Flow Run Attributes
– Application: Type, User, Flow + Run, Queue, Attempts
– Attempt: Type, Application, Queue, Containers
– Container: Type, Attempt, Attributes

⬢ Aggregation
– User: Username (ID), Aggregated metrics
– Queue: Queue (ID), Sub queues, Aggregated metrics
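The entity, metric and event shapes listed on this slide could be modeled roughly as follows. This is a sketch in Python, not the Hadoop classes: the class and field names and the type strings `"APPLICATION"` and `"ATTEMPT"` are hypothetical, chosen to mirror the slide labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Metric:
    id: str
    # timestamp -> value; a single entry represents a single-value metric
    values: Dict[int, float] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def is_time_series(self) -> bool:
        return len(self.values) > 1

@dataclass
class Event:
    id: str
    timestamp: int
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class Entity:
    id: str
    type: str
    configs: Dict[str, str] = field(default_factory=dict)
    info: Dict[str, str] = field(default_factory=dict)
    # Relations are stored as (type, id) pairs on both sides.
    parents: Set[Tuple[str, str]] = field(default_factory=set)
    children: Set[Tuple[str, str]] = field(default_factory=set)
    metrics: List[Metric] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def add_child(self, child: "Entity") -> None:
        """Record the parent-child relationship in both directions."""
        self.children.add((child.type, child.id))
        child.parents.add((self.type, self.id))

# Hypothetical application with one attempt and a memory time series.
app = Entity(id="application_1", type="APPLICATION")
attempt = Entity(id="attempt_1", type="ATTEMPT")
app.add_child(attempt)
app.metrics.append(Metric(id="memory_mb", values={0: 1024.0, 10: 2048.0}))
app.events.append(Event(id="APP_STARTED", timestamp=0))
```

Storing the relation on both entities is one way to read the "bi-directional relation" mentioned in the core design: either side can be queried without scanning the other.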

Page 11: Apache Hadoop 3.0 What's new in YARN and MapReduce

Status for ATS v2

⬢ For other details, such as:
– Aggregations (app/flow/user/queue level, offline or online)
– HBase table schema for EntityTable, ApplicationTable, FlowRunTable, etc.
– Reader APIs (RESTful)
please refer to the previous talk at Hadoop Summit 2016 San Jose: https://www.youtube.com/watch?v=adV-DFa-8us&index=6&list=PLKnYDs_-dq16K1NH83Bke2dGGUO3YKZ5b

⬢ Status
– Phase I (YARN-2928): already released as an alpha feature in 3.0.0-alpha1
– Phase II (YARN-5355): in progress

Page 12: Apache Hadoop 3.0 What's new in YARN and MapReduce

Native Service Support in YARN

⬢ A native YARN framework: YARN-4692
– Abstract a common framework (similar to Slider) to support long-running services
– Simpler API
⬢ Better support for long-running services
– Recognition of long-running services
• Affects the policy for preemption, container reservation, etc.
– Auto-restart of containers
• Containers in long-running services are more stateful
– Service/application upgrade support
• More services are expected to run long enough to span versions
– Dynamic container configuration
• Only reserve resources when they are needed

Page 13: Apache Hadoop 3.0 What's new in YARN and MapReduce

API Simplification - REST

⬢ Existing APIs are too low level and not easy to work with.
⬢ Simple REST API layer fronting YARN
– YARN-4793: Simplified API layer for services and beyond
⬢ Create and manage the lifecycle of YARN services.

Example: ZooKeeper App
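The example shown on the slide did not survive in this transcript. As a rough stand-in, the sketch below builds the kind of declarative service definition a simplified REST layer might accept. The field names (`components`, `number_of_containers`, `launch_command`, `resource`) are assumptions for illustration and are not taken from YARN-4793; no request is sent.

```python
import json

# Hypothetical service definition for a three-node ZooKeeper app.
# Field names are illustrative assumptions, not the YARN-4793 schema.
zookeeper_service = {
    "name": "zkapp1",
    "components": [
        {
            "name": "zookeeper",
            "number_of_containers": 3,
            "launch_command": "bin/zkServer.sh start-foreground",
            "resource": {"cpus": 1, "memory_mb": 1024},
        }
    ],
}

def validate(service):
    """Return a list of problems found in the hypothetical definition."""
    problems = []
    if not service.get("name"):
        problems.append("service needs a name")
    for comp in service.get("components", []):
        if comp.get("number_of_containers", 0) < 1:
            problems.append(
                f"component {comp.get('name')} needs at least one container"
            )
    return problems

# The JSON body a client would submit to such a REST layer.
payload = json.dumps(zookeeper_service, indent=2)
```

The point of the sketch is the level of abstraction: the user describes what should run and how many instances, rather than driving the low-level client protocols directly.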

Page 14: Apache Hadoop 3.0 What's new in YARN and MapReduce

Discovery services in YARN

⬢ YARN Service Discovery via DNS: YARN-4757
– Expose existing service information in the YARN registry via DNS
• Current YARN service registry records will be converted into DNS entries
– Enable container-to-IP mappings: discover the IPs of containers via standard DNS lookups
• Application: zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
• Container: container-1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
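The naming pattern in the two examples above can be sketched as a small mapping from registry-style records to DNS names. The function names and the in-memory `records` table are hypothetical, and the underscore-to-hyphen conversion for container IDs is an assumption inferred from the hyphenated name on the slide.

```python
def application_dns_name(app: str, user: str, domain: str) -> str:
    """Build an application name of the form <app>.<user>.<domain>."""
    return f"{app}.{user}.{domain}"

def container_dns_name(container_id: str, domain: str) -> str:
    """Build a container name; underscores are replaced with hyphens
    (an assumption) so the ID forms a valid hostname label."""
    return f"{container_id.replace('_', '-')}.{domain}"

DOMAIN = "yarncluster.com"

# Hypothetical in-memory stand-in for DNS entries derived from the registry.
records = {
    application_dns_name("zkapp1", "user1", DOMAIN): "192.168.10.11:8080",
    container_dns_name("container_1454001598828_0001_01_00004", DOMAIN): "192.168.10.18",
}

def lookup(name: str):
    """Resolve a name against the in-memory table; None if unknown."""
    return records.get(name)
```

For instance, `lookup("zkapp1.user1.yarncluster.com")` resolves against the table the same way a client would resolve the name with a standard DNS query.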

Page 15: Apache Hadoop 3.0 What's new in YARN and MapReduce

More Cloud Friendly

⬢ Elastic
– Dynamic resource configuration
• YARN-291
• Allows tuning an NM's resources down/up at runtime
– Graceful decommissioning of NodeManagers
• YARN-914
• Drains a node that's being decommissioned to allow running containers to finish
⬢ Efficient
– Support for container resizing
• YARN-1197
• Allows applications to change the size of an existing container
– Task-level native optimization
• MAPREDUCE-2841
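The draining behavior behind graceful decommissioning can be pictured as a small state machine. The sketch below is a simplified model, not NodeManager code: the `Node` class and the state names are assumptions, and a real implementation would also need a timeout after which remaining containers are stopped.

```python
class Node:
    """Toy model of a node that drains before being decommissioned."""

    def __init__(self, name: str):
        self.name = name
        self.state = "RUNNING"
        self.containers = set()

    def allocate(self, container_id: str) -> bool:
        """Accept a new container only while the node is RUNNING."""
        if self.state != "RUNNING":
            return False
        self.containers.add(container_id)
        return True

    def start_decommission(self) -> None:
        """Stop accepting work; already running containers may finish."""
        self.state = "DECOMMISSIONING"
        self._maybe_finish()

    def complete(self, container_id: str) -> None:
        """Mark a container as finished and re-check the drain condition."""
        self.containers.discard(container_id)
        self._maybe_finish()

    def _maybe_finish(self) -> None:
        if self.state == "DECOMMISSIONING" and not self.containers:
            self.state = "DECOMMISSIONED"

node = Node("nm1")
node.allocate("c1")
node.start_decommission()          # draining: c1 keeps running
rejected = not node.allocate("c2") # new work is refused while draining
node.complete("c1")                # last container done, node can leave
```

In an elastic cloud deployment this is what lets a cluster shrink without killing in-flight work.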

Page 16: Apache Hadoop 3.0 What's new in YARN and MapReduce

More Cloud Friendly (Contd.)

⬢ Isolation
– Embrace container technology to achieve better isolation
– Resource isolation support for disk and network
• YARN-2619 (disk), YARN-2140 (network)
• Containers get a fair share of disk and network resources using cgroups
– Docker support in LinuxContainerExecutor
• YARN-3611
• Support for launching Docker containers alongside processes
• Packaging and resource isolation
⬢ Operation
– Container upgrades (YARN-4726)
• “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– Work-preserving AM restart
• MAPREDUCE-6608

Page 17: Apache Hadoop 3.0 What's new in YARN and MapReduce

Scheduling Enhancements

⬢ Application priorities: YARN-1963
– Priority support within a queue
⬢ Affinity / anti-affinity: YARN-1042
– More constraints on container placement
⬢ Global scheduling: YARN-5139
– Get rid of the per-node scheduling model
– Improve container scheduling throughput
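Priority ordering within a single queue can be sketched with a heap. This is an illustration only, not the scheduler's implementation: the `AppQueue` class is hypothetical, and the sketch assumes that a larger integer means higher priority and that ties are broken by submission order.

```python
import heapq
from itertools import count

class AppQueue:
    """Toy queue that serves higher-priority applications first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # submission counter used as a tie-breaker

    def submit(self, app_id: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), app_id))

    def next_app(self):
        """Return the next application to schedule, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = AppQueue()
queue.submit("etl", priority=1)
queue.submit("adhoc", priority=0)
queue.submit("report", priority=5)
queue.submit("etl2", priority=1)
order = [queue.next_app() for _ in range(4)]
```

With these submissions, `report` is served first, the two priority-1 applications follow in submission order, and `adhoc` comes last.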

Page 18: Apache Hadoop 3.0 What's new in YARN and MapReduce

Operational and User Experience Enhancements (YARN-3368)

Page 19: Apache Hadoop 3.0 What's new in YARN and MapReduce

Other YARN work that could be released in Hadoop 3.x

⬢ Resource profiles
– YARN-3926
– Users can specify a resource profile name instead of individual resources
– Resource types are read from a config file
⬢ YARN federation
– YARN-2915
– Allows YARN to scale out to tens of thousands of nodes
– A cluster of clusters that appears as a single cluster to the end user
⬢ Gang scheduling
– YARN-624

More details in tomorrow's noon session “Apache Hadoop YARN: Past, Present and Future” by Junping Du and Jian He

Page 20: Apache Hadoop 3.0 What's new in YARN and MapReduce

Release Timeline for Apache Hadoop 3.0

⬢ 3.0.0-alpha1 was released on Sep. 3, 2016

⬢ alpha2 in Q4 2016 (estimated)

⬢ beta1 in early Q1 2017 (estimated)

⬢ GA in Q1/Q2 2017 (estimated)

Page 21: Apache Hadoop 3.0 What's new in YARN and MapReduce

HDP Evolution with Apache Hadoop and YARN

[Figure: HDP evolution timeline with stages labeled 1.x, 2.x and Beyond]

Page 22: Apache Hadoop 3.0 What's new in YARN and MapReduce

Thank you!