weave: yarn made easyfiles.meetup.com/1767817/la_hug_meetup_8-29-13_weave-jg.pdf · jonathan gray...

22
WEAVE: YARN MADE EASY Jonathan Gray Founder/CEO @ Continuuity HBase Committer Los Angeles Hadoop User Group August 29, 2013

Upload: others

Post on 29-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

WEAVE: YARN MADE EASY

Jonathan GrayFounder/CEO @ Continuuity

HBase Committer

Los Angeles Hadoop User GroupAugust 29, 2013

Page 2: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

Continuuity Proprietary and Confidential

AGENDA• About Me• About Continuuity (quickly)• BigFlow: Our first YARN application• What is YARN? • Motivation for Weave• Weave Architecture and Examples• Weave Roadmap

2

Page 3: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

Jonathan Gray• Founded Streamy.com, a Social News Reader, in 2006 out of LA

• Built Streamy v1 on PostgreSQL

• Introduced to Hadoop/HBase in 2007 while interviewing someone

• Migrated platform to Hadoop/HBase in 2008

• Became a Hadoop contributor and an HBase Committer along the way

‣ Started out doing application development on Hadoop

‣ Ended up doing Hadoop development to enable apps

Page 4: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

Jonathan Gray• As Streamy faded away in early 2009, began Hadoop consulting

• Recruited by Facebook in late 2009 to help their Hadoop efforts

• They decided in January 2010 to use HBase for FB Messages

‣ Joined Facebook to be the HBase expert (and help w/ OSS)

• Launched several massive scale applications on Hadoop/HBase

‣ Messages, Puma, and ODS are the big ones

• Left FB in November 2011 and immediately founded Continuuity

‣ And was finally forced to move to the Bay Area

Page 5: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

About Continuuity• Who we are ?

‣ 25 person startup based in Palo Alto. Founded by Big Data pioneers and backed by top-tier investors.

‣ We understand the pain of building Big Data Applications - we have lived through it.

• What do we do ?

‣ Mission : Democratize the development of the next generation of Big Data applications.

‣ Building the industry’s first Hadoop-based Application Server (YARN + HBase)

• Products

‣ Reactor Development Kit (public beta available now)

- Downloadable developer edition. Includes SDK and Local Reactor

‣ Sandbox Reactor (public beta available now)

- Free, Self-Service, one node hosted cloud environment

‣ Hosted Reactor

‣ Enterprise Reactor

Page 6: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

BIG FLOWWhat is BigFlow?• A core Processor; a real-time stream processing engine

(Similar to Storm and S4)

• BigFlow provides

‣ Tight integration with your existing Hadoop/HBase cluster

‣ Exactly-once execution vs. At-least-once execution (your apps do not have to be idempotent)

‣ Supports transactions for consistency andHBase-backed queues for durability

‣ Direct integration with a data storage engine (Continuuity Data Fabric / HDFS + HBase)

‣ Beautiful user interface for application management

‣ Utilizes YARN for deployment and supports runtime elastic scalability of Flows

Page 7: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

What is YARN?• The Resource Manager of Hadoop 2.0

• Separates resource management from programming paradigm

• Enables virtually any type of distributed application to run on a Hadoop cluster

• Multi-tenant: Supports security and resource isolation

• Scales to many thousands of nodes

• Pluggable scheduling algorithms

Page 8: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

YARN IS NOT SIMPLE!

Page 9: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

YARN PAIN POINTS• YARN is not for novice Big Data programmers

• High ramp-up time

• Lots of boiler plate code required to build simple applications

• 80% of applications are generally simple

• YARN is built around applications that terminate

‣ Logs are available only on completion of application

• No standard support for

• Application lifecycle management

• Communication between Container(s) & Application Master

• Handling Application level errors

• Hard to debug

Page 10: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

TYPICAL YARN APP• Many distributed applications are simple.

• For example: Distributed load test for a queuing system:

• “Run n producers and m consumers for k seconds”

• Producers and consumers do not need to know about YARN

• This would be easy if we ran them as threads in a single JVM

• Goal: Can we make YARN as simple as Java Threads?

Page 11: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

TYPICAL NEEDS ...... of a distributed application:

• Monitor and restart application containers

• Collect logs and metrics

• Elastic scalability (dynamic adding/removing of instances)

• Long-running jobs

Page 12: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

WEAVE!

Page 13: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

MOTIVATION• Not designed to be a replacement for YARN

‣ Designed to simplify building, debugging & running of YARN Applications‣ Running in YARN should be as simple as running Threads in Java

-A simpler programming model for YARN• Hide low level details of YARN

‣ Simplified API for specifying, running & managing an application‣ Simplified way to specify and manage stage(s) of the application lifecycle‣ Generic Application Master to better support simple applications‣ Simpler archive management‣ Better control over application logs and errors

Page 14: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

SIMPLICITY140. public class ApplicationMaster {141. .150. private AMRMClientAsync resourceManager; .435. public boolean run() throws YarnRemoteException {436. LOG.info(“Starting ApplicationMaster”); . 498. } .500. public void finish() {504. for(Thread launchThread: launchThreads) { . ...511. }517. FinalApplicationStatus appStatus;518. String appMessage = null;538. } .790. public ContainerRequest setupContainerAskForRM(int numContainers) {.800. Resource capability = Records.newRecord(Resource.class);801. capability.setMemory(containerMemory);.803. ContainerRequest request = new ContainerRequest(capability, ..);807. }808. } . .1102. public class Client extends YarnClientImpl { .1336. public boolean run() throws IOException { .1354. GetNewApplicationResponse newApp = super.getNewApplication();1355. ApplicationId appId = newApp.getApplicationId(); .1583. return monitorApplication(appId);1585. } .1594. private boolean monitorApplication(ApplicationId appId) throws ... { .1596. while(true) { .1606. ApplicationReport report = super.getApplicationReport(appId); .1647. }1648. } .1657. private void forceKillApplication(ApplicationId appId) throws ... { .1664. super.killApplication(appId); .1665. } . 1667. }

Distributed Shell - Vanilla YARN Distributed Shell - WEAVE

001. public class DistributedShell extends AbstractWeaveRunnable {002. static Logger LOG = LoggerFactory.getLogger(DistributedShell.class); 003. private String command;004. 005. public DistributedShell(String command) {006. this.command = command;007. }008.009. @Override010. public WeaveRunnableSpecification configure() {011. return WeaveRunnableSpecification.builder().with()012. .setName("Executor " + command)013. .withArguments(ImmutableMap.of("cmd", this.command))014. .build();015. }016.017. @Override018. public void initialize(WeaveContext context) {019. this.command = context.getSpecification().getArguments("cmd");020. }021.022. @Override023. public void run() {024. try {025. Process process = Runtime.getRuntime().exec(this.command);026. InputStream stdin = proc.getInputStream();027. BufferedReader br = new BufferedReader(new InputStreamReader(stdin));028.029. String line;030. while( (line = br.readline()) != null) {031. LOG.info(line);032. }033. } catch (Throwable t) {034. LOG.error(t.getMessage());035. }036. }037. }038.039. public static main(String[] args) {040. String zkStr = args[0];041. String command = args[1];042. yarnConfig = ...043. WeaveRunnerService weaveService = new YarnWeaveRunnerService(yarnConfig, zkStr);044. weaveService.startAndWait();045. WeaveController controller = weaveService.prepare(new DistributedShell(command))046. .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))047. .start();048. }

~ 500 lines ~ 50 lines

Page 15: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

Client Host

JVM

ARCHITECTUREZooKeeper

ZooKeeperZooKeeper

Cluster Host

YARN Node Manager

YARN container

WeaveContainer

WeaveRunnable

Cluster Host

YARN Node Manager

YARN container

WeaveContainer

WeaveRunnable

Cluster Host

YARN Node Manager

YARN container

WeaveContainer

WeaveRunnable

Runtime states and messages

YARN container

WeaveApplicationMaster

WeaveRunnerService

Application

WeaveRunnable

WeaveRunnable

WeaveRunnable

Submit

KafkaServer

Log entries

Page 16: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

RUNNABLE// Defines a Task

public class EchoServer implements WeaveRunnable

{

private static Logger LOG = ...

private WeaveContext context;

//....

@Override

public void run() {

ServerSocket server = new ServerSocket(0);

context.announce("echo", server.getLocalPort());

while (isRunning()) {

Socket socket = server.accept();

//...

}

}

}

‣ Class implements WeaveRunnable is a single task.

‣ SLF4J logger can be used for logging‣ logs are collected and sent back to client using

Kafka‣ Kafka server is started inside Application Master.

‣ WeaveRunnable can be run in‣ Thread‣ YARN Container

‣ Service discovery integration

‣ Easy! - Piece of cake.

Page 17: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

RUNNER// Starts a task

WeaveRunner runner = ...;

WeaveController controller =

runner.prepare(new EchoServer())

.addLogHandler(new PrinterLogHandler(

new PrintWriter(System.out)

)

).start();

//...

controller.sendCommand(

Commands.create(“port”, 44444));

//...

controller.stop();

‣ WeaveRunner ‣ Can run Runnable or Application‣ Packages the class and it’s dependencies along

with additional resources specified‣ Attaches log collector for collecting all the logs

from containers

WeaveController is used to manage the lifecycle of a runnable or application

Controller also provides ability to send commands to running containers

Page 18: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

APPLICATION// Defines an Application

public class WebServer implements WeaveApplication {

@Override

public WeaveSpecification configure() {

return WeaveSpecification.Builder.with()

.setName(“Jetty WebServer”)

.withRunnables()

.add(new JettyWebServer())

.withLocalFiles()

.add(“www-html”, new File(“/var/html”))

.apply()

.add(new LogsCollector())

.anyOrder()

.build();

}

}

WeaveController controller =

runner.prepare(new WebServer()).start();

‣ WeaveApplication defines a collection of runnables and their behavior

‣ Application is specified by a WeaveSpecification

‣ Application can specify any additional local directories to be made available for container to run.

‣ Weave starts containers in no order.

‣ Simply starts the application with WeaveRunner.

Page 19: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

APPLICATION// Defines Application with specific tasks ordering

public class DataApplication implements WeaveApplication {

@Override

public WeaveSpecification configure() {

return WeaveSpecification.Builder.with()

.setName(“Cool Data Application”)

.withRunnables()

.add(“reader”, new InputReader())

.add(“processor”, new Processor())

.add(“writer”, new OutputWriter())

.order()

.first(“reader”)

.next(“processor”)

.next(“writer”)

.build();

}

}

Page 20: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

ROADMAP• First release at the end of April • Features for upcoming releases (in no order)

‣ API support for managing lifecycle hooks of an application and it’s containers‣ Extendable Application Master‣ Support for non-JVM languages‣ Support for command dispatching‣ Ability to programmatically control resource allocation‣ Advanced error handling capabilities

- E.g. Fail application if 50% of my containers fail

- E.g. Fail application if it takes more than ‘x’ seconds to allocate containers for the application

‣ Contribs- Running HBase using Weave

- Running Jetty HTTP Server using Weave

‣ Support for custom application-level metrics‣ Better UI‣ Ability to remote debug a running container‣ LXC & QoS‣ Improve APIs, Improve APIs, Improve APIs!

‣ . . .

Page 21: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

How to start?• Checkout at githubhttp://github.com/continuuity/weave

• Read the docs at http://weave.continuuity.com

Page 22: WEAVE: YARN MADE EASYfiles.meetup.com/1767817/LA_HUG_Meetup_8-29-13_Weave-JG.pdf · Jonathan Gray • As Streamy faded away in early 2009, began Hadoop consulting • Recruited by

Thank You!

Questions?

http://github.com/continuuity/weave

(psst, we are hiring! http://continuuity.com/careers)