introduction to zeppelin - events.static.linuxfound.org · what gets measured, gets managed - peter...

32
Introduction to Zeppelin Lee moon soo, NFLabs [email protected]

Upload: trannhu

Post on 02-Apr-2018

224 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Introduction to Zeppelin

Lee moon soo, NFLabs [email protected]

Page 2: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

What is Zeppelin?

Page 3: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Let’s see demo

Page 4: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

How Zeppelin started

Page 5: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

What gets measured, gets managed - Peter Drucker

A price of light is less than the cost of darkness - Arthur C. Nielsen , Founder of ACNielsen

War is ninety percent information - Napoleon Bonaparate

In God we trust, all others must bring data - W. Edwards Deming

Data = Understanding

Page 6: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

For data analysis, we need tool

Page 7: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

But couldn’t find one i like

ImpalaDrill

Hive tajoPig

Cloudera-ML

MLLib

MRQL

Page 8: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Decided to make one

Really good one

Page 9: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Good analytics environment

Analytical language Many Libraries Interactive Visualization Sharing

Page 10: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

The First attempt 2012~2013

Page 11: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

It’s got graphic REPL, deployment, search, import tool

But failed, because

Page 12: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

It wasn’t widely used It wasn’t opensource

Page 13: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Second attempt 2013~2014

Opensourced graphic REPL from commercial product

The first version of Zeppelin

Page 14: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Second attempt 2013~2014

Not widely used

It was slow, difficult to use, …

Page 15: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Third attempt 2014~

After few weeks of study, decided to rewrite Zeppelin

Graphic REPl -> Notebook with Apache Spark integration

Page 16: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Third attempt 2014~

Next week, beautifulized

Page 17: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Why do you like Zeppelin?

Page 18: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Web Based

web framework

d3.js

Visualization

Language

Package management

bower

Build

Page 19: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Notebook

Code

Result

Code

Result

결과가 텝과 뉴라인으로 구분된 테이블 형식이면 자동으로 테이블로 포메팅

Notebook

Page 20: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Notebook

Data Visualize

Page 21: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Pivot

Pivot

Page 22: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Dynamic Form Creation

Dynamic Form

Page 23: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

REST

Web socket

Synchronized

Sharing

Page 24: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Zeppelin simplifies data analysis

Page 25: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Why do your project likes Zeppelin?

Page 26: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

….

Interpreters Spark PySpark SparkSQL Hive Mysql (JDBC) Markdown Shell

Easy to extend

Page 27: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Zeppelin Interpreter Architecture

Classloader

InterpreterGroup

Interpreter Interpreter

Server

Client

HTTP Rest / Websocket

Classloader

InterpreterGroup

Spark SparkSQL Dep

Page 28: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Classloader

InterpreterGroup

Interpreter Interpreter

Server

Client

HTTP Rest / Websocket

InterpreterGroup

Interpreter Interpreter …

Seprate JVM process

Thrift

Zeppelin Interpreter Architecture

Page 29: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

public abstract void open();

public abstract void close();

public abstract InterpreterResult interpret(String st, InterpreterContext context);

public abstract void cancel(InterpreterContext context);

public abstract int getProgress(InterpreterContext context);

public abstract List<String> completion(String buf, int cursor);

public abstract FormType getFormType();

public Scheduler getScheduler();

Implementing new Interpreter

Must have

Good to have

More controls

Page 30: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

Roadmap• Integration with more distributed processing framework • Flink, Ignite, Tajo, etc..

• Output message streaming • Ability to create rich GUI

Page 31: Introduction to Zeppelin - events.static.linuxfound.org · What gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen

ImpalaDrill

Hive tajoPig

Cloudera-ML

MLLib

MRQL