Apache Zeppelin and Spark for Enterprise Data Science
TRANSCRIPT
![Page 1: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/1.jpg)
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enabling Apache Zeppelin* and Spark* for Data Science in the Enterprise
Bikas Saha (@bikassaha)
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
![Page 2: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/2.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 3: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/3.jpg)
Apache Zeppelin
![Page 4: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/4.jpg)
Zeppelin makes Big Data Science Easy to Approach
Zero install – just connect via a web browser and you are ready to run
Support for multiple execution platforms (Apache Spark, JDBC, Hive…)
Support for multiple languages (Scala, SQL, Python…)
Support for built-in visualizations
Support for reporting
Support for sharing and collaborative work
Does NOT have machine learning built in – that’s where Apache Spark comes in (or your favorite SQL engine: Apache Flink, Drill, Hive… and 30+ others)
![Page 5: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/5.jpg)
Zeppelin for Sharing
![Page 6: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/6.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 7: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/7.jpg)
Current Apache Zeppelin and Spark integration
[Diagram: User, Zeppelin Server, Spark Driver, and eight Spark Executors]
![Page 8: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/8.jpg)
Architectural Issue with Secure Data Access
[Diagram: User 1, Zeppelin Server, Spark Driver, eight Spark Executors, and HDFS, with the label "Zeppelin Server User"]
![Page 9: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/9.jpg)
Architectural Issues with Multi-Tenancy – Fault Tolerance
[Diagram: User 1 and User 2, Zeppelin Server, one Spark Driver, and eight Spark Executors]
User 1 failure affects User 2
Heavy-weight Spark drivers
![Page 10: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/10.jpg)
Architectural Issues with Multi-Tenancy – Privacy
[Diagram: User 1 and User 2, Zeppelin Server, one Spark Driver, and Spark Executors]
User 1 can access User 2 data
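The privacy concern on this slide can be illustrated with a toy model. The sketch below is not Zeppelin or Spark code: `SharedDriver` is a hypothetical stand-in for a single driver process whose interpreter state is shared by every notebook user, which is an assumption about what the diagram depicts.

```python
class SharedDriver:
    """Hypothetical stand-in for one driver process shared by all users."""

    def __init__(self):
        # One namespace for everybody: this is the root of the problem.
        self.namespace = {}

    def run(self, user, code):
        # `user` is not used for isolation: all code runs in the same namespace.
        exec(code, self.namespace)


driver = SharedDriver()
driver.run("user2", "salaries = [100, 200, 300]")

# User 1 never defined `salaries`, yet can read it.
driver.run("user1", "leaked = sum(salaries)")
print(driver.namespace["leaked"])  # prints 600
```

Because both users' code lands in the same namespace, nothing stops one user from reading variables the other created.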
![Page 11: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/11.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Enterprise Ready Big Data Science
Future Roadmap
![Page 12: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/12.jpg)
Livy Server as a Session Management Service
[Diagram: Livy Server with an Interactive REST API and a Batch REST API; Session; Remote Spark Driver with Remote Context; Standard Spark Batch Job; Spark Executors]
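The two REST APIs named on this slide can be sketched as plain HTTP requests. The example below only builds the requests and does not send them. The host name, the session id `0`, the jar path, and the class name are hypothetical; the endpoints (`/sessions`, `/sessions/{id}/statements`, `/batches`) and body fields follow Livy's documented REST API as I understand it.

```python
import json
from urllib.request import Request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def livy_post(path, payload):
    """Build (but do not send) a JSON POST request to the Livy server."""
    return Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Interactive API: create a session, then submit code snippets to it.
create_session = livy_post("/sessions", {"kind": "pyspark"})
run_statement = livy_post(
    "/sessions/0/statements",  # assumes the new session was given id 0
    {"code": "sc.parallelize(range(10)).sum()"},
)

# Batch API: submit a standard Spark application.
submit_batch = livy_post(
    "/batches",
    {"file": "hdfs:///apps/example-job.jar", "className": "com.example.Job"},
)

for req in (create_session, run_statement, submit_batch):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running Livy server.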
![Page 13: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/13.jpg)
Secure Data Access - Solved
[Diagram: User, Zeppelin Server with Livy Interpreter, Livy Server with Session, Remote Spark Driver with Remote Context, Spark Executors, and HDFS, with the label "User"]
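The slide does not say how the session comes to run as the end user. One plausible mechanism, offered here as an assumption, is Livy's `proxyUser` field on session creation, which asks Livy to start the session as the named user. The helper name `session_request_body` and the user `alice` are hypothetical.

```python
import json


def session_request_body(notebook_user, kind="spark"):
    """Body for POST /sessions asking Livy to impersonate the notebook user."""
    return json.dumps({"kind": kind, "proxyUser": notebook_user}, sort_keys=True)


print(session_request_body("alice"))
# prints {"kind": "spark", "proxyUser": "alice"}
```

With impersonation in place, HDFS permission checks would apply to the notebook user rather than to the Zeppelin server's service account. The cluster typically also has to permit that impersonation through Hadoop's proxy-user settings.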
![Page 14: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/14.jpg)
Multi-Tenancy - Solved
[Diagram: User 1 and User 2, Zeppelin Server with two Livy Interpreters, Livy Server with Session 1 and Session 2, and for each session a Remote Spark Driver with Remote Context and a Spark Executor]
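The session-per-user layout can be modeled in a few lines. This is a toy sketch, not Livy code: `Session` and `SessionManager` are hypothetical classes that stand in for a Livy session with its own remote driver and for the server that hands sessions out.

```python
class Session:
    """Hypothetical stand-in for one session with its own remote driver."""

    def __init__(self, owner):
        self.owner = owner
        self.alive = True
        self.namespace = {}  # private to this session

    def run(self, code):
        if not self.alive:
            raise RuntimeError(f"session for {self.owner} is dead")
        try:
            exec(code, self.namespace)
        except Exception:
            # A crash takes down this session's driver only.
            self.alive = False
            raise


class SessionManager:
    """Hands each user a private session, creating it on first use."""

    def __init__(self):
        self.sessions = {}

    def session_for(self, user):
        if user not in self.sessions:
            self.sessions[user] = Session(user)
        return self.sessions[user]


manager = SessionManager()
manager.session_for("user2").run("total = 1 + 1")

try:
    manager.session_for("user1").run("1 / 0")  # User 1 crashes their own driver
except ZeroDivisionError:
    pass

print(manager.session_for("user1").alive)  # prints False
print(manager.session_for("user2").alive)  # prints True
print("total" in manager.session_for("user1").namespace)  # prints False
```

User 1's failure leaves User 2's session running, and User 2's variables are not visible from User 1's namespace, which addresses both the fault-tolerance and the privacy issues raised earlier in the deck.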
![Page 15: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/15.jpg)
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
![Page 16: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/16.jpg)
Near Term Improvements
Session Management
Debuggability
Unified session for all languages
Better visualizations for Machine Learning
Support for Spark 2.0
![Page 17: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/17.jpg)
Long Term Improvements
Controlled sharing of sessions for collaboration
Data exploration and browsing with metadata
Taking the model from training to production
![Page 18: Apache Zeppelin and Spark for Enterprise Data Science](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f212e81a28abf46f8b4593/html5/thumbnails/18.jpg)
Thank You