storm-on-yarn: convergence of low-latency and big-data
DESCRIPTION
Hadoop plays a central role at Yahoo! in providing personalized experiences for our users and creating value for our advertisers. In this talk, we discuss the convergence of low-latency processing and the Hadoop platform. To enable this convergence, we have developed Storm-on-YARN, which allows Storm streaming/microbatch applications and Hadoop batch applications to be hosted in a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. In Storm-on-YARN, YARN launches the Storm application master (Nimbus) and enables Nimbus to request resources for Storm workers (Supervisors). The YARN resource manager and the Storm scheduler work together to support multi-tenancy and high availability. HDFS enables Storm to achieve higher availability of Nimbus itself. We are introducing Hadoop-style security into Storm through JAAS authentication (Kerberos and Digest). Storm servers (Nimbus and DRPC) will be configured with authorization plugins for access control and audit. The security context enables Storm applications to access only authorized datasets (including those created by Hadoop applications). Yahoo! is making its contributions to Storm and YARN available as open source. We will work with industry partners to foster the convergence of low-latency processing and big-data.
TRANSCRIPT
![Page 1: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/1.jpg)
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Andrew Feng
![Page 2: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/2.jpg)
Self Introduction
• Current
  – Distinguished Architect, Yahoo! Hadoop Team
  – Core contributor at Storm project
• Past
  – Online advertisement
  – Personalization
  – Serving containers
  – Cloud services
  – NoSQL database
  – Application server
![Page 3: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/3.jpg)
Agenda
• Business motivation
• Technical overview
• Open source
![Page 4: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/4.jpg)
Yahoo!: Personalized Web
![Page 5: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/5.jpg)
Personalization w/ Hadoop
Understand user & content/ads
Select relevant content & ads
![Page 6: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/6.jpg)
Personalization w/ Low-Latency
Latest content per current interests
![Page 7: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/7.jpg)
Big Data + Low Latency: Design Pattern
• Personalization
• Ad targeting
• Reporting
• Ad budgeting
• Fraud detection
• Trending topics
![Page 8: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/8.jpg)
Agenda
• Business motivation
• Technical overview
• Open source
![Page 9: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/9.jpg)
Hadoop YARN: MapReduce & Beyond
• Yahoo! deployed YARN into 30k+ nodes in production.
• YARN Apps … MapReduce, Storm, etc.
![Page 10: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/10.jpg)
Storm: Distributed Stream Processing
https://github.com/nathanmarz/storm
Streams
• User activities
• Ad beacons
• Content feeds
• Social feeds
• …
![Page 11: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/11.jpg)
Storm Clusters on Hadoop Grid
![Page 12: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/12.jpg)
Storm-YARN: Launch Cluster
• storm-yarn launch <conf>
  – Initial # of supervisors
  – Memory size of allocated container
• Result: <appID> of the newly launched Storm master
![Page 13: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/13.jpg)
Storm-YARN: Manage Cluster
1. addSupervisors <appID> <count>
2. getStormConfig <appID>
3. setStormConfig <appID>
4. startNimbus <appID>
5. stopNimbus <appID>
6. startUI <appID>
7. stopUI <appID>
8. startSupervisors <appID>
9. stopSupervisors <appID>
![Page 14: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/14.jpg)
Storm-YARN: Deploy Apps
storm jar <appJar>
![Page 15: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/15.jpg)
Authentication/Authorization/Audit
• Authentication plugins
  – Digest
  – Kerberos (soon)
  – None
  – Bring your own
• Authorization plugins
  – Accept all
  – Limited operations only
  – User whitelist
  – Bring your own
• Audit
  – Access log
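The three built-in authorization policies and the access log listed above can be illustrated with a minimal sketch. This is not Storm's plugin API: the `Authorizer` signature, the function names, and the `"access"` logger name are all hypothetical, and the operation names are example values only.

```python
import logging
from typing import Callable, FrozenSet

# Hypothetical plugin shape (NOT Storm's real interface):
# an authorizer maps (user, operation) to an allow/deny decision.
Authorizer = Callable[[str, str], bool]


def accept_all() -> Authorizer:
    """'Accept all': every request is permitted."""
    return lambda user, operation: True


def limited_operations(allowed_ops: FrozenSet[str]) -> Authorizer:
    """'Limited operations only': permit only the listed operations."""
    return lambda user, operation: operation in allowed_ops


def user_whitelist(users: FrozenSet[str]) -> Authorizer:
    """'User whitelist': permit only the listed users."""
    return lambda user, operation: user in users


# Audit: each decision is written to an access log.
# Attach a logging handler to actually persist these records.
audit_log = logging.getLogger("access")


def check(authorizer: Authorizer, user: str, operation: str) -> bool:
    """Apply the authorizer and record the decision in the access log."""
    allowed = authorizer(user, operation)
    audit_log.info("user=%s op=%s allowed=%s", user, operation, allowed)
    return allowed
```

A "bring your own" plugin, in this sketch, is simply any other callable with the same `(user, operation)` shape.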
![Page 16: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/16.jpg)
Agenda
• Business motivation
• Technical overview
• Open source
![Page 17: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/17.jpg)
Storm-YARN: Open Source
• Code released for early access
  – under the Apache 2.0 License
  – move to apache.org later
• Welcome contribution!
  – Submit proposals
  – Sign Apache style CLA
  – Submit git pull requests
https://github.com/yahoo/storm-yarn
![Page 18: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/18.jpg)
Storm-YARN: mvn test
1. storm-yarn launch
  – ./conf/storm.yaml --stormZip lib/storm.zip --appname storm-on-yarn-test --output target/appId.txt
2. storm-yarn getStormConfig
  – ./conf/storm.yaml --appId application_1372121842369_0001 --output ./lib/storm/storm.yaml
3. storm jar
  – lib/storm-starter-0.0.1-SNAPSHOT.jar
  – storm.starter.WordCountTopology
  – word-count-topology
4. storm kill
  – word-count-topology
5. storm-yarn shutdown
  – ./conf/storm.yaml --appId application_1372121842369_0001
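The five steps on this slide can be assembled programmatically. The sketch below only builds the argument lists, using the commands, flags, and paths exactly as they appear on the slide; it does not run them. The helper names are hypothetical, and `read_app_id` assumes that the file named by `--output` contains the application ID as plain text.

```python
import shlex
from pathlib import Path
from typing import List

CONF = "./conf/storm.yaml"  # client config path, as shown on the slide


def lifecycle_commands(app_id: str) -> List[List[str]]:
    """Return the five slide commands as argv lists, in slide order."""
    return [
        ["storm-yarn", "launch", CONF, "--stormZip", "lib/storm.zip",
         "--appname", "storm-on-yarn-test", "--output", "target/appId.txt"],
        ["storm-yarn", "getStormConfig", CONF, "--appId", app_id,
         "--output", "./lib/storm/storm.yaml"],
        ["storm", "jar", "lib/storm-starter-0.0.1-SNAPSHOT.jar",
         "storm.starter.WordCountTopology", "word-count-topology"],
        ["storm", "kill", "word-count-topology"],
        ["storm-yarn", "shutdown", CONF, "--appId", app_id],
    ]


def read_app_id(path: str) -> str:
    """Assumption: the launch --output file holds the app ID as plain text."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    # Print each command as a shell-quoted line instead of executing it.
    for argv in lifecycle_commands("application_1372121842369_0001"):
        print(shlex.join(argv))
```

In a real run, the application ID passed to steps 2 and 5 would come from the launch output of step 1 rather than being hard-coded.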
![Page 19: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/19.jpg)
Storm-YARN: Deployment
Install Storm S/W
1. hadoop fs -put storm.zip /lib/storm/<version>/storm.zip
Apply Storm-YARN
2. storm-yarn launch <appID>
3. storm-yarn getStormConfig <appID> <storm.yaml>
4. storm jar <appJar>
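Step 1 places the Storm distribution at a versioned HDFS path. A small sketch of how that path and the upload command could be derived from a version string follows; the helper names and the example version value are hypothetical, and the command is only constructed, not executed.

```python
import shlex
from typing import List


def storm_zip_hdfs_path(version: str) -> str:
    """Build the HDFS destination used in step 1: /lib/storm/<version>/storm.zip."""
    if not version or "/" in version:
        raise ValueError("version must be a non-empty string without '/'")
    return f"/lib/storm/{version}/storm.zip"


def put_command(local_zip: str, version: str) -> List[str]:
    """Return the 'hadoop fs -put' argv that uploads the Storm zip."""
    return ["hadoop", "fs", "-put", local_zip, storm_zip_hdfs_path(version)]


if __name__ == "__main__":
    # "0.9.0" is an example value only; substitute the Storm version in use.
    print(shlex.join(put_command("storm.zip", "0.9.0")))
```

Keeping the version in the path lets several Storm releases coexist on HDFS, so different Storm-on-YARN clusters can reference different versions.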
![Page 20: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/20.jpg)
Conclusion
• YARN empowers the emergence of big-data & low-latency processing
• Yahoo! open source:
  – Storm-yarn @ github/yahoo
  – Spark-yarn @ spark-project.org
![Page 21: Storm-on-YARN: Convergence of Low-Latency and Big-Data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54b9c3c84a7959c82c8b45d9/html5/thumbnails/21.jpg)
Questions?