Spark Hsinchu meetup
![Page 1: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/1.jpg)
Spark Summit 2016 @ San Francisco
Stana He
![Page 2: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/2.jpg)
• Gitter: https://gitter.im/hubertfc/SparkHsinchu
• Gitter app: https://gitter.im/apps
• Meetup: https://www.meetup.com/Apache-Spark-Hsinchu/
![Page 3: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/3.jpg)
Who am I?
• Stana He
• Is-land
![Page 4: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/4.jpg)
Agenda
• Something about Apache Spark
• Enterprise use case
![Page 5: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/5.jpg)
What is Apache Spark?
• Open source cluster computing framework.
• Developed at UC Berkeley's AMPLab.
• Donated to the Apache Software Foundation.
![Page 6: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/6.jpg)
Benefits of Apache Spark
• Speed
- Up to 100x faster than Hadoop MapReduce for large-scale data processing when data fits in memory.
• Ease of Use
- Easy-to-use APIs.
• Unified Engine
- Packaged with higher-level libraries, including streaming data, SQL queries, machine learning and graph processing.
![Page 7: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/7.jpg)
What’s New in 2.0?
• Structured API improvements
- SQL, DataFrames, Datasets
• Structured Streaming
• MLlib model export
• MLlib R bindings
• SQL 2003 support
• Scala 2.12 support
![Page 8: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/8.jpg)
What’s New in 2.0?
• Whole-stage code generation
- Fuse across multiple operators
• Optimized input / output
- Apache Parquet + built-in cache
reference: http://www.slideshare.net/databricks/spark-summit-san-francisco-2016-matei-zaharia-keynote-apache-spark-20
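To make "fuse across multiple operators" concrete, here is a minimal pure-Python sketch of the idea. It is not Spark's generated code (Spark emits Java for a whole query stage); the operator functions, the row layout and the `level`/`score` fields are hypothetical and only illustrate the contrast between pulling rows through a chain of operators and running one fused loop.

```python
from typing import Callable, Iterable, Iterator


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Leaf operator: yields rows one at a time."""
    for row in rows:
        yield row


def filter_op(child: Iterator[dict], pred: Callable[[dict], bool]) -> Iterator[dict]:
    """Passes through only the rows that satisfy the predicate."""
    for row in child:
        if pred(row):
            yield row


def project_op(child: Iterator[dict], fn: Callable[[dict], float]) -> Iterator[float]:
    """Maps each row to a single value."""
    for row in child:
        yield fn(row)


def sum_op(child: Iterator[float]) -> float:
    """Aggregates the values produced by the child operator."""
    total = 0.0
    for value in child:
        total += value
    return total


def operator_at_a_time(rows: Iterable[dict]) -> float:
    # Each operator pulls rows from its child through an iterator,
    # so every row crosses several function boundaries.
    filtered = filter_op(scan(rows), lambda r: r["level"] >= 10)
    return sum_op(project_op(filtered, lambda r: r["score"]))


def fused(rows: Iterable[dict]) -> float:
    # The same query collapsed into one loop: no intermediate iterators,
    # which is the effect whole-stage code generation aims for.
    total = 0.0
    for row in rows:
        if row["level"] >= 10:
            total += row["score"]
    return total


# Hypothetical sample rows for illustration.
ROWS = [
    {"level": 5, "score": 1.0},
    {"level": 12, "score": 2.5},
    {"level": 30, "score": 4.0},
]
```

Both `operator_at_a_time(ROWS)` and `fused(ROWS)` compute the same sum; the fused version simply avoids the per-row hand-offs between operators.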
![Page 9: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/9.jpg)
Enterprise use case
![Page 10: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/10.jpg)
![Page 11: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/11.jpg)
Winning the game with Spark!
![Page 12: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/12.jpg)
Unfortunately, it doesn’t!
![Page 13: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/13.jpg)
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 14: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/14.jpg)
Players and Data
• 67+ million monthly active players
• 500+ billion data points per day
• 26 petabytes of data collected since beta
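A quick back-of-the-envelope calculation puts these figures in perspective. It is only a rough sketch: it uses the lower bounds quoted on the slide, assumes data points arrive evenly over the day, and divides a daily volume by a monthly player count, so the per-player number is an order-of-magnitude ratio rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Lower bounds taken from the slide ("500+ billion", "67+ million").
DATA_POINTS_PER_DAY = 500e9
MONTHLY_ACTIVE_PLAYERS = 67e6


def data_points_per_second(per_day: float = DATA_POINTS_PER_DAY) -> float:
    """Average ingest rate, assuming a uniform arrival rate over the day."""
    return per_day / SECONDS_PER_DAY


def data_points_per_player(per_day: float = DATA_POINTS_PER_DAY,
                           players: float = MONTHLY_ACTIVE_PLAYERS) -> float:
    """Rough daily data points per monthly active player."""
    return per_day / players


print(f"{data_points_per_second():,.0f} data points per second")
print(f"{data_points_per_player():,.0f} data points per player per day")
```

That works out to roughly 5.8 million data points per second on average, or on the order of 7,500 per monthly active player per day.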
![Page 15: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/15.jpg)
What does Spark do?
• Spark SQL: Data exploration and reporting
• Spark Streaming: Network performance
• Spark MLlib: Recommendation system
![Page 16: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/16.jpg)
Spark SQL - Data exploration and reporting
![Page 17: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/17.jpg)
Performance
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 18: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/18.jpg)
Spark Streaming - Network performance
![Page 19: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/19.jpg)
Build network
Riot Direct
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 20: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/20.jpg)
Normal Network Model
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 21: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/21.jpg)
Detect model
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 22: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/22.jpg)
Another detect model
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 23: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/23.jpg)
Pipeline diagram (components):
• Kafka
• Spark
• Consume/Aggregate
• Hive (stores aggregated data)
• Model Building/Evaluation
• Alerts
• Elasticsearch
• Dashboards
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
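The consume/aggregate and alert steps named in this diagram can be sketched in plain Python. This is not the detection model from the referenced talk, which the slides do not spell out; the sketch assumes a simple rule (flag a window whose mean latency is more than a few standard deviations from a stored baseline). The event layout, the `isp-a` key and the baseline values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_window(events, window_seconds=60):
    """Group (timestamp, key, latency_ms) events into tumbling windows
    and compute the mean latency per (window_start, key)."""
    buckets = defaultdict(list)
    for ts, key, latency_ms in events:
        window_start = int(ts // window_seconds) * window_seconds
        buckets[(window_start, key)].append(latency_ms)
    return {wk: mean(values) for wk, values in buckets.items()}


def detect_anomalies(window_means, baseline, threshold=3.0):
    """Return (window_start, key, mean) for windows whose mean deviates
    from the key's baseline mean by more than `threshold` std devs."""
    alerts = []
    for (window_start, key), value in sorted(window_means.items()):
        base_mean, base_std = baseline[key]
        if base_std > 0 and abs(value - base_mean) / base_std > threshold:
            alerts.append((window_start, key, value))
    return alerts


# Hypothetical baseline (mean, std dev) per network key, e.g. built offline
# from previously aggregated data.
BASELINE = {"isp-a": (40.0, 5.0)}

# Hypothetical latency events: (timestamp in seconds, key, latency in ms).
EVENTS = [
    (0, "isp-a", 38.0),
    (30, "isp-a", 42.0),
    (60, "isp-a", 90.0),
    (90, "isp-a", 110.0),
]

ALERTS = detect_anomalies(aggregate_by_window(EVENTS), BASELINE)
```

With this sample data the first window stays at the baseline mean while the second window averages 100 ms and is flagged. In a real deployment the aggregation would run as a streaming job and the baseline would come from the stored aggregates.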
![Page 24: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/24.jpg)
Spark MLlib - Recommendation system
![Page 25: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/25.jpg)
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 26: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/26.jpg)
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
![Page 27: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/27.jpg)
Recommendation pipeline diagram (components):
• Game Server
• Data
• Hive
• Explore/Feature engineering
• Feature
• Spark (Spark SQL, MLlib)
• Modeling/Evaluation
• Recommendation
reference: http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
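As a rough illustration of the recommendation step, the sketch below ranks items a player does not yet have by cosine similarity to the items they already use. It is a swapped-in technique chosen for brevity: MLlib's collaborative filtering is based on alternating least squares, and the slides do not say which model the talk used. The player IDs, item names and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(history):
    """history: dict of player -> set of items.
    Returns item -> {other_item: cosine similarity} from co-occurrence counts."""
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in history.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        score = together / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(history, player, top_n=2):
    """Score unseen items by summed similarity to the player's items."""
    sims = item_similarities(history)
    owned = history[player]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims[item].items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical play history for illustration.
HISTORY = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"b", "c", "d"},
    "p4": {"a"},
}
```

For example, `recommend(HISTORY, "p4")` ranks item `b` ahead of `c`, because `b` co-occurs with `a` more often in the sample history.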
![Page 28: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/28.jpg)
Q & A
![Page 30: Spark Hsinchu meetup](https://reader031.vdocuments.site/reader031/viewer/2022021507/5872ec091a28abfa548b73f3/html5/thumbnails/30.jpg)
Spark Cookbook
• Ch1. Getting Started with Apache Spark (Chunhung Huang) (4 )
• Ch2. Developing Applications with Spark ( )
• Spark RDD (Allen )
• Ch3. External Data Sources ( ) (8 )
• Ch4. Spark SQL ( )
• Ch5. Spark Streaming ( )
• Ch6. Getting Started with Machine Learning Using MLlib ( )
• Ch7. Supervised Learning with MLlib - Regression (Dean Du)
• Ch8. Supervised Learning with MLlib - Classification ( )
• Ch9. Unsupervised Learning with MLlib (Vito)
• Ch10. Recommender System (Leorick)
• Ch11. Graph Processing using GraphX ( )
• Ch12. Optimizations and Performance Tuning ( )