Continuous Integration for Spark Apps by Sean McIntyre
TRANSCRIPT
Continuous Integration for Spark Apps
Hi, I’m Sean!
© 2015 Uncharted Software Inc.
It’s hard to test Spark Apps :(
Case Study: Uncharted Spark Pipeline
Some key issues:
● Ensure reliability
● Prevent regressions
● Maintain compatibility with multiple versions of Spark
● Open source: need a quick and easy way to evaluate PRs
What is Continuous Integration?
“Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.”
-- ThoughtWorks
“Continuous Integration (CI) is a development practice that is pretty damned important for writing quality software.”
-- Me
So, What is Continuous Integration?
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
Items 1-4: duh.
Items 5-8: ...less duh.
Why are these difficult with Apache Spark?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
What is a Spark App?
[Diagram: Source, JAR, Spark, "?"; callout: "This thing."]
And...
[Diagram: Source, JAR, Spark, "?"; callout: "We need to test this"]
But...
[Diagram: Source, JAR, ScalaTest, Scala RE; callout: "By default, we have this"; result: (boom)]
v1: Squish Spark inside ScalaTest
[Diagram: Source, JAR, ScalaTest with SparkContext; callout: "So, we try this"]
It works! (sort of)
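One way to read v1, as a minimal sketch rather than the talk's actual code: a ScalaTest suite creates its own SparkContext in local mode, so Spark runs inside the test JVM. The suite name `WordCountSpec`, the app name, and the word-count logic are hypothetical. The sketch assumes Spark 1.3 or later and a ScalaTest version that still provides `org.scalatest.FunSpec` (newer releases moved it to `org.scalatest.funspec.AnyFunSpec`).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSpec}

// Hypothetical suite: the name and the word-count logic are illustrative only.
class WordCountSpec extends FunSpec with BeforeAndAfterAll {
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    // "local[2]" runs Spark inside the test JVM with two worker threads;
    // no cluster is involved.
    val conf = new SparkConf().setMaster("local[2]").setAppName("v1-sketch")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    // Stop the context so later suites can create their own.
    if (sc != null) sc.stop()
  }

  describe("word count") {
    it("counts repeated words") {
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("a") == 2)
      assert(counts("b") == 1)
    }
  }
}
```

Because the master is `local[2]`, nothing in this sketch exercises a real Spark installation or the packaged JAR, which is one plausible reading of "sort of".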
6. Test in a clone of the production environment
v2: Squish ScalaTest into Spark
[Diagram: Source → JAR; Tests + Main.scala → Test JAR; Spark runs JAR + Test JAR → Test Output]
Main.scala
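The code on this slide is not part of the transcript. As a hedged sketch only, an entry point packaged in the test JAR could hand control to ScalaTest's `org.scalatest.tools.Runner`, so that the suites execute inside a JVM launched by Spark. The argument handling, the `runnerArgs` helper, and the exit codes below are assumptions, not the author's implementation.

```scala
import org.scalatest.tools.Runner

// Hypothetical entry point for the test JAR; not the talk's actual Main.scala.
object Main {
  // Build ScalaTest Runner arguments:
  //   -R <runpath>  where to discover suites (a JAR or class directory)
  //   -o            report results on standard output
  def runnerArgs(runpath: String): Array[String] = Array("-R", runpath, "-o")

  def main(args: Array[String]): Unit = {
    val runpath = args.headOption.getOrElse {
      Console.err.println("usage: Main <path to test JAR or classes>")
      sys.exit(2)
    }
    // Runner.run returns true only if every discovered test passed.
    val passed = Runner.run(runnerArgs(runpath))
    // A non-zero exit code lets the calling build or CI job fail.
    if (!passed) sys.exit(1)
  }
}
```

Under these assumptions, the test JAR would be submitted like any other Spark app, for example with `spark-submit --class Main` followed by the test JAR and its path as the single argument.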
6. Test in a clone of the production environment
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
What now?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
v3: Squish Spark and Test JAR into Docker
[Diagram: Source → JAR; Tests + Main.scala → Test JAR; inside a Docker Container (uncharted/sparklet), Spark runs JAR + Test JAR → Test Output]
test.sh
build.gradle (excerpt)
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
v4: Squish Docker into Travis CI
[Diagram: Source → JAR; Tests + Main.scala → Test JAR; inside a Docker Container on a Travis CI VM, Spark runs JAR + Test JAR → Test Output]
.travis.yml
Voilà!
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
All done!
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
Next Steps?
● Alpine Linux
● docker-compose
● Windows (dev environment) support
● Python
Questions?
https://github.com/unchartedsoftware/sparkpipe-core
https://github.com/Ghnuberath
@Ghnuberath
https://hub.docker.com/r/uncharted/sparklet/