extending spark with java agent (handout)
TRANSCRIPT
![Page 1: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/1.jpg)
EXTENDING SPARK WITH JAVA AGENTS
Jaroslav BachorikAdrian Popescu
![Page 2: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/2.jpg)
Meet The Speakers
● Backend engineer● Hands-on experience with JVM
○ OpenJDK Reviewer● Strong focus on performance
○ VisualVM Contributor○ BTrace Maintainer and Lead
Developer
Jaroslav Adrian● Backend engineer● 5+ years of experience in
performance monitoring & modeling
● Focusing on tuning and optimization of Big Data apps
![Page 3: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/3.jpg)
UnravelPerformance Intelligence for Big Data Stack
![Page 4: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/4.jpg)
Caching in Spark…rdd.cache()rdd.count()…
rdd.reduceByKey((x,y) => x+y).count()
1. Evaluatepart #1
Executorpart #2
Executorpart #3
Executor
Block Manager
Block Manager
Block Manager2. Caching
part #1 part #2 part #3
3. Fetch or re-evaluate part #1 part #2 part #3
Avoiding re-evaluation for cached RDD blocks
![Page 5: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/5.jpg)
Spark Caching Problem• RDD Caching
- Benefits: speedup, saved resources- Too much caching may waste resources
• Spot the opportunity- Which RDDs to cache (for most benefit)?- How to prioritize when memory is scarce?- Where to cache?- Benefit depends on memory state
• http://unraveldata.com/to-cache-or-not-to-cache/
Challenging to decide when, what, and where to cache
![Page 6: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/6.jpg)
Algorithm1. Find stages that share RDDs
2. Cached RDDs: Measure benefitBlock hitRate & time saved for every Stage
3. RDDs not cached: Potential benefitApproximate time saved & RDD storage size
4. Suggest which RDDs to cache
Need to collect additional metrics
Insufficient visibility into Spark structures
e.g., only block info in BlockStatusListener
![Page 7: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/7.jpg)
Meet Java Agent
![Page 8: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/8.jpg)
Meet Java Agent
Premain-Class: com.myorg.agents.FancyAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
META-INF/manifest.mf
public static void premain(String args, Instrumentation inst) {
// do the magic stuff here
}
FancyAgent.java
Launcherjava -javaagent:myfancyagent.jar=<args>
● Runs JVM with an agent● Agent jar
○ agent class○ required manifest entries
● Arguments○ any string understood by agent class
● Multiple agents can be specified
![Page 9: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/9.jpg)
Meet Java Agent
Premain-Class: com.myorg.agents.FancyAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
META-INF/manifest.mf
public static void premain(String args, Instrumentation inst) {
// do the magic stuff here
}
FancyAgent.java
Launcherjava -javaagent:myfancyagent.jar=<args>
● Executed at JVM startup● Instrumentation provides access to JVM internals
![Page 10: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/10.jpg)
Meet Java Agent
Premain-Class: com.myorg.agents.FancyAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
META-INF/manifest.mf
public static void premain(String args, Instrumentation inst) {
// do the magic stuff here
}
FancyAgent.java
Launcherjava -javaagent:myfancyagent.jar=<args>
● Agent class● Must be present in the agent jar
![Page 11: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/11.jpg)
Meet Java Agent
Premain-Class: com.myorg.agents.FancyAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
META-INF/manifest.mf
public static void premain(String args, Instrumentation inst) {
// do the magic stuff here
}
FancyAgent.java
Launcherjava -javaagent:myfancyagent.jar=<args>
Transform classes upon load
![Page 12: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/12.jpg)
Meet Java Agent
Premain-Class: com.myorg.agents.FancyAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
META-INF/manifest.mf
public static void premain(String args, Instrumentation inst) {
// do the magic stuff here
}
FancyAgent.java
Launcherjava -javaagent:myfancyagent.jar=<args>
● Agent capabilities and settings○ retransform classes○ redefine classes○ native method prefix
https://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html
![Page 13: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/13.jpg)
Class Transformersclass SpecialLibraryClass {
public void doStuff() {
System.out.println(
"Doing stuff #" + id(System.currentTimeMillis())
);
}
private int id(int input) {
return input % 42;
}
}
Private method in 3rd party package private method.
Patch and recompile? Reflection?
http://pixabay.com
![Page 14: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/14.jpg)
Class Transformersclass SpecialLibraryClass {
public void doStuff() {
System.out.println("Doing stuff #" + id(System.currentTimeMillis()));
}
private int id(int input) {
AgentCallbacks.onEnter("SLC", "id", "input = " + input);
int tmp = input % 42;
AgentCallbacks.onReturn("SLC", "id", tmp);
return tmp;
}
}
http://pixabay.com
![Page 15: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/15.jpg)
Class Transformersjava.lang.instrument.ClassTransformer
byte[] transform(ClassLoader l, String name, Class<?> cls,
ProtectionDomain pd, byte[] classfileBuffer)
● Inspect and modify class data ○ without recompilation○ even on-the-fly
● Powerful○ introduce callbacks○ patch problematic code
● Not a simple task○ class data structure○ constant pool○ stack frame maps
http://wikipedia.org
![Page 16: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/16.jpg)
BTrace
● Open source dynamic instrumentation framework for JVM
● Java annotations based DSL for class transformations
● Optimized for high performance● Used eg. in VisualVM
(http://visualvm.github.io)
https://github.com/btraceio/btrace
![Page 17: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/17.jpg)
Spark Config Tuning● Intercept SparkContext instantiation at Spark Driver● Modify the associated SparkConf object
○ before SparkContext is fully instantiated● Any Spark property can be changed
○ eg. “spark.executor.cores” based on system load
● Modified SparkConf will be applied to the current job○ and distributed to to all executors
![Page 18: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/18.jpg)
Config Tuning Transformation@OnMethod(clazz = "org.apache.spark.SparkContext",
method = "<init>",
location = @Location(Kind.RETURN)
)
public static void sparkContextConstructor(@Self SparkContext scObj) {
SparkConf conf = scObj.conf();
if (conf == null)
return;
int recommendedCores = AdviceSystem.getRecommendedCores(scObj);
conf.set("spark.executor.cores", recommendedCores);
}
![Page 19: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/19.jpg)
Agents on Spark?
http://apache.orghttp://pixabay.com
![Page 20: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/20.jpg)
Agents on Spark!● Spark is a JVM application● Any Java agent can be used with Spark
http://apache.orghttp://pixabay.com
![Page 21: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/21.jpg)
Register JVM Agent for Sparkjava -javaagent:myfancyagent.jar=<args>
spark.executor.extraJavaOptions=”-javaagent:/opt/myagent/myagent.jar=opt.load:true”
spark.driver.extraJavaOptions=”-javaagent:/opt/myagent/myagent.jar=track.conf:true”
Store the options in spark-defaults.conf to apply system-wide.
Spark configuration parameters
http://arthurandsage.blogspot.cz/
![Page 22: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/22.jpg)
Let’s Put It All Together!
![Page 23: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/23.jpg)
Identify Required Data
Block Manager
Block Manager
Block Manager
1. Evaluation
2. Caching
part #1 part #2 part #3
part #1Executor
part #2Executor
part #3Executor
• Access to Spark internal structures• Collecting: RDD block usage, cache hit, potential benefit• Run algorithm and suggest which RDDs to cache
…rdd.cache()rdd.count()…
rdd.reduceByKey((x,y) => x+y).count()…
Agents allow:
![Page 24: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/24.jpg)
Transform Classes def get(blockId: BlockId): Option[BlockResult] = {
val local = getLocal(blockId)
if (local.isDefined) {
logInfo(s"Found block $blockId locally")
return local
}
val remote = getRemote(blockId)
if (remote.isDefined) {
logInfo(s"Found block $blockId remotely")
return remote
}
None
}
def get(blockId: BlockId): Option[BlockResult] = {
AgentRuntime.useBlock(blockId, currentStage)
val local = getLocal(blockId)
if (local.isDefined) {
logInfo(s"Found block $blockId locally")
return local
}
val remote = getRemote(blockId)
...
...
}
![Page 25: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/25.jpg)
Unravel Spark Cache Events
Identify the costs of not-caching
Give very precise advice
![Page 26: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/26.jpg)
Put It On Cluster
![Page 27: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/27.jpg)
Agents In Cluster
http://http://spark.apache.org/
![Page 28: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/28.jpg)
Distributing Agents for Spark
• Manually copy agents to all nodes
Make sure all nodes have the latest and same copy of agent binaries
https://www.quickmeme.com
• Put that script in CRON!
• Script that copy process
![Page 29: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/29.jpg)
Yarn Localization Service● Built-in resource distribution system● Cluster resources
○ linked into executor working directories
○ kept up-to-date● Supported resource types
○ files○ archives
● https://spark.apache.org/docs/1.5.2/running-on-yarn.html
![Page 30: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/30.jpg)
Takeaways● Sometimes even rich set of Spark metrics, logs and events is not
sufficient● Java agents are powerful tool
○ extend application beyond the original intent○ hot-patch problems or enhancements○ may be removed when not needed
● Try Unravel to see the power of Java agents unleashed
![Page 31: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/31.jpg)
Resources● To simplify application and operations management with Spark, try out
Unravel○ http://unraveldata.com/free-trial/
● Start with Java agents and try out BTrace○ https://github.com/btraceio/btrace○ contributors most sincerely welcome!
● Spark Configuration○ http://spark.apache.org/docs/latest/configuration.html○ https://spark.apache.org/docs/1.5.2/running-on-yarn.html
● Java Agents○ https://docs.oracle.com/javase/7/docs/api/java/lang/instrument/pac
kage-summary.html○ http://zeroturnaround.com/rebellabs/how-to-inspect-classes-in-your
-jvm/
![Page 32: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/32.jpg)
Q&A
Unravel Free Trial: http://unraveldata.com/free-trial/
![Page 33: Extending Spark With Java Agent (handout)](https://reader031.vdocuments.site/reader031/viewer/2022030309/58f2a6a91a28abb73a8b45bd/html5/thumbnails/33.jpg)
THANK YOU.Jaroslav Bachorik, [email protected] Popescu, [email protected]