phosphor: illuminating dynamic data flow in commodity jvms
DESCRIPTION
Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to data, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, soundness, and portability. Performance can be critical, as these systems are often intended to be deployed to production environments, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be sound and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, soundness, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two successive revisions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its performance to be impressive: as low as 3% (53% on average; 220% at worst) using the DaCapo macro benchmark suite.TRANSCRIPT
@_jon_bell_ October 22, 2014OOPSLA 2014
Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA
Fork me on GithubPhosphor: Illuminating Dynamic Data Flow in Commodity JVMs
@_jon_bell_ October 22, 2014OOPSLA 2014
Dynamic Data Flow Analysis: Taint Tracking
ApplicationInputs Outputs
Flagged (“Tainted”)Input
Output that is derivedfrom tainted input
@_jon_bell_ October 22, 2014OOPSLA 2014
Taint Tracking: Applications
• End-user privacy testing: Does this application send my personal data to remote servers?
• SQL injection attack avoidance: Do SQL queries contain raw user input
• Debugging: Which inputs are relevant to the current (crashed) application state?
@_jon_bell_ October 22, 2014OOPSLA 2014
Qualities of a Successful Analysis
@_jon_bell_ October 22, 2014OOPSLA 2014
Soundness No data leakage!
@_jon_bell_ October 22, 2014OOPSLA 2014
Precision Data is tracked with the right tag
@_jon_bell_ October 22, 2014OOPSLA 2014
Performance Minimal slowdowns
@_jon_bell_ October 22, 2014OOPSLA 2014
Portability No special hardware, OS, or JVM
@_jon_bell_ October 22, 2014OOPSLA 2014
“Normal” Taint Tracking• Associate tags with data, then propagate the tags• Approaches:
• Operating System modifications [Vandebogart ’07], [Zeldovich ’06]
• Language interpreter modifications [Chandra ’07], [Enck ’10], [Nair ’07], [Son ’13]
• Source code modifications [Lam ‘06], [Xu ’06]
• Binary instrumentation of applications [Clause ’07], [Cheng ’06], [Kemerlis ’12]Hard to be sound, precise, and performant
Not portable
@_jon_bell_ October 22, 2014OOPSLA 2014
Phosphor• Leverages benefits of interpreter-based
approaches (information about variables) but fully portably
• Instruments all byte code that runs in the JVM (including the JRE API) to track taint tags
• Add a variable for each variable
• Adds propagation logic
@_jon_bell_ October 22, 2014OOPSLA 2014
Key contribution:How do we efficiently store meta-data
for every variable without modifying the JVM itself?
@_jon_bell_ October 22, 2014OOPSLA 2014
JVM Type Organization• Primitive Types
• int, long, char, byte, etc.
• Reference Types
• Arrays, instances of classes
• All reference types are assignable to java.lang.Object
Phosphor’s taint tag storageLocal
variableMethod
argumentReturn value
Operand stack Field
Object Stored as a field of the object
Object array Stored as a field of each object
Primitive
Primitive array
Shadow variable
Shadow array
variable
Shadow argument
Shadow array
argument
"Boxed"
"Boxed"
Below the value on stack
Array below value on stack
Shadow field
Shadow array field
@_jon_bell_ October 22, 2014OOPSLA 2014
@_jon_bell_ October 22, 2014OOPSLA 2014
Taint Propagation• Modify all byte code instructions to be taint-aware
by adding extra instructions
• Examples:
• Arithmetic -> combine tags of inputs
• Load variable to stack -> Also load taint tag to stack
• Modify method calls to pass taint tags
@_jon_bell_ October 22, 2014OOPSLA 2014Two big problems
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 1: Type Mayhem
java.lang.ObjectPrimitive Types
Instances of classes (Objects) Primitive Arrays
instanceof instanceofAlways has
extra variable!
Always has extra variable
Sometimes has extra variable!
Never has extra variable!
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 1: Type Mayhem
Solution 1: Box taint tag with array when we lose type information
byte[] array = new byte[5];Object ret = array;return ret;
int[] array_tag = new int[5];byte[] array = new byte[5];
Object ret = new TaintedByteArray(array_tag,array);
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native Code
We can’t instrument everything!
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native Codepublic int hashCode() { return super.hashCode() * field.hashCode();}
public native int hashCode();
Solution: Wrappers. Rename every method, and leave a wrapper behind
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native Codepublic int hashCode() { return super.hashCode() * field.hashCode();}
public native int hashCode();
public TaintedInt hashCode$$wrapper() {return new TaintedInt(0, hashCode());
}
Solution: Wrappers. Rename every method, and leave a wrapper behind
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native CodeWrappers work both ways: native code can still call a
method with the old signature
public int[] someMethod(byte in)
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native CodeWrappers work both ways: native code can still call a
method with the old signature
public int[] someMethod(byte in)
public TaintedIntArray someMethod$$wrapper(int in_tag, byte in){ //The original method "someMethod", but with taint tracking}
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native CodeWrappers work both ways: native code can still call a
method with the old signature
public int[] someMethod(byte in){ return someMethod$$wrapper(0, in).val;}
public TaintedIntArray someMethod$$wrapper(int in_tag, byte in){ //The original method "someMethod", but with taint tracking}
@_jon_bell_ October 22, 2014OOPSLA 2014
Challenge 2: Native CodeWrappers work both ways: native code can still call a
method with the old signature
public int[] someMethod(byte in){ return someMethod$$wrapper(0, in).val;}
public TaintedIntArray someMethod$$wrapper(int in_tag, byte in){ //The original method "someMethod", but with taint tracking}
@_jon_bell_ October 22, 2014OOPSLA 2014
Design Limitations
• Tracking through native code
• Return value’s tag becomes combination of all parameters (heuristic); not found to be a problem in our evaluation
• Tracks explicit data flow only (not through control flow)
@_jon_bell_ October 22, 2014OOPSLA 2014
Evaluation
• Soundness & Precision
• Performance
• Portability
@_jon_bell_ October 22, 2014OOPSLA 2014
Soundness & Precision
• DroidBench - series of unit tests for Java taint tracking
• Passed all except for implicit flows (intended behavior)
@_jon_bell_ October 22, 2014OOPSLA 2014
Performance
• Macrobenchmarks (DaCapo, Scalabench)
• Microbenchmarks
• Versus TaintDroid [Enck, 2010] on CaffeineMark
• Versus Trishul [Nair, 2008] on JavaGrande
@_jon_bell_ October 22, 2014OOPSLA 2014
0% 50% 100% 150% 200% 250% tmt
specs scalaxb
scalatest scalariform
scalap scaladoc
kiama factorie apparat actors xalan
tradesoap tradebeans
tomcat sunflow
pmd lusearch luindex jython
h2 fop
eclipse batik
avrora
Relative Runtime Overhead
scalab
ench
daca
po
Phosphor Relative Runtime Overhead (Hotspot 7)
Average: 53.3%
Macrobenchmarks
@_jon_bell_ October 22, 2014OOPSLA 2014
0% 100% 200% 300% 400% 500% 600% 700% 800% 900% tmt
specs scalaxb
scalatest scalariform
scalap scaladoc
kiama factorie apparat actors xalan
tradesoap tradebeans
tomcat sunflow
pmd lusearch luindex jython
h2 fop
eclipse batik
avrora
Relative Memory Overhead
scalab
ench
daca
po
Phosphor Relative Memory Overhead (Hotspot 7)
Average: 270.9%
Macrobenchmarks
@_jon_bell_ October 22, 2014OOPSLA 2014
MicrobenchmarksPhosphor (Hotspot 7) and Trishul Relative Overhead
0% 20% 40% 60% 80% 100% 120%
Serial
Method
Math
Loop
Exception
Create
Cast
Assign
Arithmetic
Relative Runtime Overhead
Phoshpor Trishul
@_jon_bell_ October 22, 2014OOPSLA 2014
MicrobenchmarksPhosphor (Hotspot 7) and Trishul Relative Overhead
0% 20% 40% 60% 80% 100% 120%
Serial
Method
Math
Loop
Exception
Create
Cast
Assign
Arithmetic
Relative Runtime Overhead
Phoshpor Trishul
0% 20% 40% 60% 80% 100% 120%
Serial
Method
Math
Loop
Exception
Create
Cast
Assign
Arithmetic
Relative Runtime Overhead
Phoshpor Trishul Kaffe
@_jon_bell_ October 22, 2014OOPSLA 2014
Microbenchmarks: Taintdroid
• Taintdroid: Taint tracking for Android’s Dalvik VM [Enck, 2010]
• Not very precise: one tag per array (not per array element!)
• Applied Phosphor to Android!
@_jon_bell_ October 22, 2014OOPSLA 2014
0% 50% 100% 150% 200%
Float
Logic
Loop
Method
Sieve
String Buffer
Relative Runtime Overhead
Taintdroid 4.3 Phosphor (Android 4.3)
MicrobenchmarksPhosphor and Taintdroid Relative Overhead
Array-heavy benchmarks
@_jon_bell_ October 22, 2014OOPSLA 2014
PortabilityJVM Version(s) Success?
Oracle (Hotspot) 1.7.0_45, 1.8.0_0 Yes
OpenJDK 1.7.0_45, 1.8.0_0 Yes
Android Dalvik 4.3.1 Yes
Apache Harmony 6.0M3 Yes
Kaffe VM 1.1.9 Yes
Jikes RVM 3.1.3 No, but may be possible with more work
@_jon_bell_ October 22, 2014OOPSLA 2014
Future Work & Extension
• This is a general approach for tracking metadata with variables in unmodified JVMs
• Could track any sort of data in principle
• We have already extended this approach to track path constraints on inputs
@_jon_bell_ October 22, 2014OOPSLA 2014
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs
https://github.com/Programming-Systems-Lab/Phosphor
Jonathan Bell and Gail Kaiser Columbia University
[email protected] @_jon_bell_
Fork me on Github
Consist
ent *Complete *
Well D
ocumented*Easyt
oR
euse* *
Evaluated*
OOPSLA*
Artifact *
AEC