Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

@_jon_bell_ October 22, 2014 OOPSLA 2014 Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA Fork me on Github Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Upload: jonbell

Post on 01-Jul-2015


DESCRIPTION

Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to data, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, soundness, and portability. Performance can be critical, as these systems are often intended to be deployed to production environments, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be sound and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, soundness, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two successive revisions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its runtime overhead to be low: as little as 3% (53% on average; 220% at worst) using the DaCapo macro benchmark suite.

TRANSCRIPT

Page 1: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

@_jon_bell_ October 22, 2014 OOPSLA 2014

Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA

Fork me on GitHub

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Page 2: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Dynamic Data Flow Analysis: Taint Tracking

[Diagram: Inputs -> Application -> Outputs]

• Flagged (“tainted”) input

• Output that is derived from tainted input

Page 3: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Taint Tracking: Applications

• End-user privacy testing: Does this application send my personal data to remote servers?

• SQL injection attack avoidance: Do SQL queries contain raw user input?

• Debugging: Which inputs are relevant to the current (crashed) application state?

Page 4: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Qualities of a Successful Analysis

Page 5: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Soundness: No data leakage!

Page 6: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Precision: Data is tracked with the right tag

Page 7: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Performance: Minimal slowdowns

Page 8: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Portability: No special hardware, OS, or JVM

Page 9: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

“Normal” Taint Tracking

• Associate tags with data, then propagate the tags

• Approaches:

• Operating System modifications [Vandebogart ’07], [Zeldovich ’06]

• Language interpreter modifications [Chandra ’07], [Enck ’10], [Nair ’07], [Son ’13]

• Source code modifications [Lam ’06], [Xu ’06]

• Binary instrumentation of applications [Clause ’07], [Cheng ’06], [Kemerlis ’12]

Hard to be sound, precise, and performant

Not portable

Page 10: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Phosphor

• Leverages benefits of interpreter-based approaches (information about variables) but is fully portable

• Instruments all byte code that runs in the JVM (including the JRE API) to track taint tags

• Adds a shadow variable for each variable to hold its taint tag

• Adds propagation logic

Page 11: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Key contribution: How do we efficiently store metadata for every variable without modifying the JVM itself?

Page 12: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

JVM Type Organization

• Primitive Types

• int, long, char, byte, etc.

• Reference Types

• Arrays, instances of classes

• All reference types are assignable to java.lang.Object

Page 13: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Phosphor’s taint tag storage

| Type | Local variable | Method argument | Return value | Operand stack | Field |
|---|---|---|---|---|---|
| Primitive | Shadow variable | Shadow argument | "Boxed" | Below the value on stack | Shadow field |
| Primitive array | Shadow array variable | Shadow array argument | "Boxed" | Array below value on stack | Shadow array field |

Object (all locations): Stored as a field of the object

Object array (all locations): Stored as a field of each object
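The primitive and primitive-array rows of the storage scheme can be pictured at source level roughly as follows. This is only a sketch: Phosphor rewrites byte code rather than source, and the class name `ShadowStorageSketch`, the `Tag`/`Tags` suffixes, and the use of `int` tags with 1 meaning "tainted" are assumptions made for illustration.

```java
// Hypothetical source-level picture of shadow storage; Phosphor itself rewrites byte code.
class ShadowStorageSketch {
    // Primitive field plus a shadow field holding its taint tag.
    int count;
    int countTag;

    // Primitive array field plus a shadow array field: one tag per element.
    byte[] data = new byte[4];
    int[] dataTags = new int[4];

    // Primitive argument preceded by its shadow argument.
    void setCount(int valueTag, int value) {
        count = value;
        countTag = valueTag;
    }

    // Array element store: the tag goes to the same index of the shadow array.
    void setData(int index, int valueTag, byte value) {
        data[index] = value;
        dataTags[index] = valueTag;
    }

    public static void main(String[] args) {
        ShadowStorageSketch s = new ShadowStorageSketch();
        s.setCount(1, 7);          // tag 1 marks a tainted value (assumed encoding)
        s.setData(2, 1, (byte) 9); // only element 2 becomes tainted
        System.out.println(s.countTag + " " + s.dataTags[2] + " " + s.dataTags[0]);
    }
}
```

Keeping the tag next to the value, rather than in a global map keyed by address, is what lets this kind of scheme run on an unmodified JVM.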

Page 14: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Taint Propagation

• Modify all byte code instructions to be taint-aware by adding extra instructions

• Examples:

• Arithmetic -> combine tags of inputs

• Load variable to stack -> Also load taint tag to stack

• Modify method calls to pass taint tags
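The propagation rules listed above might look like this when written out as source code. It is a sketch under stated assumptions: the real transformation happens at the byte code level, `PropagationSketch`, `combine`, and `scale` are hypothetical names, and merging tags with bitwise OR is an assumed combination rule. The `TaintedInt(tag, value)` constructor order and the `.val` field mirror the later slides; the `taint` field name is a guess.

```java
// Hypothetical sketch of taint propagation, written as source code for readability.
class PropagationSketch {
    // Tag/value pair used for primitive return values.
    static final class TaintedInt {
        final int taint;
        final int val;
        TaintedInt(int taint, int val) { this.taint = taint; this.val = val; }
    }

    // Assumed combination rule: integer tags merged with bitwise OR.
    static int combine(int tagA, int tagB) {
        return tagA | tagB;
    }

    // Original: static int scale(int x, int factor) { return x * factor; }
    // The instrumented call passes a tag in front of each primitive argument.
    static TaintedInt scale(int xTag, int x, int factorTag, int factor) {
        int resultTag = combine(xTag, factorTag); // arithmetic -> combine tags of inputs
        int result = x * factor;                  // original computation is unchanged
        return new TaintedInt(resultTag, result);
    }

    public static void main(String[] args) {
        TaintedInt r = scale(1, 21, 0, 2); // x is tainted (tag 1), factor is not
        System.out.println("value=" + r.val + " tag=" + r.taint);
    }
}
```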

Page 15: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Two big problems

Page 16: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 1: Type Mayhem

[Diagram: primitive types stand apart from java.lang.Object; instances of classes (objects) and primitive arrays are both instanceof java.lang.Object.]

• Primitive types: always has extra variable!

• Primitive arrays: always has extra variable

• java.lang.Object: sometimes has extra variable!

• Instances of classes (objects): never has extra variable!

Page 17: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Challenge 1: Type Mayhem

Solution 1: Box taint tag with array when we lose type information

Original:

    byte[] array = new byte[5];
    Object ret = array;
    return ret;

Instrumented:

    int[] array_tag = new int[5];
    byte[] array = new byte[5];
    Object ret = new TaintedByteArray(array_tag, array);
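A minimal sketch of the boxing idea follows. Only the `TaintedByteArray(array_tag, array)` constructor shape comes from the slide; the field names `taint` and `val`, the class `BoxingSketch`, and the `unboxValue` helper are hypothetical and stand in for whatever the instrumentation actually emits.

```java
// Hypothetical sketch: box a primitive array with its tags when it is stored as Object.
class BoxingSketch {
    // Box pairing a byte[] with its per-element tags (field names are assumptions).
    static final class TaintedByteArray {
        final int[] taint;
        final byte[] val;
        TaintedByteArray(int[] taint, byte[] val) {
            this.taint = taint;
            this.val = val;
        }
    }

    // Original: byte[] array = new byte[5]; Object ret = array; return ret;
    static Object make() {
        int[] arrayTag = new int[5];
        byte[] array = new byte[5];
        // Type information is about to be lost, so tags and values travel together.
        return new TaintedByteArray(arrayTag, array);
    }

    // When the static type is recovered (for example at a cast to byte[]), unbox.
    static byte[] unboxValue(Object o) {
        if (o instanceof TaintedByteArray) {
            return ((TaintedByteArray) o).val;
        }
        return (byte[]) o; // an un-boxed array passes through unchanged
    }

    public static void main(String[] args) {
        Object ret = make();
        System.out.println(unboxValue(ret).length);
    }
}
```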

Page 18: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Challenge 2: Native Code

We can’t instrument everything!

Page 19: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 2: Native Code

    public int hashCode() {
        return super.hashCode() * field.hashCode();
    }

    public native int hashCode();

Solution: Wrappers. Rename every method, and leave a wrapper behind

Page 20: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 2: Native Code

    public int hashCode() {
        return super.hashCode() * field.hashCode();
    }

    public native int hashCode();

    public TaintedInt hashCode$$wrapper() {
        return new TaintedInt(0, hashCode());
    }

Solution: Wrappers. Rename every method, and leave a wrapper behind

Page 21: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 2: Native Code

Wrappers work both ways: native code can still call a method with the old signature

    public int[] someMethod(byte in)

Page 22: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 2: Native Code

Wrappers work both ways: native code can still call a method with the old signature

    public int[] someMethod(byte in)

    public TaintedIntArray someMethod$$wrapper(int in_tag, byte in) {
        // The original method "someMethod", but with taint tracking
    }

Page 23: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Challenge 2: Native Code

Wrappers work both ways: native code can still call a method with the old signature

    public int[] someMethod(byte in) {
        return someMethod$$wrapper(0, in).val;
    }

    public TaintedIntArray someMethod$$wrapper(int in_tag, byte in) {
        // The original method "someMethod", but with taint tracking
    }
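Both wrapper directions can be put together in one compilable sketch. The method names `someMethod` and `someMethod$$wrapper`, the empty tag `0`, and the `.val` field come from the slides; everything else is an assumption, including the `taint` field name, the method body, and `opaqueLength`, which is an ordinary Java method standing in for a `native` one so the example runs without a native library.

```java
// Hypothetical sketch of wrappers in both directions.
class WrapperSketch {
    static final class TaintedInt {
        final int taint;
        final int val;
        TaintedInt(int taint, int val) { this.taint = taint; this.val = val; }
    }

    static final class TaintedIntArray {
        final int[] taint;
        final int[] val;
        TaintedIntArray(int[] taint, int[] val) { this.taint = taint; this.val = val; }
    }

    // Stand-in for a method that cannot be instrumented (really it would be native).
    static int opaqueLength(String s) {
        return s.length();
    }

    // Direction 1: instrumented code calls the wrapper, which attaches an empty tag.
    static TaintedInt opaqueLength$$wrapper(String s) {
        return new TaintedInt(0, opaqueLength(s));
    }

    // Direction 2: the instrumented body lives under the new name...
    static TaintedIntArray someMethod$$wrapper(int inTag, byte in) {
        int[] vals = { in, in + 1 };     // illustrative body only
        int[] tags = { inTag, inTag };   // each element derives from the argument
        return new TaintedIntArray(tags, vals);
    }

    // ...and the old signature stays callable from uninstrumented (native) callers.
    static int[] someMethod(byte in) {
        return someMethod$$wrapper(0, in).val;
    }

    public static void main(String[] args) {
        System.out.println(someMethod((byte) 3).length);
        System.out.println(opaqueLength$$wrapper("abc").val);
    }
}
```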


Page 25: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Design Limitations

• Tracking through native code

• Return value’s tag becomes combination of all parameters (heuristic); not found to be a problem in our evaluation

• Tracks explicit data flow only (not through control flow)
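To make the two limitations concrete, here is a small source-level illustration. It is not Phosphor output: the class and method names are hypothetical, tags are modeled as `int` values with 0 meaning "untainted", and bitwise OR as the combination operator is an assumption.

```java
// Hypothetical illustration of the two design limitations.
class DesignLimitsSketch {
    // Native-call heuristic: the return tag is the combination of all parameter tags.
    static int nativeReturnTag(int... paramTags) {
        int tag = 0;
        for (int t : paramTags) {
            tag |= t; // assumed combination rule
        }
        return tag;
    }

    // Explicit flow: the value is copied, so its tag is copied with it.
    static int[] explicitCopy(int secretTag, int secret) {
        int copyTag = secretTag;
        int copy = secret;
        return new int[] { copyTag, copy };
    }

    // Implicit flow: the result depends on the secret only through the branch.
    // Both assigned values are constants with an empty tag, so tracking
    // data flow alone leaves the result untainted.
    static int[] implicitCopy(int secretTag, boolean secret) {
        int leakTag = 0;
        int leak = 0;
        if (secret) { // secretTag is not consulted: control flow is not tracked
            leakTag = 0;
            leak = 1;
        }
        return new int[] { leakTag, leak };
    }

    public static void main(String[] args) {
        int[] e = explicitCopy(1, 42);
        int[] i = implicitCopy(1, true);
        System.out.println("explicit tag=" + e[0] + ", implicit tag=" + i[0]);
        System.out.println("native return tag=" + nativeReturnTag(1, 0, 2));
    }
}
```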

Page 26: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Evaluation

• Soundness & Precision

• Performance

• Portability

Page 27: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Soundness & Precision

• DroidBench - series of unit tests for Java taint tracking

• Passed all except for implicit flows (intended behavior)

Page 28: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Performance

• Macrobenchmarks (DaCapo, Scalabench)

• Microbenchmarks

• Versus TaintDroid [Enck, 2010] on CaffeineMark

• Versus Trishul [Nair, 2008] on JavaGrande

Page 29: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Macrobenchmarks: Phosphor Relative Runtime Overhead (Hotspot 7)

[Bar chart: relative runtime overhead per benchmark, axis 0% to 250%. DaCapo: avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, tradesoap, xalan. Scalabench: actors, apparat, factorie, kiama, scaladoc, scalap, scalariform, scalatest, scalaxb, specs, tmt.]

Average: 53.3%

Page 30: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Macrobenchmarks: Phosphor Relative Memory Overhead (Hotspot 7)

[Bar chart: relative memory overhead per benchmark, axis 0% to 900%. DaCapo: avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, tradesoap, xalan. Scalabench: actors, apparat, factorie, kiama, scaladoc, scalap, scalariform, scalatest, scalaxb, specs, tmt.]

Average: 270.9%

Page 31: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Microbenchmarks: Phosphor (Hotspot 7) and Trishul Relative Overhead

[Bar chart: relative runtime overhead, axis 0% to 120%, for Arithmetic, Assign, Cast, Create, Exception, Loop, Math, Method, Serial. Legend: Phosphor, Trishul.]

Page 32: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Microbenchmarks: Phosphor (Hotspot 7) and Trishul Relative Overhead

[Bar chart: relative runtime overhead, axis 0% to 120%, for Arithmetic, Assign, Cast, Create, Exception, Loop, Math, Method, Serial. Legend: Phosphor, Trishul, Kaffe.]

Page 33: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Microbenchmarks: TaintDroid

• TaintDroid: Taint tracking for Android’s Dalvik VM [Enck, 2010]

• Not very precise: one tag per array (not per array element!)

• Applied Phosphor to Android!
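The precision difference between one tag per array and one tag per element can be shown with a toy model. Neither method reflects TaintDroid's or Phosphor's actual implementation; the class name, method names, and the `int` tag encoding are assumptions chosen only to contrast the two granularities.

```java
// Hypothetical toy model contrasting per-array and per-element taint tags.
class ArrayTagGranularitySketch {
    static final int LENGTH = 3;

    // One tag per array: a tainted store at any index taints every later read.
    static int readTagPerArray(int storeIndex, int storeTag, int readIndex) {
        int arrayTag = 0;
        arrayTag |= storeTag; // storeIndex is irrelevant at this granularity
        return arrayTag;      // so is readIndex
    }

    // One tag per element: only the index that was written carries the tag.
    static int readTagPerElement(int storeIndex, int storeTag, int readIndex) {
        int[] tags = new int[LENGTH];
        tags[storeIndex] |= storeTag;
        return tags[readIndex];
    }

    public static void main(String[] args) {
        // Store a tainted value (tag 1) at index 1, then read index 0.
        System.out.println("per-array tag:   " + readTagPerArray(1, 1, 0));
        System.out.println("per-element tag: " + readTagPerElement(1, 1, 0));
    }
}
```

In this model, reading an element that was never written with tainted data is still reported as tainted under the per-array scheme, which is the imprecision the slide refers to.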

Page 34: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Microbenchmarks: Phosphor and Taintdroid Relative Overhead

[Bar chart: relative runtime overhead, axis 0% to 200%, for Float, Logic, Loop, Method, Sieve, String Buffer. Legend: Taintdroid 4.3, Phosphor (Android 4.3). Callout: Array-heavy benchmarks.]

Page 35: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Portability

| JVM | Version(s) | Success? |
|---|---|---|
| Oracle (Hotspot) | 1.7.0_45, 1.8.0_0 | Yes |
| OpenJDK | 1.7.0_45, 1.8.0_0 | Yes |
| Android Dalvik | 4.3.1 | Yes |
| Apache Harmony | 6.0M3 | Yes |
| Kaffe VM | 1.1.9 | Yes |
| Jikes RVM | 3.1.3 | No, but may be possible with more work |

Page 36: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs


Future Work & Extension

• This is a general approach for tracking metadata with variables in unmodified JVMs

• Could track any sort of data in principle

• We have already extended this approach to track path constraints on inputs


Page 38: Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

https://github.com/Programming-Systems-Lab/Phosphor

Jonathan Bell and Gail Kaiser Columbia University

[email protected] @_jon_bell_

Fork me on Github

OOPSLA Artifact Evaluated (AEC): Consistent * Complete * Well Documented * Easy to Reuse