eecs 583 class 21 research topic 3: dynamic taint...
TRANSCRIPT
EECS 583 – Class 21
Research Topic 3:
Dynamic Taint Analysis
University of Michigan
December 5, 2012
- 1 -
Announcements
Exams returned
» Answer Key can be found on the course website
» One week to turn in written requests for exam regrades
Sign up for project presentation slots
» Friday on Scott’s office door, Monday in class
Today’s class reading » "Dynamic taint analysis for automatic detection, analysis, and signature
generation of exploits on commodity software,"
James Newsome and Dawn Song, Proceedings of the Network and
Distributed System Security Symposium, Feb. 2005.
» Optional background reading: "All You Ever Wanted to Know About
Dynamic Taint Analysis and Forward Symbolic Execution (but might
have been afraid to ask),"
Edward J. Schwartz, Thanassis Avgerinos, David Brumley, Proceedings
of the 2010 IEEE Symposium on Security and Privacy, May 2010.
- 2 -
EECS 583 Exam Statistics
0
10
20
30
40
50
60
70
80
90
100
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61
Series1
Exam
Sco
re
Mean: 70.6 StDev: 12.8 High: 94 Low: 44
- 3 -
The Problem
Unknown Vulnerability Detection **
» Buffer Overflows
» Format string vulnerabilties
» The goal of TaintCheck
Malware Analysis
» What is the software doing with sensitive data?
» Data propagation from triggers
» Ex. TaintDroid
More Generally - Tracking the flow of data through a
program
- 4 -
Buffer Overflow
Example Stack Buffer Overflow
- 5 -
Buffer Overflow
- 6 -
Format String Vulnerability
Attacker controls a format string
- 7 -
Format String Vulnerability
Attacker controls a format string
» Overwrite arbitrary memory
Take control of program execution
%n commonly used in conjunction with other format specifiers
Writes number of bytes written thus far to the stack
» Dump memory
ex. printf ("%08x.%08x.%08x.%08x.%08x\n");
» Crash the program
ex. printf ("%s%s%s%s%s%s%s%s%s%s%s%s");
Denial of Service
- 8 -
Dynamic Taint Analysis
Track information flow through a program at runtime
Identify sources of taint – “TaintSeed”
» What are you tracking?
Untrusted input
Sensitive data
Taint Policy – “TaintTracker”
» Propagation of taint
Identify taint sinks – “TaintAssert”
» Taint checking
Special calls
Jump statements
Format strings
Outside network
- 9 -
TaintCheck
Performed on x86 binary
» No need for source
Implemented using Valgrind skin
» X86 -> Valgrind’s Ucode
» Taint instrumentation added
» Ucode -> x86
Sources -> TaintSeed
Taint Policy -> TaintTracker
Sinks -> TaintAssert
Add on “Exploit Analyzer”
- 10 -
Taint Analysis in Action
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
11
x = get_input( )
y = x + 42
…
goto y Input is tainted
untainted tainted
x 7
Δ Var Val
T x
Tainted? Var
τ
Input t = IsUntrusted(src) get_input(src)↓ t
TaintSeed
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
12
x = get_input( )
y = x + 42
…
goto y Data derived from
user input is tainted
untainted tainted
y 49
Δ Var Val
x 7
T y
Tainted?
T
Var
x
τ
BinOp t1 = τ[x1] , t2 = τ[x2]
x1 + x2 ↓ t1 v t2
TaintTracker
Pgoto(ta) = ¬ ta
(Must be true to execute)
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
13
Policy Violation Detected
x = get_input( )
y = x + 42
…
goto y
untainted tainted Δ Var Val
x 7
y 49
Tainted?
T
T
Var
x
y
τ TaintAssert
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
14
x = get_input( )
y = …
…
goto y
… strcpy(buffer,argv[1]) ; … return ;
Jumping to overwritten
return address
- 15 -
TaintCheck - Exploit Analyzer
TaintPolicy - Keep track of Taint propagation info
» Backtrace chain of taint structures
» Allow generation of attack signatures
Transfer control to sandbox for analysis
- 16 -
TaintCheck - Evaluation
False Negatives
» Use control flow to change value without gathering taint
Example: if (x == 0) y=0; else if (x == 1) y=1;
Equivalent to x=y;
» Tainted index into a hardcoded table
Policy – value translation is not tainted
» Enumerating all sources of taint
False Positives
» Vulnerable code?
» Sanity Checks not removing taint
Requires fine-tuning
Taint sanitization problem
» 0 false positives?
Thoughts?
- 17 -
Policy Considerations?
Memory Load
12/6/2012 Carnegie Mellon University 18
Variables Memory
Δ Var Val
x 7
Tainted?
T
Var
x
τ
μ Addr Val
7 42
Tainted?
F
Addr
7
τμ
Problem: Memory Addresses
12/6/2012 Carnegie Mellon University 19
x = get_input( ) y = load( x ) … goto y
All values derived from user input
are tainted??
7 42 μ
Addr Val
Tainted?
F
Addr
7 τμ
x 7 Δ
Var Val
μ Addr Val
x = get_input( ) y = load( x ) … goto y
Jump target could be any untainted
memory cell value
Policy 1:
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
20
Load v = Δ[x] , t = τμ[v]
load(x) ↓ t
Taint depends only on the memory cell
Taint Propagation
7 42
Tainted?
F
Addr
7 τμ
x 7 Δ
Var Val
Undertainting Failing to identify tainted
values - e.g., missing exploits
jmp_table
Policy Violation?
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
21
x = get_input( ) y = load(jmp_table + x % 2 ) … goto y
Policy 2:
Memory
printa
printb
Address expression is tainted
Load v = Δ[x] , t = τμ[v], ta = τ[x]
load(x) ↓ t v ta
If either the address or the memory cell is tainted, then the value is tainted
Taint Propagation
Overtainting Unaffected values are tainted
- e.g., exploits on safe inputs
General Challenge State-of-the-Art is not perfect for all
programs
12/6/2012
All You Ever Wanted to Know About
Dynamic Taint Analysis
22
Undertainting: Policy may miss taint
Overtainting: Policy may wrongly
detect taint
- 23 -
TaintCheck - Attack Detection
Synthetic Exploits
» Buffer overflow -> function pointer
» Buffer overflow -> format string
» Format string -> info leak
» Success!
Actual Exploits
» 3 real world examples
» Random sample?
» Prevalence of protected exploits?
slightly more convincing
- 24 -
Performance
Implementation decisions
» Use of Valgrind
» Better on IO bound tasks
» How much performance would you give up?
- 25 -
TaintCheck Uses
Individual Sites
» Cost?
Honeypots
“TaintCheck plus OS Randomization”
» Rely on OS randomization to cause crashes
» Log and reproduce with tainting
Sampling
» Users rather than requests
Do I just need more infected computers?
» Distributed Sampling
- 26 -
Signatures
Automatic Semantic Analysis
» Exploit Analyzer
Generated signature validation
- 27 -
Other considerations
Effectiveness in the wild
» Side-channel attacks
Can the attacker determine when someone is using taint tracking?
Speed?
Perform the attack only on native systems
Similar to existing virtual machine and honeypot problems
» Circumventing the taint system
Clean your data before exploit
Depends on policy
Example: Hex to ASCII translation
Cause false positives
Raises cost of running the system
Admins may turn it off
» Difficult to evaluate without motivated attackers