
Post on 16-Dec-2015


iBinHunt: Binary Hunting with Inter-Procedural Control Flow

Jiang Ming (1), Meng Pan (2), and Debin Gao (3)

(1) College of Information Sciences and Technology, Penn State University
(2) D’Crypt Pte Ltd
(3) School of Information Systems, Singapore Management University

Introduction

• Binary hunting: automatically finding semantic differences in binary programs
• Need to capture semantic differences
– Differences in functionality (input-output behavior)
• Syntactic differences cause false positives
– Differences in instructions
– Register allocation
– Basic-block reordering
– Variable renaming
– …

An example: gzip

• Different instructions in two versions, but with the same semantics

• A patch with 5 lines of code

• All 75 non-empty functions are changed

xor eax, eax

and ebx, 0

Gzip Long File Name Buffer Overflow Vulnerability, http://www.securityfocus.com/bid/3712
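The two instructions on this slide both leave zero in their destination register, which is why a syntactic comparison reports a change although the semantics are the same. A minimal Python sketch of that observation follows. It models each instruction as a function on a 32-bit register value; the function names are illustrative and not part of iBinHunt, and random sampling is only a sanity check, not the proof a theorem prover would give.

```python
import random

MASK32 = 0xFFFFFFFF

def xor_self(reg):
    """Model of `xor eax, eax`: destination = reg XOR reg."""
    return (reg ^ reg) & MASK32

def and_zero(reg):
    """Model of `and ebx, 0`: destination = reg AND 0."""
    return (reg & 0) & MASK32

def same_effect(samples=10_000, seed=0):
    """Compare both models on edge cases plus random 32-bit values."""
    rng = random.Random(seed)
    values = [0, 1, MASK32] + [rng.getrandbits(32) for _ in range(samples)]
    return all(xor_self(v) == and_zero(v) == 0 for v in values)
```

The two instructions write to different registers, so a comparison also has to tolerate register renaming; that is one of the syntactic differences listed in the introduction.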

Importance of Binary Hunting

Security applications of binary hunting

• Finding security vulnerabilities with patched binary
– “BinHunt: Automatically finding semantic differences in binary programs”, ICICS 2008
• Automatic patch-based exploit (1-day exploit) generation
– “Automatic Patch-Based Exploit Generation is Possible”, IEEE S&P 2008
• Software plagiarism detection
– “GPLAG: detection of software plagiarism by program dependence graph analysis”, KDD 2006
• Adapting trained anomaly detectors to software patches
– “Automatically adapting a trained anomaly detector to software patches”, RAID 2009
• Malware analysis
– “Polymorphic worm detection using structural information of executables”, RAID 2005
– “Large-scale malware indexing using function-call graphs”, CCS 2009

Challenge

• Source code of binary files is not available
• Function names extracted from these binary files are unreliable
• A variety of obfuscation techniques
• ……
• Latest solutions: find similarity/difference in control flow structure rather than in binary instructions
– Resistant to “superficial” changes
– Examples: BinDiff, BinHunt, DarunGrim, SMIT

Intra-procedural control flow vs. Inter-procedural control flow

• Intra-procedural control flow
– Most previous work focuses on the intra-procedural control flow.
– The sub-graph isomorphism problem is NP-complete.
– Example: 96% of non-empty functions of thttpd have fewer than 30 basic blocks.
– Graph isomorphism is practical in analyzing intra-procedural control flow
• Inter-procedural control flow
– No function boundary
– Huge graph with a large number of nodes, where graph isomorphism is impractical
– Example: thttpd-2.25 has more than 4,300 basic blocks in total, giving more than 4,000 candidate matchings for a single basic block

Function Transformation Obfuscation

• Function transformation obfuscation is well-studied
– Inlining functions
– Outlining functions
– Cloning functions
– Interleaving functions
• Performing such obfuscation is simple and requires no intensive analysis of the binaries.

C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July 1997.
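To illustrate why inlining defeats a per-function comparison, here is a toy Python sketch. It assumes straight-line functions with no branches or recursion, and the program layout (`main`, `g`, blocks `A` to `D`) is hypothetical. The per-function view changes after inlining, while the order of basic blocks reached from the entry point does not.

```python
def intra_sizes(program):
    """Number of basic blocks per function: what a per-function tool sees."""
    return {name: sum(1 for item in body if isinstance(item, str))
            for name, body in program.items()}

def inter_trace(program, entry="main"):
    """Follow calls from the entry function and list blocks in execution order."""
    trace = []
    for item in program[entry]:
        if isinstance(item, str):          # a basic block label
            trace.append(item)
        else:                              # a ("call", callee) marker
            _, callee = item
            trace.extend(inter_trace(program, callee))
    return trace

# Hypothetical program before and after inlining g into main.
original = {"main": ["A", ("call", "g"), "D"], "g": ["B", "C"]}
inlined = {"main": ["A", "B", "C", "D"]}
```

Here `intra_sizes` differs between the two versions (one function disappears and `main` grows), whereas `inter_trace` returns the same block sequence for both.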

Inlining and outlining transformations


Advanced control flow obfuscation

• Control flow flattening
– “Protection of software-based survivability mechanisms”, DSN 2001
– “An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs”, ISC 2001
• Redirecting control-flow with exceptions
– “Binary Obfuscation Using Signals”, USENIX Security 2007
– “binOb+: a framework for potent and stealthy binary obfuscation”, AsiaCCS 2010
• Function boundary information (intra-procedural control flow) is not reliable!

Overview of iBinHunt

• iBinHunt: Binary Diffing with Inter-Procedural Control Flow Graphs
• iBinHunt provides practical solutions to the large number of basic block matchings
– Dynamic tainting: monitor the execution of the two binary programs under a common input and use taint analysis to record all basic blocks involved in the processing of the input.
– Deep taint: assign different taint tags to various parts of the input; only basic blocks from the two binary programs that are marked with the same taint tags are considered matching candidates (a reduction factor of up to 74%).
– Basic block comparison: symbolic execution is first used to represent outputs of the basic blocks with their input symbols, and a theorem prover is then used to check if the outputs from the two basic blocks are semantically equivalent.
– Automatic input generation: increases the coverage of tainted basic blocks by automatically generating inputs that result in different execution traces.
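The candidate reduction from deep taint can be sketched as a simple filter. This is only an illustration under stated assumptions: block identifiers, tag names, and the helper `candidates_by_taint` are hypothetical, and each block is represented only by the set of input-field tags that reached it during execution.

```python
from itertools import product

def candidates_by_taint(blocks_a, blocks_b):
    """Pair up blocks whose taint-tag sets are identical and non-empty."""
    return [(a, b)
            for (a, tags_a), (b, tags_b) in product(blocks_a.items(), blocks_b.items())
            if tags_a and tags_a == tags_b]

# Hypothetical traces: each tag names the part of the input that reached the block.
old = {"BB_1": frozenset({"method"}),
       "BB_2": frozenset({"url"}),
       "BB_3": frozenset({"url", "host"})}
new = {"BB_7": frozenset({"method"}),
       "BB_8": frozenset({"url"}),
       "BB_9": frozenset({"host"})}

pairs = candidates_by_taint(old, new)   # pairs that go on to block comparison
all_pairs = len(old) * len(new)         # pairs without any taint filtering
```

In this toy case only 2 of the 9 possible pairs remain as candidates. Untainted blocks are deliberately left out of this filter; the slides treat them separately.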

Deep taint for basic block comparison

[Figure: inter-procedural control flow graphs, deep taint, deep taint execution trace, basic block comparison]

An example: thttpd

• Input and its taint tag colors
• Dynamic execution traces with deep taint

Basic block comparison

• Symbolic execution and theorem proving
– Use symbolic execution to represent final values of outputs (registers and variables)
– Use a theorem prover to test if the outputs of two basic blocks are always the same given the same inputs
• Context aware
– The permutation of outputs of the equivalent basic blocks is the permutation of inputs of the successor blocks.
• Obtain the matching strength based on the result from the theorem prover
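The comparison above can be sketched in Python under strong simplifications. Exhaustive enumeration over a tiny bit width stands in for symbolic execution plus a theorem prover, the two blocks are hypothetical, and only output permutations are searched (input permutations are assumed to be fixed already).

```python
from itertools import permutations, product

BITS = 4                       # tiny width so exhaustive checking is cheap (assumption)
MASK = (1 << BITS) - 1

def block_a(x, y):
    """Hypothetical block: outputs (x + y, x * 2)."""
    return ((x + y) & MASK, (x * 2) & MASK)

def block_b(x, y):
    """Same values via different operations, written to swapped outputs."""
    return ((x << 1) & MASK, (y + x) & MASK)

def equivalent_under_permutation(f, g, n_inputs=2):
    """Return a permutation p with f(v)[i] == g(v)[p[i]] for every input v, else None."""
    inputs = list(product(range(MASK + 1), repeat=n_inputs))
    n_out = len(f(*inputs[0]))
    for perm in permutations(range(n_out)):
        if all(tuple(g(*v)[j] for j in perm) == f(*v) for v in inputs):
            return perm
    return None
```

For these two blocks the search returns the permutation `(1, 0)`: output 0 of `block_a` corresponds to output 1 of `block_b` and vice versa. That permutation is what would be carried forward as the input mapping when comparing successor blocks.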

Basic block matching

We need to consider two other groups of blocks when finding matched blocks.

• Blocks that are not semantically equivalent but have the same taint tags
– They could very likely be the differences between the two programs that iBinHunt is trying to locate. E.g., BB_13232 and BB_16184 are the location of a binary difference.
• Blocks that are not tainted but are on the dynamic execution trace
– Due to various reasons, including limitations of taint analysis, not directly processing program inputs (e.g., signal processing), etc.

Matching Strength

Basic blocks B1 and B2 are considered matched to one another if B1 and B2 have the same taint tags (possibly non-tainted) and
• B1 and B2 are semantically equivalent (evaluated by symbolic execution and theorem proving); or
• a predecessor of B1 and a predecessor of B2 match; or
• a successor of B1 and a successor of B2 match.

[Figure: the three cases for B1 and B2: equivalent blocks, matched predecessors, matched successors]
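The matching rule above can be written as a small predicate. This is a sketch, not iBinHunt's implementation: the graph layout (dicts with `taint`, `pred`, and `succ` maps), the block names, and the `equivalent` set of prover-confirmed pairs are all assumptions made for illustration.

```python
def is_match(b1, b2, g1, g2, equivalent, matched):
    """Same taint tags, and either equivalent or adjacent to an already matched pair."""
    if g1["taint"][b1] != g2["taint"][b2]:
        return False
    if (b1, b2) in equivalent:
        return True
    if any((p1, p2) in matched for p1 in g1["pred"][b1] for p2 in g2["pred"][b2]):
        return True
    return any((s1, s2) in matched for s1 in g1["succ"][b1] for s2 in g2["succ"][b2])

# Hypothetical traces: A -> B -> C in one version, X -> Y -> Z in the other.
g1 = {"taint": {"A": {"url"}, "B": set(), "C": {"url"}},
      "pred": {"A": [], "B": ["A"], "C": ["B"]},
      "succ": {"A": ["B"], "B": ["C"], "C": []}}
g2 = {"taint": {"X": {"url"}, "Y": set(), "Z": {"url"}},
      "pred": {"X": [], "Y": ["X"], "Z": ["Y"]},
      "succ": {"X": ["Y"], "Y": ["Z"], "Z": []}}

equivalent = {("A", "X")}       # pairs reported equivalent by the block comparison
matched = set()
if is_match("A", "X", g1, g2, equivalent, matched):
    matched.add(("A", "X"))
b_matches_y = is_match("B", "Y", g1, g2, equivalent, matched)
```

In this example the untainted blocks `B` and `Y` match through their matched predecessors, which mirrors how tainted neighbors help match blocks that the input did not taint.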

Automatic Input Generation

[Figure: input generation loop. Initial input (GET index.html HTTP/1.1, Host: .) → concrete execution and symbolic execution → symbolic formula → constraint solver (STP) → new input]
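The loop in the figure can be sketched as follows. This is a toy model under stated assumptions: the branch conditions in `CHECKS` describe a hypothetical two-byte parser, and a brute-force search over a four-letter alphabet stands in for a real constraint solver such as STP.

```python
from itertools import product

ALPHABET = b"GEPX"             # tiny search space for the stand-in solver

CHECKS = [                     # branch conditions of a hypothetical parser, in order
    lambda d: d[0] == ord("G"),
    lambda d: d[1] == ord("E"),
]

def run(data):
    """Concrete execution: record branch outcomes until one check fails."""
    path = []
    for check in CHECKS:
        taken = check(data)
        path.append(taken)
        if not taken:
            break
    return tuple(path)

def solve(constraints, length=2):
    """Brute-force stand-in for a constraint solver."""
    for candidate in product(ALPHABET, repeat=length):
        data = bytes(candidate)
        if all(CHECKS[i](data) == want for i, want in enumerate(constraints)):
            return data
    return None

def explore(seed):
    """Flip one branch outcome at a time to reach paths not seen before."""
    seen, worklist, inputs = set(), [seed], []
    while worklist:
        data = worklist.pop()
        path = run(data)
        if path in seen:
            continue
        seen.add(path)
        inputs.append(data)
        for i in range(len(path)):
            flipped = list(path[:i]) + [not path[i]]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return inputs, seen
```

Starting from a seed such as `b"XX"`, `explore` keeps one input per distinct path, which is the sense in which generated inputs increase the coverage of tainted basic blocks.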

Evaluation

• We applied iBinHunt to find semantic differences in several versions of thttpd and gzip. We evaluate two main aspects:

– Efficiency: how many basic blocks can be matched under our definition of matching strength, how many matchings are identified by deep taint, and how long it takes to find these matchings.

– Accuracy: confirm these differences by comparing them to the ground truth (program source code).

• Different versions of thttpd and gzip (number of lines changed / total number of lines)

thttpd    2.20        2.20c       2.21        2.25
2.19      252/6059    254/5843    1483/6641   2908/7271

gzip      1.3.12      1.3.13      1.40
1.2.4     1317/4959   1351/4929   1446/4841
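To make the table easier to read, the fractions can be converted to percentages. The short Python sketch below only restates the numbers from the table; the dictionary keys and the helper name are illustrative.

```python
def percent_changed(changed_lines, total_lines):
    """Share of source lines changed, as a percentage."""
    return 100.0 * changed_lines / total_lines

# (lines changed, total lines) copied from the table above.
versions = {
    "thttpd 2.19 vs 2.20": (252, 6059),
    "thttpd 2.19 vs 2.20c": (254, 5843),
    "thttpd 2.19 vs 2.21": (1483, 6641),
    "thttpd 2.19 vs 2.25": (2908, 7271),
    "gzip 1.2.4 vs 1.3.12": (1317, 4959),
    "gzip 1.2.4 vs 1.3.13": (1351, 4929),
    "gzip 1.2.4 vs 1.40": (1446, 4841),
}

percentages = {name: round(percent_changed(c, t), 1) for name, (c, t) in versions.items()}
```

The thttpd pairs range from roughly 4% to 40% of lines changed, and the gzip pairs sit between roughly 27% and 30%.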

Matching basic blocks

We evaluate:
• Matched basic blocks that are semantically the same;
• Matched ones that are not semantically equivalent but have both a predecessor and a successor matched;
• Matched ones that are not semantically equivalent but have either a predecessor or a successor matched;
• The time taken by input generation and deep taint.

Effectiveness of deep taint

• Results show that more than 34% of the matched basic blocks in thttpd and more than 67% in gzip contain the same taint tags.
– A large number of these matchings do contain the same taint tags;
– Even though many basic blocks are not tainted by our limited number of program inputs, their neighbors are tainted in most cases, and the tainted neighbors help matchings to be identified.

• Percentage of matched basic blocks with the same taint representation

thttpd    2.20     2.20c    2.21     2.25
2.19      34.8%    38.2%    39.9%    37.4%

gzip      1.3.12   1.3.13   1.40
1.2.4     67.9%    72.2%    72.6%

Accuracy

• BB_1371 from thttpd-2.19 should match with BB_1689 in thttpd-2.25, both of which deal with the “-i” argument.

• However, BB_1687 in thttpd-2.25 also contains the same (type of) instructions, which confuses the binary diffing tool in the matching.

Discussions

• Limitations
– The power of iBinHunt is limited by imperfect basic block coverage.
– In our experiments with thttpd and gzip, some basic blocks are not covered even if we continue to generate new program inputs.
– Performance
• Future work
– More optimization of the code to improve efficiency.
– Parallelizing dynamic taint tracking
– More in-depth binary difference analysis, in which (parts of) the programs are only semantically equivalent on a certain subset of the inputs.

Conclusion

• Introduce function obfuscation attacks against existing binary diffing tools that analyze the intra-procedural control flow of programs.

• Propose a novel binary diffing tool called iBinHunt which analyzes the inter-procedural control flow.

• iBinHunt makes use of a novel technique called deep taint.
