Reverse Engineering automation by Anton Dorfman PHDAYS 2014, Moscow

Upload: positive-hack-days

Post on 12-May-2015


Page 1: Reverse Engineering automation

Reverse Engineering automation

by Anton Dorfman PHDAYS 2014, Moscow

Page 2: Reverse Engineering automation

• Fan of & Fun with Assembly language
• Researcher, scientist
• Teaching Reverse Engineering since 2001
• Candidate of Technical Sciences
• Lecturer at Samara State Technical University and Samara State Aerospace University

About me

Page 3: Reverse Engineering automation

• Intro
• Simple
• Trace & Coverage
• Graph
• Program Slicing
• All Together

Agenda

Page 4: Reverse Engineering automation

Intro

Page 5: Reverse Engineering automation

• Iterative process
• Understand a small piece of code, then form an abstraction of it in your mind
• Understand all pieces of code in a procedure, unite their abstractions, and form an abstraction of the whole function
• And so on
• Good visualization is important
• Many routine tasks

Reverse Engineering

Page 6: Reverse Engineering automation

• Code localization
• Data flow dependencies
• Code flow dependencies
• Local variables checking
• Input/output procedure parameters checking
• Variables range checking
• Labels naming
• Function naming
• Function prototyping

Routine tasks of RE

Page 7: Reverse Engineering automation

Largest research school: Professor Thomas W. Reps, University of Wisconsin-Madison, http://pages.cs.wisc.edu/~reps/

In Russia: Institute for System Programming of the Russian Academy of Sciences, http://www.ispras.ru

Automatic program analysis - Science

Page 8: Reverse Engineering automation

• Dynamic Binary Instrumentation (DBI)
• Intermediate representation (IR)
• System emulators

Technologies that help

Page 9: Reverse Engineering automation

Simple

Page 10: Reverse Engineering automation

• Function
• Variable
• Label

Just naming

Page 11: Reverse Engineering automation

Trace & Coverage

Page 12: Reverse Engineering automation

• Also called an execution trace
• A trace of program execution
• Simple case: just the list of addresses that the instruction pointer takes during a single run

Code Trace

Page 13: Reverse Engineering automation

Code Trace example

Page 14: Reverse Engineering automation

• Originally used as a measure of the degree to which the source code of a program is tested by a particular test suite
• List of instructions executed during a single run
• List of unique addresses from the program trace

Code Coverage
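The relation between a code trace and code coverage can be sketched in a few lines. This is a minimal illustration, assuming a trace is simply a list of instruction addresses; the addresses and the `coverage` helper are hypothetical and not taken from the talk.

```python
def coverage(trace):
    """Return the set of unique addresses seen in a code trace."""
    return set(trace)


# Hypothetical trace: the loop body at 0x401005..0x40100a runs twice.
trace = [0x401000, 0x401005, 0x40100A, 0x401005, 0x40100A, 0x40100F]
cov = coverage(trace)

# The trace keeps order and repetition; coverage keeps only unique addresses.
print(len(trace), len(cov))
```

Order and repetition counts are lost in the coverage set, which is why the trace is kept when execution order matters.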

Page 15: Reverse Engineering automation

Code Coverage example

Page 16: Reverse Engineering automation

• The difference between the code coverage of different runs helps to locate the code that implements a particular functionality
• Common code coverage: common functionality
• More runs: more differences between coverage sets and more precise code localization

Code Coverage Diff
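One possible way to turn coverage diffs into code localization is plain set arithmetic: keep the addresses that appear in every run that exercised the feature and in no run that did not. The sketch below assumes each run's coverage is a Python set of addresses; the `localize` function and the sample addresses are hypothetical.

```python
def localize(runs_with_feature, runs_without_feature):
    """Addresses hit in every feature run and in no baseline run.

    runs_with_feature must contain at least one coverage set.
    """
    common = set.intersection(*runs_with_feature)
    baseline = set().union(*runs_without_feature)
    return common - baseline


# Hypothetical coverage sets from four runs of the same binary.
with_feature = [{0x1000, 0x1010, 0x2000, 0x3000}, {0x1000, 0x1010, 0x2000, 0x4000}]
without_feature = [{0x1000, 0x1010}, {0x1000, 0x1010, 0x4000}]

candidates = localize(with_feature, without_feature)
print(sorted(hex(a) for a in candidates))
```

Adding more runs on either side can only shrink the candidate set, which matches the point that more runs give more precise localization.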

Page 17: Reverse Engineering automation

Code Coverage Diff Example

Page 18: Reverse Engineering automation

The collection of all memory accesses performed by an application in a single run

Includes both writes and reads

Memory Trace

Page 19: Reverse Engineering automation

• Includes the code trace
• Includes all register values and memory values at every execution point
• May be absolute: save all values
• Or relative: save only the values that changed at this execution point

Full Program Trace
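The absolute and relative forms carry the same information, which a short sketch can show. It assumes each trace entry is an (address, state) pair where state maps a fixed set of register or memory names to values; the function names and sample values are hypothetical.

```python
_MISSING = object()


def to_relative(absolute_trace):
    """Keep, for each step, only the values that differ from the previous step."""
    relative = []
    previous = {}
    for address, state in absolute_trace:
        delta = {k: v for k, v in state.items() if previous.get(k, _MISSING) != v}
        relative.append((address, delta))
        previous = state
    return relative


def to_absolute(relative_trace):
    """Rebuild full snapshots by replaying the deltas in order."""
    state = {}
    absolute = []
    for address, delta in relative_trace:
        state = {**state, **delta}
        absolute.append((address, state))
    return absolute


# Hypothetical three-step trace with two registers.
full = [
    (0x1000, {"eax": 0, "ebx": 5}),
    (0x1002, {"eax": 1, "ebx": 5}),
    (0x1004, {"eax": 1, "ebx": 6}),
]
deltas = to_relative(full)
print(deltas)
```

The relative form is smaller, but reading the state at step k requires replaying the first k deltas.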

Page 20: Reverse Engineering automation

Graph

Page 21: Reverse Engineering automation

• Directed graph that shows control flow between blocks of instructions
• Each node represents a basic block
• Basic block: a piece of code that starts at a jump target and ends with a jump, with no jump or jump target inside the block
• Two special blocks: the entry block and the exit block

Control Flow Graph (CFG)
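The basic block definition above leads to the usual "leaders" construction, sketched here on a toy instruction list. The instruction encoding (address, kind, target) and the `build_cfg` function are assumptions made for illustration; the sketch expects instructions sorted by address and jump targets that point at listed instructions.

```python
def build_cfg(instructions):
    """Split instructions into basic blocks and connect them.

    instructions: list of (address, kind, target), where kind is
    'op', 'jmp' (unconditional), 'jcc' (conditional) or 'ret'.
    Returns (blocks, edges): blocks maps a leader address to the
    addresses in its block, edges is a set of (leader, leader) pairs.
    """
    addresses = [addr for addr, _, _ in instructions]
    leaders = {addresses[0]}
    for i, (_, kind, target) in enumerate(instructions):
        if kind in ("jmp", "jcc"):
            leaders.add(target)  # a jump target starts a block
        if kind in ("jmp", "jcc", "ret") and i + 1 < len(instructions):
            leaders.add(addresses[i + 1])  # so does the instruction after a jump

    blocks = {}
    current = None
    for addr in addresses:
        if addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append(addr)

    kind_at = {addr: (kind, target) for addr, kind, target in instructions}
    ordered = sorted(blocks)
    edges = set()
    for n, leader in enumerate(ordered):
        kind, target = kind_at[blocks[leader][-1]]
        if kind in ("jmp", "jcc"):
            edges.add((leader, target))  # taken branch
        if kind in ("op", "jcc") and n + 1 < len(ordered):
            edges.add((leader, ordered[n + 1]))  # fall-through
    return blocks, edges


# Hypothetical if/else diamond.
program = [
    (0, "op", None),
    (1, "jcc", 4),
    (2, "op", None),
    (3, "jmp", 5),
    (4, "op", None),
    (5, "ret", None),
]
blocks, edges = build_cfg(program)
print(blocks)
print(sorted(edges))
```

Indirect jumps and calls are left out on purpose; a real disassembler has to resolve or approximate them.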

Page 22: Reverse Engineering automation
Page 23: Reverse Engineering automation

• Directed graph that represents calling relationships between subroutines in a computer program
• Each node represents a procedure
• Each edge (a, b) indicates that procedure a calls procedure b
• A cycle in the graph indicates recursive procedure calls
• A static call graph represents every possible run of the program
• A dynamic call graph is a record of one execution of the program

Call Graph
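The statement that a cycle indicates recursion can be checked mechanically: a procedure is recursive, directly or mutually, if it can reach itself through at least one call edge. A minimal sketch, assuming the call graph is a dictionary from caller to list of callees; the procedure names are hypothetical.

```python
def reachable_from(call_graph, start):
    """Procedures reachable from start through one or more call edges."""
    seen = set()
    stack = list(call_graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(call_graph.get(node, ()))
    return seen


def recursive_procedures(call_graph):
    """Procedures that lie on a cycle of the call graph."""
    return {p for p in call_graph if p in reachable_from(call_graph, p)}


# Hypothetical call graph: parse/expr are mutually recursive, fact calls itself.
calls = {
    "main": ["parse", "fact"],
    "parse": ["expr"],
    "expr": ["parse"],
    "fact": ["fact"],
    "log": [],
}
print(sorted(recursive_procedures(calls)))
```

On a static call graph this reports possible recursion; on a dynamic call graph it reports recursion that actually happened in the recorded run.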

Page 24: Reverse Engineering automation

Call Graph example

Page 25: Reverse Engineering automation

Directed graph that represents data dependencies between a number of operations

• Each node represents an operation
• Each edge represents a variable

Data Flow Graph (DFG)

Page 26: Reverse Engineering automation

Data Flow Graph example

Page 27: Reverse Engineering automation

• Ottenstein & Ottenstein, PDG, 1984
• Strictly speaking a procedure dependence graph, because it was introduced for programs with a single procedure
• Each node represents a statement
• Two types of edges
◦ Control dependence: between a predicate and the statements it controls
◦ Data dependence: between statements modifying a variable and those that may reference it
• A special “Entry” node is connected to all nodes that are not control dependent on any predicate

Program Dependence Graph (PDG)

Page 28: Reverse Engineering automation

PDG example

Page 29: Reverse Engineering automation

• Horwitz, Reps & Binkley, SDG, 1990
• PDG extended to programs with multiple procedures
• New nodes: Call Site, Procedure Entry, Actual-in-argument, Actual-out-argument, Formal-in-parameter, Formal-out-parameter
• 3 new edge types
◦ Call edge: connects “call site” and “procedure entry”
◦ Parameter-in edge: connects “Actual-in” with “Formal-in”
◦ Parameter-out edge: connects “Actual-out” with “Formal-out”

System Dependence Graph (SDG)

Page 30: Reverse Engineering automation
Page 31: Reverse Engineering automation
Page 32: Reverse Engineering automation

Program Slicing

Page 33: Reverse Engineering automation

Large programs must be decomposed for understanding and manipulation.

However, the decomposition does not have to be into procedures and abstract data types.

Program Slicing is decomposition based on data flow and control flow analysis.

A study showed that experienced programmers mentally slice programs while debugging.

“The mental abstraction people make when they are debugging a program” [Weiser]

Program Slicing - Mark Weiser, 1979

Page 34: Reverse Engineering automation

A slice consists of all the statements of a program that may affect the values of some variables in a set V at some point of interest i.

A slicing criterion of a program P is a tuple (i, V), where i is a statement in P and V is a subset of the variables in P.

Slicing criterion: C = (i, V)

What is a Slice?

Page 35: Reverse Engineering automation

Example of Slices

Page 36: Reverse Engineering automation

• Direction of slicing
◦ Backward
◦ Forward
• Slicing techniques
◦ Static
◦ Dynamic
◦ Conditioned
• Levels of slices
◦ Intraprocedural slicing
◦ Interprocedural slicing

Slicing classifications

Page 37: Reverse Engineering automation

• The original slicing method
• A backward slice of a program with respect to a program point i and a set of program variables V consists of all statements and predicates in the program that may affect the value of variables in V at i
• Answers the question “what program components might affect a selected computation?”
• Preserves the meaning of the variable(s) in the slicing criterion for all possible inputs to the program

Backward slicing

Page 38: Reverse Engineering automation

Slice criterion <12, i>
 1  main( )
 2  {
 3    int i, sum;
 4    sum = 0;
 5    i = 1;
 6    while (i <= 10)
 7    {
 8      sum = sum + 1;
 9      ++i;
10    }
11    cout << sum;
12    cout << i;
13  }

Backward slicing example
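A backward slice can be computed as reachability over dependence edges. The sketch below hand-codes the data and control dependences of the executable statements in the 13-line example program (declarations and braces are left out); the `DEPENDS_ON` table and `backward_slice` function are illustrative assumptions, not the tooling used in the talk.

```python
# statement -> statements it depends on (data or control dependence)
DEPENDS_ON = {
    4: set(),        # sum = 0
    5: set(),        # i = 1
    6: {5, 9},       # while (i <= 10): reads i
    8: {4, 8, 6},    # sum = sum + 1: reads sum, controlled by the loop test
    9: {5, 9, 6},    # ++i: reads i, controlled by the loop test
    11: {4, 8},      # cout << sum
    12: {5, 9},      # cout << i
}


def backward_slice(depends_on, criterion):
    """Statements that may affect the criterion statement."""
    seen = {criterion}
    stack = [criterion]
    while stack:
        stmt = stack.pop()
        for dep in depends_on[stmt]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)


print(backward_slice(DEPENDS_ON, 12))  # [5, 6, 9, 12]
print(backward_slice(DEPENDS_ON, 11))  # [4, 5, 6, 8, 9, 11]
```

For the criterion <12, i> the statements that touch only sum (4, 8, 11) fall out of the slice, while slicing on line 11 keeps the statements on i because the loop test controls the update of sum.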

Page 39: Reverse Engineering automation

• A forward slice of a program with respect to a program point i and a set of program variables V consists of all statements and predicates in the program that may be affected by the value of variables in V at i

• Answers the question “what program components might be affected by a selected computation?”

• Can show the code affected by a modification to a single statement

Forward Slicing

Page 40: Reverse Engineering automation

Slice criterion <3, sum>
 1  main( )
 2  {
 3    int i, sum;
 4    sum = 0;
 5    i = 1;
 6    while (i <= 10)
 7    {
 8      sum = sum + 1;
 9      ++i;
10    }
11    cout << sum;
12    cout << i;
13  }

Forward Slicing example
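A forward slice follows the same dependence edges in the opposite direction. The sketch below reuses a hand-coded dependence table for the example program and, as a simplifying assumption, starts from the first definition of sum at line 4 rather than from the declaration at line 3, because the declaration carries no value. All names are hypothetical.

```python
# statement -> statements it depends on (data or control dependence)
DEPENDS_ON = {
    4: set(),        # sum = 0
    5: set(),        # i = 1
    6: {5, 9},       # while (i <= 10)
    8: {4, 8, 6},    # sum = sum + 1
    9: {5, 9, 6},    # ++i
    11: {4, 8},      # cout << sum
    12: {5, 9},      # cout << i
}


def forward_slice(depends_on, start):
    """Statements that may be affected by the start statement."""
    affects = {}
    for stmt, deps in depends_on.items():
        for dep in deps:
            affects.setdefault(dep, set()).add(stmt)  # reverse each edge

    seen = {start}
    stack = [start]
    while stack:
        stmt = stack.pop()
        for nxt in affects.get(stmt, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


print(forward_slice(DEPENDS_ON, 4))  # [4, 8, 11]
print(forward_slice(DEPENDS_ON, 5))  # [5, 6, 8, 9, 11, 12]
```

Changing the initial value of sum only reaches lines 8 and 11, whereas changing the initial value of i reaches every later statement through the loop test.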

Page 41: Reverse Engineering automation

• Static slicing does not make any assumptions regarding the input
• Slices are derived from the source code for all possible input values
• May lead to relatively big slices
• Contains all statements that may affect a variable for every possible execution
• Current static methods can only compute approximations

Static Slicing

Page 42: Reverse Engineering automation

Slice criterion (12, i)
 1  main( )
 2  {
 3    int i, sum;
 4    sum = 0;
 5    i = 1;
 6    while (i <= 10)
 7    {
 8      sum = sum + 1;
 9      ++i;
10    }
11    cout << sum;
12    cout << i;
13  }

Static Slicing example

Page 43: Reverse Engineering automation

• First introduced by Korel and Laski
• Dynamic slicing assumes a fixed input for a program
• Only the dependences that occur in a specific execution of the program are taken into account
• Computed on a given input
• A dynamic slicing criterion is a triple (input, occurrence of a statement, variable): it specifies the input and distinguishes between different occurrences of a statement in the execution history

Dynamic Slicing

Page 44: Reverse Engineering automation

 1. read(n)
 2. for I := 1 to n do
 3.     a := 2
 4.     if c1 == 1 then
 5.         if c2 == 1 then
 6.             a := 4
 7.         else
 8.             a := 6
 9.     z := a
10. write(z)

Dynamic Slicing example

• Assumptions
– Input n is 1
– c1, c2 both true
– Execution history is 1^1, 2^1, 3^1, 4^1, 5^1, 6^1, 9^1, 2^2, 10^1 (s^k is the k-th occurrence of statement s)
– Slice criterion <1, 10^1, z>
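The dynamic slice for this example can be computed by replaying the execution history and linking each event to the latest definition of every variable it reads and to the latest occurrence of the statement that controls it. The sketch below is one simple way to do this, under stated assumptions: the per-statement defs, uses and controlling statements are filled in by hand from the listing, statement 9 is taken to be inside the loop (as the history suggests), and the criterion event is assumed to read the variable of interest.

```python
def dynamic_slice(history, criterion_index):
    """Return statement numbers in the dynamic slice for one event.

    history: list of (stmt, defs, uses, controlling_stmt) tuples in
    execution order; controlling_stmt is None for top-level statements.
    """
    last_def = {}    # variable -> index of the latest event defining it
    last_exec = {}   # statement -> index of its latest occurrence
    deps = []        # deps[k] = event indices that event k depends on
    for k, (stmt, defs, uses, ctrl) in enumerate(history):
        d = {last_def[v] for v in uses if v in last_def}
        if ctrl is not None and ctrl in last_exec:
            d.add(last_exec[ctrl])
        deps.append(d)
        for v in defs:
            last_def[v] = k
        last_exec[stmt] = k

    seen = {criterion_index}
    stack = [criterion_index]
    while stack:
        for dep in deps[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted({history[k][0] for k in seen})


# Execution history 1^1, 2^1, 3^1, 4^1, 5^1, 6^1, 9^1, 2^2, 10^1 for n = 1.
HISTORY = [
    (1, {"n"}, set(), None),       # read(n)
    (2, {"I"}, {"n"}, None),       # for I := 1 to n (first test)
    (3, {"a"}, set(), 2),          # a := 2
    (4, set(), {"c1"}, 2),         # if c1 == 1
    (5, set(), {"c2"}, 4),         # if c2 == 1
    (6, {"a"}, set(), 5),          # a := 4
    (9, {"z"}, {"a"}, 2),          # z := a
    (2, {"I"}, {"n", "I"}, None),  # for ... (second test, loop exits)
    (10, set(), {"z"}, None),      # write(z)
]

print(dynamic_slice(HISTORY, 8))  # [1, 2, 4, 5, 6, 9, 10]
```

Statement 3 is executed but drops out because its value of a is overwritten at 6^1 before z := a reads it, and statement 8 never runs; a static slice on z at line 10 would have to keep both.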

Page 45: Reverse Engineering automation

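Assumption: input ‘a’ is a positive number

1. read(a)
2. if (a < 0)
3.     a = -a
4. x = 1/a

Under this assumption the predicate at line 2 is always false, so lines 2 and 3 drop out of the conditioned slice for x at line 4.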

Conditioned slice example

Page 46: Reverse Engineering automation

• Computes a slice within one procedure
• Consists basically of two steps:
◦ A single slice of the procedure containing the slicing criterion is made
◦ Procedure calls from within this procedure are sliced using new criteria

Intraprocedural slicing

Page 47: Reverse Engineering automation

• Computes a slice over an entire program
• Two ways of crossing a procedure boundary
◦ Up: going from the sliced procedure into the calling procedure
◦ Down: going from the sliced procedure into the called procedure
• Must be context sensitive

Interprocedural Slicing

Page 48: Reverse Engineering automation

• Chopping
• Value Set Analysis

Also

Page 49: Reverse Engineering automation

• CodeSurfer
◦ Commercial product by GrammaTech Inc.
◦ GUI based
◦ Scripting language: Tk
• Unravel
◦ Static program slicer developed at NIST
◦ Slices ANSI C programs
◦ Limitations are in the treatment of unions, forks and pointers to functions

Program slicing tools

Page 50: Reverse Engineering automation

All together

Page 51: Reverse Engineering automation

• Slicing of a register on code coverage
• Graph-based view of file reading and moves between memory blocks

Some Results

Page 52: Reverse Engineering automation

[email protected]

Thank you!