A Model for Self-Modifying Code
Bertrand Anckaert, Matias Madou and Koen De Bosschere
8th Information Hiding Conference, July 11th 2006

Self-Modifying Code
o Problem for Reverse-Engineering
o Used for Hiding Program Internals
  • Software Protection
    o Copyright Protection Mechanisms
    o Secret Algorithms
    o …
  • Malicious intent of viruses
o Program Optimization

Scope
[Figure: two bit strings representing a program binary; label: known]
Focus: malicious host paradigm
Not: malicious code paradigm
Goal
o Internal Representation
o Construction and Deconstruction
o Accurate and Conservative
o Analyses and Transformations

Overview
o Introduction
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
o Applications
Accurate and Conservative

Example: ISA
Assembly        Binary          Semantics
movb value to   0xc6 value to   set byte at address 'to' to value 'value'
inc reg         0x40 reg        increment register 'reg'
dec reg         0x48 reg        decrement register 'reg'
push reg        0xff reg        push register 'reg'
jmp to          0x0c to         jump to address 'to' (absolute)
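
The encoding table above can be read as a small decoder. The following is a minimal sketch, not the authors' tool: the names decode, REGS and REG_OPS are hypothetical, and the register numbers are inferred from the running example rather than stated on the slide.

```python
# Register numbers are an assumption inferred from the running example
# (40 01 = inc %ebx, ff 02 = push %ecx, 40 03 = inc %edx).
REGS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}
REG_OPS = {0x40: "inc", 0x48: "dec", 0xFF: "push"}


def decode(mem, addr):
    """Return (assembly text, length in bytes) for the instruction at addr."""
    op = mem[addr]
    if op == 0xC6:  # movb value to
        return f"movb {mem[addr + 1]:#x} {mem[addr + 2]:#x}", 3
    if op == 0x0C:  # jmp to (absolute)
        return f"jmp {mem[addr + 1]:#x}", 2
    if op in REG_OPS:  # inc / dec / push reg
        return f"{REG_OPS[op]} {REGS[mem[addr + 1]]}", 2
    raise ValueError(f"unknown opcode {op:#x} at {addr:#x}")


print(decode(bytes.fromhex("c60c08"), 0))
print(decode(bytes.fromhex("0c03"), 0))
```

The same byte position decodes differently depending on its first byte, which is what the running example exploits.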

Example: Introduction
Address  Binary    Assembly
0x0      c6 0c 08  movb 0xc 0x8
0x3      40 01     inc %ebx
0x5      c6 0c 05  movb 0xc 0x5
0x8      40 03     inc %edx
0xa      ff 02     push %ecx
0xc      48 01     dec %ebx

Example: Trace
Initial code    After step 1    After step 3
movb 0xc 0x8    movb 0xc 0x8    movb 0xc 0x8
inc %ebx        inc %ebx        inc %ebx
movb 0xc 0x5    movb 0xc 0x5    jmp 0xc
inc %edx        jmp 0x3         jmp 0x3
push %ecx       push %ecx       push %ecx
dec %ebx        dec %ebx        dec %ebx
Trace:
1) movb 0xc 0x8
2) inc %ebx
3) movb 0xc 0x5
4) jmp 0x3
5) inc %ebx
6) jmp 0xc
7) dec %ebx
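
The trace can be reproduced by interpreting the 14 bytes of the running example under the ISA of the earlier slide. This is a minimal sketch under assumptions: the function run and the register numbering are hypothetical, registers and the stack are not modeled, and execution is assumed to stop when the program counter leaves the code.

```python
REGS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}  # inferred from the example
REG_OPS = {0x40: "inc", 0x48: "dec", 0xFF: "push"}
PROGRAM = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")


def run(code, max_steps=20):
    """Execute the toy ISA on a writable copy of code; return the trace."""
    mem = bytearray(code)
    pc, trace = 0, []
    while pc < len(mem) and len(trace) < max_steps:
        op, arg = mem[pc], mem[pc + 1]
        if op == 0xC6:  # movb value to: overwrite one byte of memory
            to = mem[pc + 2]
            trace.append(f"movb {arg:#x} {to:#x}")
            mem[to] = arg
            pc += 3
        elif op == 0x0C:  # jmp to (absolute)
            trace.append(f"jmp {arg:#x}")
            pc = arg
        elif op in REG_OPS:  # register effects are not modeled here
            trace.append(f"{REG_OPS[op]} {REGS[arg]}")
            pc += 2
        else:
            raise ValueError(f"unknown opcode {op:#x} at {pc:#x}")
    return trace


for step, insn in enumerate(run(PROGRAM), start=1):
    print(f"{step}) {insn}")
```

Steps 1 and 3 are the two writes that turn inc %edx into jmp 0x3 and movb 0xc 0x5 into jmp 0xc.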

Overview
o Scope
o Running Example
o Internal Representation
  • Superposition of CFGs
  • Codebytes
  • Codebyte Conditional Edges
  • Consumption of Codebyte Values
o Construction and Deconstruction
o Analyses and Transformations
o Applications

CFG for Traditional Code
o One of the most important internal representations for traditional code
  • It is well understood how to:
    o construct and deconstruct it
    o keep it accurate and conservative
    o run analyses and transformations on it
  • It represents a superset of all possible executions

Traditional CFG Construction for SMC
[Figure: a traditional CFG built for each of the three code states, with blocks annotated by the trace steps 1) to 7) that execute them. Callouts: not conservative; not a superset; not accurate; Unreachable Code Elimination]

Example: Superposition of CFGs
Instruction     Trace steps
movb 0xc 0x8    1
inc %ebx        2, 5
movb 0xc 0x5    3
jmp 0x3         4
jmp 0xc         6
dec %ebx        7
inc %edx        (not in the trace)
push %ecx       (not in the trace)
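
One simple way to enumerate the nodes of such a superposition is to decode every known code state from the entry point and take the union of the instructions found. The sketch below assumes the three states are already known; the helpers decode, reachable and SNAPSHOTS are hypothetical and this is not presented as the authors' construction algorithm.

```python
REGS = {0x01: "%ebx", 0x02: "%ecx", 0x03: "%edx"}  # inferred from the example
REG_OPS = {0x40: "inc", 0x48: "dec", 0xFF: "push"}

# The three code states of the running example (initial, after step 1, after step 3).
SNAPSHOTS = [
    bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801"),
    bytes.fromhex("c60c08 4001 c60c05 0c03 ff02 4801"),
    bytes.fromhex("c60c08 4001 0c0c05 0c03 ff02 4801"),
]


def decode(mem, addr):
    """Return (assembly text, list of successor addresses)."""
    op = mem[addr]
    if op == 0xC6:
        return f"movb {mem[addr + 1]:#x} {mem[addr + 2]:#x}", [addr + 3]
    if op == 0x0C:
        return f"jmp {mem[addr + 1]:#x}", [mem[addr + 1]]
    if op in REG_OPS:
        return f"{REG_OPS[op]} {REGS[mem[addr + 1]]}", [addr + 2]
    raise ValueError(f"unknown opcode {op:#x} at {addr:#x}")


def reachable(mem, entry=0):
    """Instructions reachable from entry in one fixed code state."""
    seen, work = {}, [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr >= len(mem):
            continue
        seen[addr], successors = decode(mem, addr)
        work.extend(successors)
    return seen


# Superposition: union of (address, instruction) pairs over all states.
nodes = set()
for snapshot in SNAPSHOTS:
    nodes |= set(reachable(snapshot).items())

for addr, text in sorted(nodes):
    print(f"{addr:#x}: {text}")
```

Addresses 0x5 and 0x8 each appear twice in the result, once per instruction that can start there.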

Contains CFG 1
[Figure: the superposition with the CFG of the initial code highlighted: movb 0xc 0x8; inc %ebx; movb 0xc 0x5; inc %edx; push %ecx; dec %ebx]

Contains CFG 2
[Figure: the superposition with the CFG of the second code state highlighted: movb 0xc 0x8; inc %ebx; movb 0xc 0x5; jmp 0x3; push %ecx; dec %ebx]

Contains CFG 3
[Figure: the superposition with the CFG of the third code state highlighted: movb 0xc 0x8; inc %ebx; jmp 0xc; jmp 0x3; push %ecx; dec %ebx]

Superposition of CFGs
o Represents a superset of all possible executions
o But:
  • how do we linearize a graph with multiple outgoing/incoming fall-through paths?
  • how do we analyze what states the program can be in at a given program point?
  • …
Extensions

Extension 1: CodeBytes
[Figure: the superposed CFG, with each instruction linked to the codebytes it is assembled from]
Address  Possible values
0x0      c6
0x1      0c
0x2      08
0x3      40
0x4      01
0x5      c6, 0c
0x6      0c
0x7      05
0x8      40, 0c
0x9      03
0xa      ff
0xb      02
0xc      48
0xd      01
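
A codebyte can be pictured as an address together with the set of values it may hold. The sketch below derives those sets from the three code states of the running example; the name codebytes and the dict-of-sets layout are assumptions for illustration, not the data structure used in the authors' tool.

```python
SNAPSHOTS = [
    bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801"),  # initial code
    bytes.fromhex("c60c08 4001 c60c05 0c03 ff02 4801"),  # after step 1
    bytes.fromhex("c60c08 4001 0c0c05 0c03 ff02 4801"),  # after step 3
]


def codebytes(snapshots):
    """Map each address to the set of byte values observed at it."""
    values = {}
    for mem in snapshots:
        for addr, byte in enumerate(mem):
            values.setdefault(addr, set()).add(byte)
    return values


table = codebytes(SNAPSHOTS)
# Codebytes with more than one possible value are the self-modified ones.
modified = {
    hex(addr): sorted(f"{v:02x}" for v in vals)
    for addr, vals in table.items()
    if len(vals) > 1
}
print(modified)
```

Only the codebytes at 0x5 and 0x8 carry two values; all others are constant, as in traditional code.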

Extension 2: CodeByte Conditional Edges
[Figure: the codebyte graph of the running example, with control-flow edges guarded by the value of the first codebyte of the target instruction]
Condition     Target instruction
*(0x5)==c6    movb 0xc 0x5
*(0x5)==0c    jmp 0xc
*(0x8)==40    inc %edx
*(0x8)==0c    jmp 0x3
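
A conditional edge can be modeled as an ordinary edge plus a guard on one codebyte. In this sketch the source instruction of each edge is an assumption read off the fall-through order of the running example, and EDGES and enabled are hypothetical names.

```python
# (source, target, (address, required value)); the sources are an assumption
# based on which instruction falls through to address 0x5 or 0x8.
EDGES = [
    ("inc %ebx", "movb 0xc 0x5", (0x5, 0xC6)),
    ("inc %ebx", "jmp 0xc", (0x5, 0x0C)),
    ("movb 0xc 0x5", "inc %edx", (0x8, 0x40)),
    ("movb 0xc 0x5", "jmp 0x3", (0x8, 0x0C)),
]


def enabled(edges, mem):
    """Edges whose codebyte condition holds in the given code state."""
    return [(src, dst) for src, dst, (addr, value) in edges if mem[addr] == value]


initial = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")
final = bytearray(initial)
final[0x8] = 0x0C  # effect of movb 0xc 0x8
final[0x5] = 0x0C  # effect of movb 0xc 0x5

print(enabled(EDGES, initial))
print(enabled(EDGES, final))
```

In any single code state exactly one edge per pair is enabled, so each state selects one of the superposed CFGs.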
Extension 3: Consumption of CodeBytes
o A codebyte is read when it is interpreted as (part of) an instruction by the CPU
o Important for data analyses, such as liveness analysis
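
To illustrate why consumption matters, the sketch below records, for each executed instruction of the running example, which codebytes are fetched and which are written, and then reports the writes whose value is later fetched. It is a dynamic approximation on one trace, not the static liveness analysis of the talk; all function names are hypothetical.

```python
LENGTH = {0xC6: 3, 0x0C: 2, 0x40: 2, 0x48: 2, 0xFF: 2}
PROGRAM = bytes.fromhex("c60c08 4001 c60c05 4003 ff02 4801")


def events(code, max_steps=20):
    """Per executed instruction: (addresses fetched, addresses written)."""
    mem, pc, log = bytearray(code), 0, []
    while pc < len(mem) and len(log) < max_steps:
        op = mem[pc]
        fetched = list(range(pc, pc + LENGTH[op]))
        written = []
        if op == 0xC6:  # movb value to
            value, to = mem[pc + 1], mem[pc + 2]
            mem[to] = value
            written.append(to)
        pc = mem[pc + 1] if op == 0x0C else pc + LENGTH[op]
        log.append((fetched, written))
    return log


def consumed_writes(log):
    """(write step, address, consuming step) for writes that are later fetched."""
    result = []
    for i, (_, written) in enumerate(log):
        for addr in written:
            for j in range(i + 1, len(log)):
                fetched, overwritten = log[j]
                if addr in fetched:
                    result.append((i + 1, addr, j + 1))
                    break
                if addr in overwritten:
                    break  # value replaced before it was ever consumed
    return result


print(consumed_writes(events(PROGRAM)))
```

Both writes of the example are consumed, so neither movb could be dropped on the evidence of this trace alone.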

Traditional Code vs. Self-Modifying Code
o Traditional Code
  • No Overlap
  • Not Self-Inspecting
  • Not Self-Modifying
o Special case of self-modifying code. Extensions can be omitted because:
  • Can be easily linearized as instructions do not overlap
  • Target locations of control transfers can be in only one state
  • Result of data analyses on code is trivial as the code is constant

Construction
o Requires that we know:
  • Targets of control flow
  • Which instructions write what where
o Not a problem in the malicious host paradigm
o In the malicious code paradigm (Future Work):
  • Observing dynamic execution
  • Static extension

Linearization
[Figure: the codebyte graph of the running example, laid out in address order]
Resulting binary: c6 0c 08 40 01 c6 0c 05 40 03 ff 02 48 01

Overview
o Scope
o Running Example
o Internal Representation
o Construction and Deconstruction
o Analyses and Transformations
  • Constant Propagation
  • Unreachable Code(Byte) Elimination
  • Liveness Analysis
  • Loop Unrolling
o Applications

Constant Propagation
[Figure: the codebyte graph of the running example with its four conditional edges, *(0x5)==c6, *(0x5)==0c, *(0x8)==0c and *(0x8)==40; the condition *(0x8)==40 is set apart from the others]

Unreachable Code(Byte) Elimination
[Figure: the codebyte graph with the conditional edges *(0x5)==c6, *(0x5)==0c and *(0x8)==0c; all eight instructions and all codebytes are still shown]

Liveness Analysis
[Figure: the codebyte graph without inc %edx, push %ecx and the codebytes at 0xa and 0xb; conditional edges *(0x5)==c6, *(0x5)==0c and *(0x8)==0c; the codebyte at 0x8, with values 40 and 0c, is highlighted]

Idempotent Instruction Removal
[Figure: the same graph as on the Liveness Analysis slide, with instructions movb 0xc 0x8, inc %ebx, jmp 0x3, movb 0xc 0x5, dec %ebx and jmp 0xc; the codebyte at 0x8, with values 40 and 0c, is again highlighted]

Loop Unrolling and …
[Figure: the graph after unrolling the loop of the trace 1) to 7). The copies of inc %ebx, movb 0xc 0x5 and jmp 0xc use codebytes with symbolic addresses _a to _k; the copied write becomes movb 0xc _c, and the conditional edges on 0x5 are mirrored for the copy: *(_c)==0c, *(_c)==c6, *(0x5)==0c, *(0x5)==c6]
Applications
o Outlining of almost identical code snippets through one-bit modifiers
o Overlapping similar functions through diff scripts
o Significant slowdown (factor 1.15 up to 3)

Almost Identical Code Snippets
Address  Snippet 1              Snippet 2
0x0      68  push 0xa804245c    8b  mov 4(%esp),%ebx
0x1      5c                     5c
0x2      24                     24
0x3      04                     04
0x4      a8                     a8  test 0x5b,%al
0x5      5b  pop %ebx           5b
0x6      c3  ret                c3  ret
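
Merged Code Snippets
Address  Bytes      Snippet 1          Snippet 2
0x0      68 or 8b   push 0xa804245c    mov 4(%esp),%ebx
0x1      5c
0x2      24
0x3      04
0x4      a8                            test 0x5b,%al
0x5      5b         pop %ebx
0x6      c3         ret                ret
Modifier for snippet 1: movb 0x68 0x0, then jmp 0x0
Modifier for snippet 2: movb 0x8b 0x0, then jmp 0x0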
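
The one-byte difference between the two snippets, and the modifiers that select one of them, can be computed directly from the byte sequences on the slide. This is a minimal sketch; the function names diff and modifiers are hypothetical, and the modifiers are printed in the movb notation of the running example.

```python
SNIPPET_1 = bytes.fromhex("68 5c 24 04 a8 5b c3")  # push 0xa804245c; pop %ebx; ret
SNIPPET_2 = bytes.fromhex("8b 5c 24 04 a8 5b c3")  # mov 4(%esp),%ebx; test 0x5b,%al; ret


def diff(a, b):
    """Positions where two equally long byte sequences differ."""
    return [(addr, x, y) for addr, (x, y) in enumerate(zip(a, b)) if x != y]


def modifiers(a, b):
    """movb instructions that turn the merged code into a or into b."""
    to_a = [f"movb {x:#x} {addr:#x}" for addr, x, _ in diff(a, b)]
    to_b = [f"movb {y:#x} {addr:#x}" for addr, _, y in diff(a, b)]
    return to_a, to_b


print(diff(SNIPPET_1, SNIPPET_2))
print(modifiers(SNIPPET_1, SNIPPET_2))
```

A single differing byte means a single movb per call site is enough to choose which of the two snippets the shared bytes represent.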

Conclusion
o Superposition of different CFGs
o Three extensions
  • CodeByte data structure
  • CodeByte conditional edges
  • Consumption of CodeBytes
Internal Representation Allows for:
  • Construction (limited) and Deconstruction
  • Conservative and Accurate
  • Analyses and Transformations (iterative)
Questions?
Presentation: http://www.elis.ugent.be/~banckaer
Tool: http://www.elis.ugent.be/diablo

Linearization
o Chains of instructions become chains of codebytes
o Codebytes c and d must be concatenated if:
  • c and d are successive codebytes in an instruction
  • c is the last codebyte of instruction I and d is the first codebyte of instruction J, and I and J are successive instructions in a basic block
  • c is the last codebyte of basic block A and d is the first codebyte of basic block B, and A and B are connected by a fall-through path
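
Once the three rules have produced a list of "d must directly follow c" pairs, chains can be formed by following those links from every codebyte that has no required predecessor. The sketch below assumes each codebyte has at most one required successor and predecessor and that the links contain no cycle; the function chains and the symbolic codebyte names are hypothetical.

```python
def chains(codebytes, must_follow):
    """Group codebytes into chains from (c, d) pairs meaning d directly follows c.

    Assumes at most one required successor and predecessor per codebyte
    and no cycles among the pairs.
    """
    successor = dict(must_follow)
    has_predecessor = set(successor.values())
    result = []
    for cb in codebytes:
        if cb in has_predecessor:
            continue  # not the head of a chain
        chain = [cb]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        result.append(chain)
    return result


# Symbolic codebytes: a-b-c form one instruction sequence, d-e another.
print(chains(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]))
```

Each resulting chain can be placed anywhere in the binary, as long as its codebytes stay adjacent and in order.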