Obfuscation and Tamperproofing
DESCRIPTION
Obfuscation and Tamperproofing. Clark Thomborson, 19 March 2010. What secrets are in software? Source code, algorithms: competitors might provide similar functionality at less R&D cost. Constants: pirates might exploit knowledge of a decryption key or other compact secret.
TRANSCRIPT
1
Obfuscation and Tamperproofing
Clark Thomborson
19 March 2010
2
What Secrets are in Software?
• Source Code, Algorithms: competitors might provide similar functionality at less R&D cost.
• Constants: pirates might exploit knowledge of a decryption key or other compact secret.
• Internal function points: pirates might tamper with critical code, e.g. if ( not licensed ) exit( ).
• External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.
3
Security Boundary for Obfuscated Code
[Diagram: source code P, containing the algorithm, function points, secret keys, and a secret interface → Compiler → Executable X → Obfuscator → Obfuscated code O(X), which runs on the CPU behind a GUI]
• Obfuscated code O(X) has the same behaviour as X.
• O(X) is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …
4
Security Boundary for Encrypted Code
[Diagram: source code P, containing the algorithm, function points, secret keys, and a secret interface → Compiler → Executable X → Encrypter → Encrypted code E(X) → Decrypter → Decrypted X → CPU, behind a GUI]
• Encryption requires a black-box CPU.
• Note: I/O must be limited. No debuggers allowed!
5
Design Issues for Encrypted Code
• Key distribution. Tradeoff: security for expense & functionality.
• Branches into an undecrypted block will stall the CPU until the target is decrypted.
• This runtime penalty is proportional to block size.
• Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff.
• The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks.
• A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
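The branch-stall tradeoff above can be sketched as a toy demand-decrypter. Everything here is illustrative only: the XOR “cipher”, the hash-based keystream, and the stall counter are hypothetical stand-ins, not the scheme discussed in the slides.

```python
import hashlib

def keystream(key: bytes, block_index: int, length: int) -> bytes:
    # Toy keystream (NOT a real cipher): a hash of the key and block index.
    return hashlib.sha256(key + block_index.to_bytes(4, "big")).digest()[:length]

def decrypt_block(ciphertext: bytes, key: bytes, block_index: int) -> bytes:
    # XOR stream cipher: the same function also encrypts.
    return bytes(c ^ k for c, k in
                 zip(ciphertext, keystream(key, block_index, len(ciphertext))))

class DemandDecrypter:
    """Decrypt code blocks only when control flow reaches them; cache the
    plaintext. Each cache miss models a CPU stall whose cost grows with
    block size -- the runtime penalty described on the slide."""
    def __init__(self, encrypted_blocks, key):
        self.blocks = encrypted_blocks
        self.key = key
        self.cache = {}        # block_index -> decrypted plaintext
        self.stalls = 0        # branches that had to wait for decryption

    def fetch(self, block_index):
        if block_index not in self.cache:
            self.stalls += 1   # branch into an undecrypted block: CPU stalls
            self.cache[block_index] = decrypt_block(
                self.blocks[block_index], self.key, block_index)
        return self.cache[block_index]
```

Making blocks larger reduces the number of stalls but, as the slide notes, a larger plaintext cache must then fit inside the black box, which is the third tradeoff.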
6
Debugging Encrypted Code
[Diagram: as on slide 4 — source code P → Compiler → Executable X → Encrypter → E(X) → Decrypter → Decrypted X → CPU, behind a GUI]
• Usually, a secret interface is an Easter Egg: easy to find if you know where to look! A confidentiality risk.
• Mitigation: special hardware required to access the secret interface.
7
Tampering Attack on Encrypted Code
[Diagram: as on slide 4, but the attacker substitutes tampered code E’(X) for E(X)]
• Random x86 code is likely to crash or loop (Barrantes, 2003).
• Mitigation: cryptographically signed code. The system should test the signature on an executable before running it.
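The signed-code mitigation can be sketched as follows. This is a minimal illustration using a symmetric HMAC tag; a deployed system would normally use public-key signatures so that the verifier holds no signing secret. The key and function names are hypothetical.

```python
import hmac
import hashlib

SIGNING_KEY = b"hypothetical-shared-key"  # real systems would use public-key signatures

def sign(executable: bytes) -> bytes:
    """Producer side: compute an authentication tag over the executable."""
    return hmac.new(SIGNING_KEY, executable, hashlib.sha256).digest()

def run_if_authentic(executable: bytes, signature: bytes) -> bool:
    """Consumer side: test the signature before running anything.
    A tampered E'(X) fails the check and is refused."""
    if not hmac.compare_digest(sign(executable), signature):
        return False   # refuse to execute
    # ...hand the verified executable to the decrypter/CPU here...
    return True
```

Note that the check must happen inside the black box, before decryption and execution; otherwise the attacker simply patches the check out.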
8
Intrusion Attack on Encrypted Code
[Diagram: as on slide 4; the attacker injects code into E(X) through the GUI]
• The attacker might find a way to inject code through the GUI.
• Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …
9
Tampering Attack on Obfuscated Code
[Diagram: as on slide 3, but the attacker substitutes tampered code O’(X) for O(X)]
• Mitigation 1: O(X) might check its own signature. Note: O’(X) might not include this check!
• Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.
10
Typical Obfuscation Techniques
• Lexical obfuscations:
– Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.)
• Data obfuscations:
– Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats;
– Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa.
• Control obfuscations:
– Inlining and outlining, to obscure procedural abstractions;
– Opaque predicates, to obscure control flow;
– (Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
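The data-obfuscation bullet above (“encoding several booleans in one int”) can be made concrete with a small sketch. The function names are illustrative, not from the slides:

```python
def pack_flags(flags):
    """Encode several booleans in one int: flag i becomes bit i of the word.
    A reader of the obfuscated program sees only an opaque integer."""
    word = 0
    for i, f in enumerate(flags):
        if f:
            word |= 1 << i
    return word

def unpack_flag(word, i):
    """Recover boolean i from the packed word."""
    return bool((word >> i) & 1)
```

A real obfuscator would go further, e.g. permuting bit positions or mixing in a random mask per program copy, so that the encoding differs between builds.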
11
Obfuscated Interpretation
• Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode.
• Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible).
• Diversified software must be custom-translated for each FSM.
– This implies that the software producer must know the serial number of its customer’s FSM.
– We cannot allow the attacker to learn this information.
– This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function.
• Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.
[Diagram: Fetch Unit → Decode Unit → FSM Unit → Execute Unit, with a Start/Stop control on the FSM Unit]
12
Obfuscated 2-op Ass’y Code
Cleartext:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
        add p, x
        sub x, 1
        goto Loop
Obfuscated text:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
        sub p, x
        add x, 1
        add p, 0    ← “dummy instruction” to force FSM transition
        goto Loop
[Diagram: FSM translator (in CPU pipeline) — two states with transitions labelled add/sub, sub/add, add/add, sub/sub; the starting state is marked]
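The add/sub example above can be simulated with a tiny two-state Mealy FSM. The slide does not fully specify the transition function, so the rule used here (an `add` opcode toggles the state, `sub` leaves it, other opcodes pass through) is an assumption chosen purely to reproduce the slide's obfuscated listing; the mini-interpreter `run` is likewise hypothetical:

```python
SWAP = {"add": "sub", "sub": "add"}

def translate(program, state="swap"):
    """Two-state FSM in the fetch pipeline: in the 'swap' state, add and sub
    exchange meanings; in the 'plain' state they mean what they say.
    Assumed rule: an 'add' opcode toggles the state, 'sub' leaves it."""
    out = []
    for op, *args in program:
        if op in SWAP:
            real = SWAP[op] if state == "swap" else op
            out.append((real, *args))
            if op == "add":
                state = "plain" if state == "swap" else "swap"
        else:
            out.append((op, *args))   # branches and exits pass through unchanged
    return out

def run(program, n):
    """Mini-interpreter for the slide's 2-op assembly; returns final p."""
    env = {"x": n, "p": 1}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "exit_if_zero" and env[args[0]] == 0:
            break
        if op == "goto":
            pc = args[0]
            continue
        if op == "add":
            env[args[0]] += env[args[1]] if isinstance(args[1], str) else args[1]
        elif op == "sub":
            env[args[0]] -= env[args[1]] if isinstance(args[1], str) else args[1]
        pc += 1
    return env["p"]

# The obfuscated loop body from the slide:
obfuscated = [
    ("exit_if_zero", "x"),
    ("sub", "p", "x"),    # FSM translates this back to: add p, x
    ("add", "x", 1),      # FSM translates this back to: sub x, 1
    ("add", "p", 0),      # dummy: forces the FSM back to its starting state
    ("goto", 0),
]
```

After translation the dummy becomes `add p, 0`, which is harmless, and the FSM is back in its starting state when the loop repeats; executing the obfuscated text without the FSM would instead increment x forever.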
13
Obfuscated Java Bytecode
   1  iconst_0
   2  istore_2
   3  iload_1
   4  istore_1
   5  if_icmpne Label3
   6  Label1:
   7  irem
   8  iload_2
   9  iload_1
  10  iload_1
  11  Label4:
  12  goto Label2
  13  iadd
  14  istore_2
  15  bipush 1
  16  bipush 1
  17  iload_1
  18  pop
  19  Label2:
  20  iinc 1 1
  21  bipush 1
  22  goto Label4
  23  Label3:
  24  iconst_1
  25  iload_0
  26  if_icmple Label1
  27  iadd
  28  ireturn
• The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}
• Could you de-obfuscate this?
• Could you develop a “class attack”? Note: each CPU has a different FSM.
14
Security Analysis
• Tampering: an attacker should not be able to modify the obfuscated code.
– Level 1 Attack: an attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “if (licensed) goto L” into “goto L”.
– Level 2 Attack: an attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.
15
Prohibited Actions (cont.)
• Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code.
– Level 3 Attack: an attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program.
• Automated De-obfuscation: “class attack”.
– Level 4 Attack: an attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.
16
3-D Threat Model
A. An adversary might have relevant knowledge & tools;
B. An adversary might have relevant powers of observation;
C. An adversary might have relevant control powers (e.g. causing the CPU to fetch and execute arbitrary codes).
Goal of security analysis: which adversarial powers enable a level-k attack?
17
A. Knowledge and Tools
• Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’.
• Level A1: adversary attended this seminar.
• Level A2: adversary knows how to use a debugger with a breakpoint facility.
• Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and adversary can do elementary statistical computations on these traces.
• Level A4: adversary has an implementation of every FSM F_k(x) and obfuscator F_k^-1(x), and an efficient way to derive the obfuscation key k from X’.
Our framework seems secure against level-A1 adversaries. Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to Level A3 and then Level A4 (which enables a level-4 “class attack”).
18
B. Observations
• Level-B0 observation: run X’ on a computer, observe output.
• Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour.
• Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use level-A2 tool.)
• Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’.
• Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’.
We estimate that O(n^2) level-B2 observations are enough to promote a level-A2 adversary to level A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values.)
Level B4 requires great cryptographic skill or level-C2 control.
19
C. Control Steps
• Level-C0 control: keyboard and mouse inputs for a program run.
• Level-C1 control: adversary makes arbitrary changes to the executable P’, then runs the resulting P’’.
• Level-C2 control: adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use level-A2 tool: debugger.)
• Level-C3 control: adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth.
• Level-C4 control: adversary can inject arbitrary inputs into software implementations of FSM F(x) and obfuscator F^-1(x) for all x.
Level-C2 adversaries will eventually reach Levels C3 and then C4.
20
Summary and Discussion
• New framework for obfuscated interpretation:
– Faster and cheaper than encryption schemes.
– Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary).
– We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also an expert in compiler technology (= a level-4 adversary).
21
Future Work
• Prototype implementation for Java bytecode.
• Dummy insertions need not occur immediately before branches.
– When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state.
– This is the usual process of code optimization, plus randomization and a side-constraint.
• Operand obfuscation!!
– Operand values leak information about opcodes.
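The operand leak mentioned above can be illustrated with a toy classifier (an assumption for illustration, not a technique from the slides): even if a branch opcode is disguised as arithmetic, its operand is still a code address, which typically falls inside the code segment and near the current pc, unlike most data operands.

```python
def looks_like_branch(pc, operand, code_size, window=64):
    """Hypothetical heuristic: branch operands are code addresses, so they
    fall inside the code segment and usually near the branching instruction;
    arithmetic operands are data values and usually do not."""
    return 0 <= operand < code_size and abs(operand - pc) < window
```

This is exactly why operand obfuscation is needed: an adversary scanning a trace with such a heuristic can locate the disguised branches, and from there the dummy sequences that precede them.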