Obfuscation and Tamperproofing
DESCRIPTION
Obfuscation and Tamperproofing. Clark Thomborson, 19 March 2010. What secrets are in software? Source code, algorithms: competitors might provide similar functionality at less R&D cost. Constants: pirates might exploit knowledge of a decryption key or other compact secret.
TRANSCRIPT
1
Obfuscation and Tamperproofing
Clark Thomborson
19 March 2010
2
What Secrets are in Software?
• Source Code, Algorithms: competitors might provide similar functionality at less R&D cost.
• Constants: pirates might exploit knowledge of a decryption key or other compact secret.
• Internal function points: pirates might tamper with critical code, e.g. if ( not licensed ) exit( ).
• External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.
3
Security Boundary for Obfuscated Code
[Diagram: source code P, containing the algorithm, function points, secret keys, and a secret interface → Compiler → Executable X → Obfuscator → Obfuscated code O(X), which runs on the CPU behind a GUI]
• Obfuscated code O(X) has the same behaviour as X.
• O(X) is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …
4
Security Boundary for Encrypted Code
[Diagram: source code P, containing the algorithm, function points, secret keys, and a secret interface → Compiler → Executable X → Encrypter → Encrypted code E(X) → Decrypter → Decrypted X → CPU, behind a GUI]
• Encryption requires a black-box CPU.
• Note: I/O must be limited. No debuggers allowed!
5
Design Issues for Encrypted Code
• Key distribution. Tradeoff: security for expense & functionality.
• Branches into an undecrypted block will stall the CPU until the target is decrypted.
• This runtime penalty is proportional to block size.
• Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff.
• The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks.
• A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
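The branch-stall tradeoff above can be sketched as a toy demand-decrypter. Everything here is illustrative only: the XOR “cipher”, the hash-based keystream, and the stall counter are hypothetical stand-ins, not the scheme discussed in the slides.

```python
import hashlib

def keystream(key: bytes, block_index: int, length: int) -> bytes:
    # Toy keystream (NOT a real cipher): a hash of the key and block index.
    return hashlib.sha256(key + block_index.to_bytes(4, "big")).digest()[:length]

def decrypt_block(ciphertext: bytes, key: bytes, block_index: int) -> bytes:
    # XOR stream cipher: the same function also encrypts.
    return bytes(c ^ k for c, k in
                 zip(ciphertext, keystream(key, block_index, len(ciphertext))))

class DemandDecrypter:
    """Decrypt code blocks only when control flow reaches them; cache the
    plaintext. Each cache miss models a CPU stall whose cost grows with
    block size -- the runtime penalty described on the slide."""
    def __init__(self, encrypted_blocks, key):
        self.blocks = encrypted_blocks
        self.key = key
        self.cache = {}        # block_index -> decrypted plaintext
        self.stalls = 0        # branches that had to wait for decryption

    def fetch(self, block_index):
        if block_index not in self.cache:
            self.stalls += 1   # branch into an undecrypted block: CPU stalls
            self.cache[block_index] = decrypt_block(
                self.blocks[block_index], self.key, block_index)
        return self.cache[block_index]
```

Making blocks larger reduces the number of stalls but, as the slide notes, a larger plaintext cache must then fit inside the black box, which is the third tradeoff.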
6
Debugging Encrypted Code
[Diagram: as on slide 4 — source code P → Compiler → Executable X → Encrypter → E(X) → Decrypter → Decrypted X → CPU, behind a GUI]
• Usually, a secret interface is an Easter Egg: easy to find if you know where to look! A confidentiality risk.
• Mitigation: special hardware required to access the secret interface.
7
Tampering Attack on Encrypted Code
[Diagram: as on slide 4, but the attacker substitutes tampered code E’(X) for E(X)]
• Random x86 code is likely to crash or loop (Barrantes, 2003).
• Mitigation: cryptographically signed code. The system should test the signature on an executable before running it.
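The signed-code mitigation can be sketched as follows. This is a minimal illustration using a symmetric HMAC tag; a deployed system would normally use public-key signatures so that the verifier holds no signing secret. The key and function names are hypothetical.

```python
import hmac
import hashlib

SIGNING_KEY = b"hypothetical-shared-key"  # real systems would use public-key signatures

def sign(executable: bytes) -> bytes:
    """Producer side: compute an authentication tag over the executable."""
    return hmac.new(SIGNING_KEY, executable, hashlib.sha256).digest()

def run_if_authentic(executable: bytes, signature: bytes) -> bool:
    """Consumer side: test the signature before running anything.
    A tampered E'(X) fails the check and is refused."""
    if not hmac.compare_digest(sign(executable), signature):
        return False   # refuse to execute
    # ...hand the verified executable to the decrypter/CPU here...
    return True
```

Note that the check must happen inside the black box, before decryption and execution; otherwise the attacker simply patches the check out.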
8
Intrusion Attack on Encrypted Code
[Diagram: as on slide 4; the attacker injects code into E(X) through the GUI]
• The attacker might find a way to inject code through the GUI.
• Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …
9
Tampering Attack on Obfuscated Code
[Diagram: as on slide 3, but the attacker substitutes tampered code O’(X) for O(X)]
• Mitigation 1: O(X) might check its own signature. Note: O’(X) might not include this check!
• Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.
10
Typical Obfuscation Techniques
• Lexical obfuscations:
– Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.)
• Data obfuscations:
– Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats;
– Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa.
• Control obfuscations:
– Inlining and outlining, to obscure procedural abstractions;
– Opaque predicates, to obscure control flow;
– (Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
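The data-obfuscation bullet above (“encoding several booleans in one int”) can be made concrete with a small sketch. The function names are illustrative, not from the slides:

```python
def pack_flags(flags):
    """Encode several booleans in one int: flag i becomes bit i of the word.
    A reader of the obfuscated program sees only an opaque integer."""
    word = 0
    for i, f in enumerate(flags):
        if f:
            word |= 1 << i
    return word

def unpack_flag(word, i):
    """Recover boolean i from the packed word."""
    return bool((word >> i) & 1)
```

A real obfuscator would go further, e.g. permuting bit positions or mixing in a random mask per program copy, so that the encoding differs between builds.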
11
Obfuscated Interpretation
• Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode.
• Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible).
• Diversified software must be custom-translated for each FSM.
– This implies that the software producer must know the serial number of its customer’s FSM.
– We cannot allow the attacker to learn this information.
– This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function.
• Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.
[Diagram: Fetch Unit → Decode Unit → FSM Unit → Execute Unit, with a Start/Stop control on the FSM Unit]
12
Obfuscated 2-op Ass’y Code
Cleartext:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
        add p, x
        sub x, 1
        goto Loop
Obfuscated text:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
        sub p, x
        add x, 1
        add p, 0    ← “dummy instruction” to force FSM transition
        goto Loop
[Diagram: FSM translator (in CPU pipeline) — two states with transitions labelled add/sub, sub/add, add/add, sub/sub; the starting state is marked]
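The add/sub example above can be simulated with a tiny two-state Mealy FSM. The slide does not fully specify the transition function, so the rule used here (an `add` opcode toggles the state, `sub` leaves it, other opcodes pass through) is an assumption chosen purely to reproduce the slide's obfuscated listing; the mini-interpreter `run` is likewise hypothetical:

```python
SWAP = {"add": "sub", "sub": "add"}

def translate(program, state="swap"):
    """Two-state FSM in the fetch pipeline: in the 'swap' state, add and sub
    exchange meanings; in the 'plain' state they mean what they say.
    Assumed rule: an 'add' opcode toggles the state, 'sub' leaves it."""
    out = []
    for op, *args in program:
        if op in SWAP:
            real = SWAP[op] if state == "swap" else op
            out.append((real, *args))
            if op == "add":
                state = "plain" if state == "swap" else "swap"
        else:
            out.append((op, *args))   # branches and exits pass through unchanged
    return out

def run(program, n):
    """Mini-interpreter for the slide's 2-op assembly; returns final p."""
    env = {"x": n, "p": 1}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "exit_if_zero" and env[args[0]] == 0:
            break
        if op == "goto":
            pc = args[0]
            continue
        if op == "add":
            env[args[0]] += env[args[1]] if isinstance(args[1], str) else args[1]
        elif op == "sub":
            env[args[0]] -= env[args[1]] if isinstance(args[1], str) else args[1]
        pc += 1
    return env["p"]

# The obfuscated loop body from the slide:
obfuscated = [
    ("exit_if_zero", "x"),
    ("sub", "p", "x"),    # FSM translates this back to: add p, x
    ("add", "x", 1),      # FSM translates this back to: sub x, 1
    ("add", "p", 0),      # dummy: forces the FSM back to its starting state
    ("goto", 0),
]
```

After translation the dummy becomes `add p, 0`, which is harmless, and the FSM is back in its starting state when the loop repeats; executing the obfuscated text without the FSM would instead increment x forever.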
13
Obfuscated Java Bytecode
   1  iconst_0
   2  istore_2
   3  iload_1
   4  istore_1
   5  if_icmpne Label3
   6  Label1:
   7  irem
   8  iload_2
   9  iload_1
  10  iload_1
  11  Label4:
  12  goto Label2
  13  iadd
  14  istore_2
  15  bipush 1
  16  bipush 1
  17  iload_1
  18  pop
  19  Label2:
  20  iinc 1 1
  21  bipush 1
  22  goto Label4
  23  Label3:
  24  iconst_1
  25  iload_0
  26  if_icmple Label1
  27  iadd
  28  ireturn
• The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}
• Could you de-obfuscate this?
• Could you develop a “class attack”? Note: each CPU has a different FSM.
14
Security Analysis
• Tampering: an attacker should not be able to modify the obfuscated code.
– Level 1 Attack: an attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “if (licensed) goto L” into “goto L”.
– Level 2 Attack: an attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.
15
Prohibited Actions (cont.)
• Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code.
– Level 3 Attack: an attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program.
• Automated De-obfuscation: “class attack”.
– Level 4 Attack: an attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.
16
3-D Threat Model
A. An adversary might have relevant knowledge & tools;
B. An adversary might have relevant powers of observation;
C. An adversary might have relevant control powers (e.g. causing the CPU to fetch and execute arbitrary codes).
Goal of security analysis: which adversarial powers enable a level-k attack?
17
A. Knowledge and Tools
• Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’.
• Level A1: adversary attended this seminar.
• Level A2: adversary knows how to use a debugger with a breakpoint facility.
• Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and adversary can do elementary statistical computations on these traces.
• Level A4: adversary has an implementation of every FSM F_k(x) and obfuscator F_k^-1(x), and an efficient way to derive the obfuscation key k from X’.
Our framework seems secure against level-A1 adversaries. Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to Level A3 and then Level A4 (which enables a level-4 “class attack”).
18
B. Observations
• Level-B0 observation: run X’ on a computer, observe output.
• Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour.
• Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use level-A2 tool.)
• Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’.
• Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’.
We estimate that O(n^2) level-B2 observations are enough to promote a level-A2 adversary to level A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values.)
Level B4 requires great cryptographic skill or level-C2 control.
19
C. Control Steps
• Level-C0 control: keyboard and mouse inputs for a program run.
• Level-C1 control: adversary makes arbitrary changes to the executable P’, then runs the resulting P’’.
• Level-C2 control: adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use level-A2 tool: debugger.)
• Level-C3 control: adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth.
• Level-C4 control: adversary can inject arbitrary inputs into software implementations of FSM F(x) and obfuscator F^-1(x) for all x.
Level-C2 adversaries will eventually reach Levels C3 and then C4.
20
Summary and Discussion
• New framework for obfuscated interpretation:
– Faster and cheaper than encryption schemes.
– Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary).
– We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also an expert in compiler technology (= a level-4 adversary).
21
Future Work
• Prototype implementation for Java bytecode.
• Dummy insertions need not occur immediately before branches.
– When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state.
– This is the usual process of code optimization, plus randomization and a side-constraint.
• Operand obfuscation!!
– Operand values leak information about opcodes.
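The operand leak mentioned above can be illustrated with a toy classifier (an assumption for illustration, not a technique from the slides): even if a branch opcode is disguised as arithmetic, its operand is still a code address, which typically falls inside the code segment and near the current pc, unlike most data operands.

```python
def looks_like_branch(pc, operand, code_size, window=64):
    """Hypothetical heuristic: branch operands are code addresses, so they
    fall inside the code segment and usually near the branching instruction;
    arithmetic operands are data values and usually do not."""
    return 0 <= operand < code_size and abs(operand - pc) < window
```

This is exactly why operand obfuscation is needed: an adversary scanning a trace with such a heuristic can locate the disguised branches, and from there the dummy sequences that precede them.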