mimimorphism: a new approach to binary code obfuscation

The College of WILLIAM & MARY. Mimimorphism: A New Approach to Binary Code Obfuscation. Zhenyu Wu, Steven Gianvecchio, Mengjun Xie. Advisor: Dr. Haining Wang


Post on 23-Feb-2016



TRANSCRIPT

Page 1: Mimimorphism: A New Approach to Binary Code Obfuscation

The College of WILLIAM & MARY

Mimimorphism: A New Approach to Binary Code Obfuscation

Zhenyu Wu, Steven Gianvecchio, Mengjun Xie

Advisor: Dr. Haining Wang

Page 2: Mimimorphism: A New Approach to Binary Code Obfuscation

Malware Propagation & Detection

Internet & Ubiquitous Computing
◦ Billions of networked computers
◦ Playground for malware

Suppression Techniques
◦ Static analysis
    Low latency, high throughput
    Widely used, IDS deployable
◦ Dynamic analysis

Page 3: Mimimorphism: A New Approach to Binary Code Obfuscation

The Game of Hide and Seek

Un-obfuscated
◦ Binary in plain form

Oligomorphism
◦ Simple transformation (XOR)

Polymorphism
◦ Compression and encryption

Metamorphism
◦ Meta transformation (P-code)

State of the Art
◦ Control-flow encryption
◦ Byte frequency manipulation

Unique substring
◦ Segments of the binary

Algorithmic detection
◦ Built-in transformations

Statistical analysis
◦ Anomalies in code body

Advanced pattern matching
◦ N-gram signatures

Semantic analysis
◦ Persistent high-level fingerprints

Page 4: Mimimorphism: A New Approach to Binary Code Obfuscation


Fugitive On The Run

WANTED

$5,000,000

Page 5: Mimimorphism: A New Approach to Binary Code Obfuscation


Fugitive On The Run

Page 6: Mimimorphism: A New Approach to Binary Code Obfuscation

Fugitive On The Run

Polymorphism
◦ Compression & Encryption

Nobody looks like a small dark box!

??

Page 7: Mimimorphism: A New Approach to Binary Code Obfuscation

Fugitive On The Run

Metamorphism
◦ Reordering Components

Cannot evade feature detection

Wanted

$5,000,000

!!

Page 8: Mimimorphism: A New Approach to Binary Code Obfuscation

Fugitive On The Run

Control Flow Encryption
◦ Prevents feature analysis

Increases suspicion

??

Page 9: Mimimorphism: A New Approach to Binary Code Obfuscation

Fugitive On The Run

The Real Player
◦ Assumes another person’s identity (Mimicry)

Page 10: Mimimorphism: A New Approach to Binary Code Obfuscation

Fugitive On The Run

Lessons Learned:
◦ Evasion without obfuscating features
◦ Evasion by refusing inspection
◦ Evasion by mimicking
    Obfuscates original features
    Open to inspection, but disguised against detection

Page 11: Mimimorphism: A New Approach to Binary Code Obfuscation

Binary Executable Mimicry

Mimimorphism:
◦ Reversible transformation of an executable that produces output statically resembling benign programs
◦ Characteristics:
    Completely erases features of the original binary
    High-order statistics match those of benign executables
    Transformed payload consists of “meaningful” control flows that closely resemble those of benign executables

Page 12: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimic Functions

Text Steganography Technique
◦ Transforms the input data into mimicry output that assumes the statistical and grammatical (structural) properties of another type of data
◦ Originally proposed by Peter Wayner as a means of transporting sensitive data under harsh surveillance
    Novel use of Huffman coding

Page 13: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimic Functions

Huffman Coding
◦ Digesting
    Builds a Huffman tree according to the symbol frequencies
◦ Encoding
    Removes redundancy from the input data using a given Huffman tree
◦ Decoding
    Recovers the original data from the “condensed” data by emitting symbols according to the original Huffman tree

Figure: Huffman tree for the symbols of “mass”, with codes s = 1, m = 00, a = 01. The string “mass” (32 bits) encodes to 000111 (6 bits).
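The three Huffman steps on this slide (digest, encode, decode) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the function names are made up for the example, and the 0/1 labels it assigns to branches may differ from the slide's figure even though the code lengths agree.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(sample):
    """Digest: build a Huffman code table {symbol: bitstring} from symbol counts."""
    tie = count()  # unique tie-breaker so the heap never compares the dict payloads
    heap = [(freq, next(tie), {sym: ""}) for sym, freq in Counter(sample).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a one-symbol alphabet still needs one bit per symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def encode(text, codes):
    """Encode: replace every symbol by its Huffman code."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode: walk the bits and emit a symbol whenever a full code is read."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


codes = build_codes("mass")
bits = encode("mass", codes)
print(len(bits), decode(bits, codes))  # prints: 6 mass
```

As on the slide, the frequent symbol "s" gets a 1-bit code and "m" and "a" get 2-bit codes, so the 4-character string shrinks to 6 bits.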

Page 14: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimic Functions

What if we decode a piece of random data?
◦ Produces “meaningless” data, but
    The output exhibits symbol frequencies similar to those of the digest, and
    The input data can be recovered by Huffman encoding

Regular Mimic Function
◦ Learn: build a Huffman tree from sample text
◦ Mimicry: Huffman decode the (randomized) input
◦ Recover: Huffman encode
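The learn / mimicry / recover cycle of a regular mimic function can be illustrated with a small Python sketch. It is a toy under stated assumptions: the helper names are hypothetical, the sample must contain at least two distinct symbols, and a trailing partial code is padded with zeros (the original bit length is kept so the padding can be dropped on recovery).

```python
import heapq
import random
from collections import Counter
from itertools import count


def build_codes(sample):
    """Learn: Huffman code table from the sample (needs >= 2 distinct symbols)."""
    tie = count()  # tie-breaker so heap entries never compare their dicts
    heap = [(f, next(tie), {s: ""}) for s, f in Counter(sample).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def mimic(bits, codes):
    """Mimicry: Huffman-DEcode arbitrary bits into sample-like symbols."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    while buf:  # pad a trailing partial code with zeros until it reaches a leaf
        buf += "0"
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def recover(text, codes, nbits):
    """Recover: Huffman-ENcode the mimicry text and drop the padding bits."""
    return "".join(codes[ch] for ch in text)[:nbits]


sample = "the quick brown fox jumps over the lazy dog"
codes = build_codes(sample)
rng = random.Random(7)
secret = "".join(rng.choice("01") for _ in range(64))  # stand-in for randomized input
cover = mimic(secret, codes)
print(cover)
print(recover(cover, codes, len(secret)) == secret)
```

The cover text uses only symbols from the sample, with frequent symbols appearing more often, yet it is still gibberish, which is the shortcoming the next slide addresses.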

Page 15: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimic Functions

Insufficiencies
◦ Produces illegible, garbled text
◦ Symbol frequencies follow a 2^(-n) distribution

High-order Mimic Function
◦ Captures interdependencies
    Builds multiple Huffman trees
    One for each unique symbol prefix
◦ Produces “sensible” text with much more “natural” symbol frequency distributions

Figure: Huffman “forest” with one tree per symbol prefix: “chi” (leaves c, l, n), “ins” (leaves p, t), “rou” (leaves t, n, g).
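A high-order mimic function keeps one Huffman tree per symbol prefix. The Python sketch below illustrates the idea on text; it is not the authors' code. The names are hypothetical, the sample is treated as circular so that every prefix has a successor, and a prefix with a single successor emits that symbol without consuming any input bits.

```python
import heapq
import random
from collections import Counter, defaultdict
from itertools import count


def build_codes(freqs):
    """Huffman code table for one tree; a lone symbol gets the empty code."""
    tie = count()
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def build_forest(sample, n):
    """One Huffman tree per n-symbol prefix, learned from the sample."""
    ring = sample + sample[:n]  # wrap around so every prefix has a successor
    followers = defaultdict(Counter)
    for i in range(len(sample)):
        followers[ring[i:i + n]][ring[i + n]] += 1
    forest = {p: build_codes(c) for p, c in followers.items()}
    if all(len(codes) == 1 for codes in forest.values()):
        raise ValueError("sample has no branching prefix, so it cannot carry bits")
    return forest


def mimic(bits, forest, seed):
    """Decode bits through the tree selected by the last n emitted symbols."""
    n, out, pos = len(seed), seed, 0
    while pos < len(bits):
        inverse = {c: s for s, c in forest[out[-n:]].items()}
        buf = ""
        while buf not in inverse:  # a lone successor matches "" and costs no bits
            buf += bits[pos] if pos < len(bits) else "0"  # zero-pad a partial code
            pos += 1
        out += inverse[buf]
    return out


def recover(text, forest, n, nbits):
    """Re-encode each symbol with the tree of its prefix, then drop padding."""
    bits = "".join(forest[text[i - n:i]][text[i]] for i in range(n, len(text)))
    return bits[:nbits]


sample = "the cat sat on the mat and the rat ran to the can "
forest = build_forest(sample, 2)
rng = random.Random(1)
secret = "".join(rng.choice("01") for _ in range(48))
cover = mimic(secret, forest, sample[:2])
print(cover)
print(recover(cover, forest, 2, len(secret)) == secret)
```

Every three-symbol window of the cover text also occurs in the sample, which is why higher orders read as increasingly "sensible"; the cost is that each branching prefix carries only a few bits, so the output grows.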

Page 16: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimicry Text Sample

Mimicry of Peter Wayner’s paper
◦ Produced by a 6th-order mimic function

Each of these historical reason, I don’t recommend using gA(t) to choose the safe. These one-to-one encoded with n leaves and punctuation. The starting every intended to find the same order mimic files. A Method is to break the trees by constructing the mimics the path down the most even though, offer no way that is, in this paper. Figure will not overflow memory. These produced by truncating letter. This need to handle n-th ordered compartment of nonsense words cannot bear any resemblance to B because this task is a Huffman showed in [1], [2], [3] among others.

Page 17: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism

The Challenge: Machine Language Mimicking
◦ Machine language consists of instructions and control flows
    Each instruction has a strict format to follow
    Machines never make a “typo” or use the wrong “tense”!
◦ The mimic function has no knowledge of instructions
    Often makes mistakes when generating instructions
    Has a low success rate of creating mimicry control flows

Our Solution
◦ Integrate a custom assembler / disassembler
◦ Help the mimic function understand the language

Page 18: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: digesting pipeline. Executable binaries (the mimicry target) are disassembled into control flows (e.g. PUSH, DEC, MOV, XOR) and fed to the high-order instruction mimic function, which builds the instruction Huffman forest (the mimicry digest).

Page 19: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: an instruction (MOV) from the executable binary is parsed into a COMMON_INST structure with the fields Inst. Prefixes (atomic op., repeat, operand size, etc.), ModR/M (Mod / Reg. / R/M), SIB (Scale / Idx. / Base), and Displacement. Also shown: an instruction Huffman tree (nodes MOV, INC, PUSH) under the instruction prefix XOR, PUSH, DEC.

Page 20: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: same view as the previous slide, with an instruction encoding template for MOV attached to the MOV node of the instruction Huffman tree (instruction prefix XOR, PUSH, DEC). The COMMON_INST structure fields are Inst. Prefixes, ModR/M, SIB, and Displacement.

Page 21: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: instruction encoding template for MOV. Each COMMON_INST field has its own Huffman tree: Inst. Prefix (leaves 16bit, REP), ModR/M (leaves EAX, ECX, EDX, ……), and Displacement / SIB (leaves 2x8+16, 3x4+0).
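For readers unfamiliar with the ModR/M and SIB fields named in the figure, the following Python sketch splits the two bytes into their standard x86 bit fields. It only illustrates the public x86 encoding; the helper names are hypothetical and this is not the COMMON_INST structure from the slides, whose exact layout the transcript does not show.

```python
from typing import NamedTuple

# 32-bit general-purpose register numbers as used by the x86 encoding
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


class ModRM(NamedTuple):
    mod: int  # bits 7-6: addressing mode
    reg: int  # bits 5-3: register operand (or opcode extension)
    rm: int   # bits 2-0: register/memory operand; 4 means "SIB follows" when mod != 3


class SIB(NamedTuple):
    scale: int  # multiplier 1, 2, 4 or 8 (stored as a 2-bit exponent)
    index: int  # bits 5-3: index register
    base: int   # bits 2-0: base register


def split_modrm(byte: int) -> ModRM:
    return ModRM(mod=byte >> 6, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def split_sib(byte: int) -> SIB:
    return SIB(scale=1 << (byte >> 6), index=(byte >> 3) & 0b111, base=byte & 0b111)


# Bytes 8B 4C 98 10 encode: mov ecx, [eax + ebx*4 + 0x10]
opcode, modrm_byte, sib_byte, disp8 = 0x8B, 0x4C, 0x98, 0x10
modrm = split_modrm(modrm_byte)
sib = split_sib(sib_byte)
print(modrm, sib)
print(REG32[modrm.reg], REG32[sib.base], REG32[sib.index], sib.scale, hex(disp8))
```

Because each field has a small, fixed set of legal values, it is plausible to give each field its own Huffman tree, as the figure suggests, instead of mimicking raw bytes.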

Page 22: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: the instruction Huffman tree (nodes MOV, INC, PUSH; instruction prefix XOR, PUSH, DEC) with the MOV instruction encoding template attached. The template holds per-field Huffman trees: Inst. Prefix (16bit, REP), ModR/M (EAX, ECX, EDX, ……), and Displacement / SIB (2x8+16, 3x4+0).

Page 23: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: the instruction Huffman tree (nodes MOV, INC, PUSH) and the instruction prefix XOR, PUSH, DEC being updated with the parsed MOV instruction (COMMON_INST fields: Inst. Prefixes, ModR/M, SIB, Displacement).

Page 24: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Digesting

Figure: the resulting mimimorphic digest, a set of instruction Huffman trees keyed by instruction prefix (XOR PUSH DEC; PUSH DEC MOV; DEC MOV POP) with nodes such as MOV, INC, PUSH, CMP, XCHG, JMP, and CALL.

Page 25: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: encoding pipeline. Binary data passes through a PRNG stage and then the high-order instruction mimic function (driven by the mimicry digest); the generated instructions are assembled into mimicry binaries.
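The slides show only a "PRNG" box in front of the mimic function, consistent with the earlier requirement that the mimic input be randomized. One simple way such a stage could work, assumed here purely for illustration, is to XOR the data with a seeded pseudo-random stream, which is reversed by applying the same stream again. The function name and the seed are hypothetical, and Python's `random` module is not cryptographically secure.

```python
import random


def whiten(data: bytes, seed: int) -> bytes:
    """XOR every byte with a seeded pseudo-random stream (its own inverse)."""
    rng = random.Random(seed)  # same seed gives the same stream on both sides
    return bytes(b ^ rng.randrange(256) for b in data)


payload = b"AAAAAAAAAAAAAAAA"        # highly regular input
randomized = whiten(payload, seed=42)
restored = whiten(randomized, seed=42)
print(randomized.hex())
print(restored == payload)
```

Randomizing matters because Huffman decoding reproduces the digest's symbol frequencies only when its input bits look uniformly random.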

Page 26: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: the binary data 01001001100101010001010010001001 is fed into the mimicry digest. The current instruction prefix (XOR, PUSH, DEC) selects an instruction Huffman tree (nodes MOV, INC, PUSH).

Page 27: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: bits of the binary data 01001001100101010001010010001001 select MOV from the instruction Huffman tree (nodes MOV, INC, PUSH; prefix XOR, PUSH, DEC). The MOV instruction encoding template then offers per-field Huffman trees: Inst. Prefix (16bit, REP), ModR/M (EAX, ECX, EDX, ……), and Displacement / SIB (2x8+16, 3x4+0).

Page 28: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: further bits of 01001001100101010001010010001001 walk the MOV instruction encoding template (Inst. Prefix: 16bit, REP; ModR/M: EAX, ECX, EDX, ……; Displacement / SIB: 2x8+16, 3x4+0) and select 16bit, ECX, and 3x4+0.

Page 29: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: the selected values (16bit, ECX, 3x4+0) from the MOV instruction encoding template fill in the COMMON_INST structure for MOV (Inst. Prefixes, ModR/M, SIB, Displacement). Binary data: 01001001100101010001010010001001.

Page 30: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: the completed COMMON_INST structure for MOV (Inst. Prefixes, ModR/M, SIB, Displacement) is placed into the output instruction stream. Labels: PUSH, DEC, ?, XOR, MOV. Binary data: 01001001100101010001010010001001.

Page 31: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Encoding

Figure: the output instruction stream (labels: PUSH, DEC, MOV, XOR, MOV) and the instruction prefix being updated from XOR, PUSH, DEC with the new MOV. Binary data: 01001001100101010001010010001001.

Page 32: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: decoding pipeline. Mimicry binaries are disassembled and passed through the high-order instruction mimic function (driven by the mimicry digest) and the PRNG stage to recover the binary data.

Page 33: Mimimorphism: A New Approach to Binary Code Obfuscation

Experimental Setup

Training
◦ Selected 100 Windows XP system files as the mimicry target
    They represent typical legitimate binaries
◦ Trained 7th- and 8th-order mimimorphic engines
    Most control flow basic blocks have 7-8 instructions

Evaluations
◦ Statistical Anomaly Tests
    Kolmogorov-Smirnov Test & Entropy Test
◦ Semantic Detection Test
    Control Flow Fingerprinting

Page 34: Mimimorphism: A New Approach to Binary Code Obfuscation

Evaluation Results

Statistical Tests
◦ Kolmogorov-Smirnov Test
    Maximum difference between byte frequency distributions
    Legitimate: 0.074±0.045; Mimimorphic: 0.093±0.006
◦ Entropy Test
    Measures the predictability (or randomness) of data
    Legitimate: 6.353±0.258; Mimimorphic: 6.528±0.021

Figure: charts of the two test results (data labels in the transcript: 0.074, 0.09, 6.353, 0.516).
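Both statistics are easy to compute over raw bytes. The Python sketch below is an assumption-laden illustration rather than the evaluation code behind the numbers above: it takes the Kolmogorov-Smirnov statistic to be the largest gap between the cumulative byte-frequency distributions of two buffers, and entropy to be Shannon entropy in bits per byte (0 to 8).

```python
import math
from collections import Counter


def byte_distribution(data: bytes) -> list:
    """Relative frequency of each of the 256 byte values."""
    counts = Counter(data)
    return [counts.get(v, 0) / len(data) for v in range(256)]


def ks_statistic(a: bytes, b: bytes) -> float:
    """Largest gap between the two cumulative byte-frequency distributions."""
    cum_a = cum_b = gap = 0.0
    for pa, pb in zip(byte_distribution(a), byte_distribution(b)):
        cum_a += pa
        cum_b += pb
        gap = max(gap, abs(cum_a - cum_b))
    return gap


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0 for constant data, 8 for uniform."""
    return -sum(p * math.log2(p) for p in byte_distribution(data) if p > 0)


uniform = bytes(range(256))
constant = b"\x00" * 64
print(entropy(uniform), entropy(constant))
print(ks_statistic(uniform, uniform), ks_statistic(constant, b"\xff" * 64))
```

On this scale, compressed or encrypted payloads sit near 8 bits per byte, which is why values close to the legitimate average of about 6.4 are less conspicuous.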

Page 35: Mimimorphism: A New Approach to Binary Code Obfuscation

Evaluation Results

Semantic Tests
◦ Control Flow Fingerprinting
    Statically analyzes executables (with a special disassembler) and extracts control flow patterns
    Detects malware by matching characteristic control flow patterns (i.e., shared fingerprints)
◦ Between the original binary and mimimorphic instances
    Shared fingerprints: the lower the better
    Only 1 out of 100 instances shares a single fingerprint (out of hundreds of thousands of fingerprints)
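The notion of "shared fingerprints" can be illustrated with a deliberately simplified stand-in. Real control flow fingerprinting works on control flow graphs extracted by a disassembler; the toy Python sketch below instead treats every window of k consecutive instruction mnemonics as one fingerprint and counts the fingerprints two programs have in common. The function names, the window size, and the sample instruction lists are all hypothetical.

```python
def fingerprints(mnemonics, k=3):
    """Toy fingerprint set: every window of k consecutive mnemonics."""
    return {tuple(mnemonics[i:i + k]) for i in range(len(mnemonics) - k + 1)}


def shared(a, b, k=3):
    """Fingerprints that occur in both instruction sequences."""
    return fingerprints(a, k) & fingerprints(b, k)


original = ["PUSH", "MOV", "XOR", "DEC", "JMP"]
instance = ["PUSH", "DEC", "MOV", "XOR", "DEC"]
common = shared(original, instance)
print(len(common), common)  # prints: 1 {('MOV', 'XOR', 'DEC')}
```

The same counting applies in both directions of the evaluation: few fingerprints shared with the original binary is good for evasion, and many shared with benign files is good for mimicry.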

Page 36: Mimimorphism: A New Approach to Binary Code Obfuscation

Evaluation Results

Semantic Tests
◦ Between mimimorphic and legitimate binaries
    Shared fingerprints: the higher the better
    7th-order mimimorphic instances:
        Average 1856.46±372.5 (72.93 benign files)
        Minimum 1057 (44 files); Maximum 3321 (92 files)
    8th-order mimimorphic instances:
        Average 11407.99±912.42 (81.37 benign files)
        Minimum 9606 (70 files); Maximum 14216 (91 files)

Page 37: Mimimorphism: A New Approach to Binary Code Obfuscation

Evaluation Results

Semantic Tests
◦ A sample mimicry control flow pattern
    Reproduced by a 7th-order mimimorphic instance

Page 38: Mimimorphism: A New Approach to Binary Code Obfuscation

Limitations & Discussions

Application Constraints
◦ Memory consumption: 600MB for 7th-order and 1.2GB for 8th-order mimimorphic transformation
    Disk-based on-demand digest storage
◦ Size increase: 20x inflation for 7th-order and 30x for 8th-order mimimorphic transformation
    Typical malware is less than 100KB
    Mimimorphism results in 2~3MB files

Page 39: Mimimorphism: A New Approach to Binary Code Obfuscation

Conclusion

We propose mimimorphism as a novel binary obfuscation technique
◦ Enhances high-order mimic functions with a custom assembler / disassembler
◦ Achieves evasion by disguise, not by refusing inspection
◦ Effective against both statistical anomaly detection and semantic fingerprinting tests

Page 40: Mimimorphism: A New Approach to Binary Code Obfuscation

Limitations & Discussions

Robustness against other approaches
◦ Automatic n-gram detection
    Typical x86 instruction length: 2.1~2.8
    8th-order mimimorphism can approach 16-gram mimicry
    Existing n-gram detection algorithms can hardly scale up to
◦ Static semantic analysis
    Mimimorphism does not target specific detection techniques
    Focuses on reproducing features from benign programs
    Immune to lower-order signature detection

Page 41: Mimimorphism: A New Approach to Binary Code Obfuscation

Limitations & Discussions

Robustness against other approaches
◦ Deep syntactic analysis
    Fails to exactly reproduce high-level syntactic features:
        45% of “functions” do not have a matching prologue and epilogue
        Many jump instructions cross function boundaries
    Detectable program-level anomalies
        Not all programs follow conventions
        Could lead to false positives

Page 42: Mimimorphism: A New Approach to Binary Code Obfuscation


Questions?

Page 43: Mimimorphism: A New Approach to Binary Code Obfuscation

Limitations & Discussions

The Problem of the Unpacker
◦ Mimimorphic transformation does not provide a solution for hiding the unpacker
◦ However, we believe unpackers do benefit from using mimimorphism
    The unpacker is the weakness of polymorphism because it is easy to “spot”: the rest of the payload is not executable!
    All mimimorphic payload is “executable”, so separating the unpacker code from the payload becomes non-trivial

Page 44: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: decoding pipeline. Mimicry binaries are disassembled and passed through the high-order instruction mimic function (driven by the mimicry digest) and the PRNG stage to recover the binary data.

Page 45: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: an instruction (MOV) from the mimicry binary is parsed into a COMMON_INST structure (Inst. Prefixes, ModR/M, SIB, Displacement) and located in the instruction Huffman tree (nodes MOV, INC, PUSH) for the instruction prefix XOR, PUSH, DEC. Decoded bits: 00.

Page 46: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: after MOV yields the decoded bits 00 from the instruction Huffman tree (prefix XOR, PUSH, DEC), its COMMON_INST fields are looked up in the MOV instruction encoding template: Inst. Prefix (16bit, REP), ModR/M (EAX, ECX, EDX, ……), and Displacement / SIB (2x8+16, 3x4+0).

Page 47: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: the COMMON_INST field values of MOV (16bit, ECX, 3x4+0) are matched against the MOV instruction encoding template (Inst. Prefix: 16bit, REP; ModR/M: EAX, ECX, EDX, ……; Displacement / SIB: 2x8+16, 3x4+0). Decoded bits: 0101.

Page 48: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: the bits decoded from the instruction Huffman tree (00) and from the MOV instruction encoding template (0101, for 16bit, ECX, 3x4+0) are collected as decoded bits.

Page 49: Mimimorphism: A New Approach to Binary Code Obfuscation

Mimimorphism: Decoding

Figure: the decoded bits accumulate (labels: 010100 and 0100100110010101) while the instruction prefix is updated from XOR, PUSH, DEC with the decoded MOV before the next lookup in the instruction Huffman tree (nodes MOV, INC, PUSH).