Post on 04-Apr-2020

Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization

Steven H. H. Ding
Data Mining and Security Lab
School of Information Studies
McGill University, Montreal, Canada

Benjamin C. M. Fung
Data Mining and Security Lab
School of Information Studies
McGill University, Montreal, Canada

Philippe Charland
Mission Critical Cyber Security Section
Defence R&D Canada – Valcartier
Quebec, Canada

Reverse engineer

Manual analysis

Reverse engineering

Did anyone analyze something similar before? Is it a library function?

f1 f2 f3

LDR R3, [R11,#sct]
LDR R2, [R3,#0xC]
LDR R3, [R11,#applet_no]
CMP R2, R3
BEQ loc_DFD0
LDR R3, [R11,#sct]
LDR R3, [R3]
STR R3, [R11,#sct]
loc_DFC0
LDR R3, [R11,#sct]
CMP R3, #0
BNE loc_DFA0

Disassemble

A binary file

With Kam1n0

LDR R3, [R11,#sct]
LDR R2, [R3,#0xC]
LDR R3, [R11,#applet_no]
CMP R2, R3
BEQ loc_DFD0
LDR R3, [R11,#sct]
LDR R3, [R3]
STR R3, [R11,#sct]
loc_DFC0
LDR R3, [R11,#sct]
CMP R3, #0
BNE loc_DFA0

Commented assembly function

LDR R3, [R11,#sct]
LDR R2, [R3,#0xC]
LDR R3, [R11,#applet_no]
CMP R2, R3
BEQ loc_DFD0
LDR R3, [R11,#sct]
LDR R3, [R3]
STR R3, [R11,#sct]
loc_DFC0
LDR R3, [R11,#sct]
CMP R3, #0
BNE loc_DFA0

Labeled library function

Type I: Exact clone

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV ecx, [ebp+arg_0]
0x1FE69C6  PUSH ebx
0x1FE69C7  MOV ebx, [ebp+arg_8]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, ecx
0x1FE69CD  AND ecx, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP ebx, 1
0x1FE69D9  JNZ loc_1FE6A0C

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV ecx, [ebp+arg_0]
0x1FE69C6  PUSH ebx
0x1FE69C7  MOV ebx, [ebp+arg_8]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, ecx
0x1FE69CD  AND ecx, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP ebx, 1
0x1FE69D9  JNZ loc_1FE6A0C

Type II: Syntactically equivalent

0x1FE05B0  PUSH ebp
0x1FE05B1  MOV ebp, esp
0x1FE05B3  MOV ecx, [ebp+arg_0]
0x1FE05B6  PUSH ebx
0x1FE05B7  MOV ebx, [ebp+arg_8]
0x1FE05BA  PUSH esi
0x1FE05BB  MOV esi, ecx
0x1FE05BD  AND ecx, 0FFFFh
0x1FE05B3  SHR esi, 10h
0x1FE05B6  CMP ebx, 1
0x1FE05B9  JNZ loc_1FE05BC

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV eax, [ebp+msg_0]
0x1FE69C6  PUSH edx
0x1FE69C7  MOV edx, [ebp+msg_1]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, eax
0x1FE69CD  AND eax, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP edx, 1
0x1FE69D9  JNZ loc_1FE6A0C
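To make the Type II idea concrete, here is a minimal sketch of how the two listings above become identical once addresses, registers, and stack-variable names are replaced with placeholders. The regular expressions and placeholder names (`REG`, `VAR`, `LOC`) are assumptions chosen for this illustration only; they are not taken from Asm2Vec or Kam1n0.

```python
import re

# Hypothetical normalization rules, chosen only for this illustration.
_ADDRESS = re.compile(r"\bloc_[0-9a-f]+\b")
_STACK_VAR = re.compile(r"\b(?:arg|msg|var)_[0-9a-f]+\b")
_REGISTER = re.compile(r"\b(?:e[abcd]x|e[sd]i|e[sb]p)\b")


def normalize(instruction: str) -> str:
    """Replace jump targets, stack variables, and registers with placeholders."""
    text = instruction.lower()
    text = _ADDRESS.sub("LOC", text)
    text = _STACK_VAR.sub("VAR", text)
    text = _REGISTER.sub("REG", text)
    return text


# Instructions copied from the two Type II listings (addresses omitted).
original = [
    "PUSH ebp", "MOV ebp, esp", "MOV ecx, [ebp+arg_0]", "PUSH ebx",
    "MOV ebx, [ebp+arg_8]", "PUSH esi", "MOV esi, ecx", "AND ecx, 0FFFFh",
    "SHR esi, 10h", "CMP ebx, 1", "JNZ loc_1FE05BC",
]
clone = [
    "PUSH ebp", "MOV ebp, esp", "MOV eax, [ebp+msg_0]", "PUSH edx",
    "MOV edx, [ebp+msg_1]", "PUSH esi", "MOV esi, eax", "AND eax, 0FFFFh",
    "SHR esi, 10h", "CMP edx, 1", "JNZ loc_1FE6A0C",
]

normalized_original = [normalize(i) for i in original]
normalized_clone = [normalize(i) for i in clone]
is_type2_clone = normalized_original == normalized_clone
```

The raw listings differ, but their normalized forms match, which is what "syntactically equivalent" means here. Collapsing every register to one placeholder is a deliberately coarse choice; a real tool would likely keep more detail.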

Type III: Minor modification

0x1FE05B0  PUSH ebp
0x1FE05B1  MOV ebp, esp
(empty row)
(empty row)
0x1FE05B7  MOV ebx, [ebp+arg_8]
0x1FE05BA  PUSH esi
0x1FE05BB  MOV esi, ecx
0x1FE05BD  AND ecx, 0FFFFh
0x1FE05B3  MOV eax, ecx
0x1FE05B6  SHR esi, 10h
0x1FE05B9  CMP ebx, 1
0x1FE05C1  JNZ loc_1FE05BC

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV eax, [ebp+msg_0]
0x1FE69C6  PUSH edx
0x1FE69C7  MOV edx, [ebp+msg_1]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, eax
0x1FE69CD  AND eax, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP edx, 1
0x1FE69D9  JNZ loc_1FE6A0C
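A Type III clone no longer matches exactly, so it needs a similarity score rather than an equality test. The sketch below compares the opcode sequences of the two Type III listings with Python's standard `difflib.SequenceMatcher`. This is a simple stand-in to illustrate "minor modification", not the method Asm2Vec uses; the helper names `mnemonics` and `similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def mnemonics(listing):
    """Keep only the opcode of each instruction, dropping the operands."""
    return [line.split()[0].upper() for line in listing]


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means identical opcode sequences."""
    matcher = SequenceMatcher(None, mnemonics(a), mnemonics(b), autojunk=False)
    return matcher.ratio()


# Instructions copied from the two Type III listings (addresses omitted).
modified = [
    "PUSH ebp", "MOV ebp, esp", "MOV ebx, [ebp+arg_8]", "PUSH esi",
    "MOV esi, ecx", "AND ecx, 0FFFFh", "MOV eax, ecx", "SHR esi, 10h",
    "CMP ebx, 1", "JNZ loc_1FE05BC",
]
reference = [
    "PUSH ebp", "MOV ebp, esp", "MOV eax, [ebp+msg_0]", "PUSH edx",
    "MOV edx, [ebp+msg_1]", "PUSH esi", "MOV esi, eax", "AND eax, 0FFFFh",
    "SHR esi, 10h", "CMP edx, 1", "JNZ loc_1FE6A0C",
]

score = similarity(modified, reference)
```

The score stays high but falls below 1.0, because two instructions were removed and one was inserted. Sequence matching of this kind is sensitive to instruction reordering, which is one reason a learned representation is attractive.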

original clone

Obfuscation and Optimization - Challenges

Obfuscation and Optimization - Problems

•  P1: The relationships among assembly tokens
   •  xmm0 (SSE) register vs. SSE operations such as movaps
   •  fclose vs. fopen
   •  strcpy vs. memcpy

•  P2: Token combination weights
   •  Reverse engineers look for 'interesting patterns' (higher weight).
   •  Regular, random, or repeated patterns are not interesting (lower weight).

•  Sounds so familiar in NLP!

Learning English

1) The cat ____ on the mat.

A: food
B: sat
C: sitting
D: is speaking

Paragraph Vector (p2vec):

king - man + woman = queen
bad - good = maniacal_killer*

* Example collected from Andreas Mueller @amuellerml
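The analogy king - man + woman = queen is plain vector arithmetic followed by a nearest-neighbour lookup. The sketch below shows the mechanics on hand-made 2-D vectors. The vectors, the extra word `prince`, and the function name `analogy` are invented for illustration; real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Toy embeddings, invented for this example: (royalty, gender).
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.8, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy("king", "man", "woman")
```

With these toy vectors the lookup returns `queen`. The input words are excluded from the candidates because the nearest vector to the target is often one of the inputs themselves.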

Asm2Vec:

T-SNE Visualization

T-SNE Visualization

Evaluation (Quantitative)

Evaluation (Quantitative)

Evaluation (Case Studies)

Vulnerability retrieval

Evaluation (Case Studies)

Asm2Vec (IEEE S&P 2019)
+  Robust against obfuscation and optimization.
+  Even better than the most recent dynamic approach.
+  Static approach: efficient and scalable.
-  Binary diffing (interpretability?)
-  Static approach: cannot recognize jump tables, etc.
-  Assembly code must come from the same processor family.
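Once every function is represented as a vector, clone search reduces to ranking repository functions by their similarity to the query vector. The sketch below shows that ranking step with cosine similarity. The 3-D vectors and the function labels (`memcpy_O0`, `memcpy_O3`, `fopen_O0`) are made up for illustration; they are not output of the actual model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def search(query, repository, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query, vec)) for name, vec in repository.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical function embeddings; the labels mimic one function compiled
# at two optimization levels plus one unrelated function.
repository = {
    "memcpy_O0": (0.9, 0.1, 0.2),
    "memcpy_O3": (0.8, 0.2, 0.3),
    "fopen_O0": (0.1, 0.9, 0.1),
}

query_vector = (0.85, 0.15, 0.25)
hits = search(query_vector, repository)
```

Both `memcpy` variants rank above `fopen_O0` for this query. A linear scan like this is enough for a sketch; a large repository would likely need an approximate nearest-neighbour index.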

The Kam1n0 2.x Binary Analysis Platform

Subgraph clone

Sym1n0

Thank you. Questions?
