Hidden Markov Models for Software Piracy Detection

Page 1: Hidden Markov Models for Software Piracy Detection

Hidden Markov Models for Software Piracy Detection

Shabana Kazi
Mark Stamp

HMMs for Piracy Detection

Page 2: Hidden Markov Models for Software Piracy Detection

Intro

Here, we apply metamorphic analysis to software piracy detection
Very similar to techniques used in malware detection
  o But, problem is completely different
  o Has nothing to do with malware
We show that there are other applications of such techniques

Page 3: Hidden Markov Models for Software Piracy Detection

Software Piracy

Software piracy is a major problem
  o By 2009 estimate, $3 to $4 lost to piracy for every $1 in software sales
Usually, piracy consists of taking software without modification
In some cases, software is modified
  o Commercial theft of intellectual property
  o Thief really doesn’t want to get caught…

Page 4: Hidden Markov Models for Software Piracy Detection

Software Piracy

We assume software is stolen
  o And modified, making it hard to detect
  o If completely rewritten from scratch, we won’t detect it by our approach
Want to make life hard for bad guys
  o Ideally, major modifications required
How much modification is needed before we cannot reliably detect?

Page 5: Hidden Markov Models for Software Piracy Detection

Goals

Technique applicable to any software
No special effort by developer
  o Nothing extra inserted into code
We only require access to exe file
Not a watermarking scheme
  o More like software “birthmark” analysis
Also not plagiarism detection
  o Here, want a “deeper” analysis

Page 6: Hidden Markov Models for Software Piracy Detection

Use Case

You work for Alice’s Software Company
  o And you develop fancy software for ASC
Trudy’s Software Company (TSC) develops suspiciously similar product
You suspect TSC of stealing your code
  o Not identical, but seems similar
What can you do?
  o We’ve got some ideas that might help…

Page 7: Hidden Markov Models for Software Piracy Detection

Use Case

Using the technique discussed here
Can easily measure code similarity
Low similarity?
  o Then no hope of proving code is stolen
High similarity?
  o Further (costly) analysis is warranted
High similarity does not prove stolen
  o But a good reason to take a closer look

Page 8: Hidden Markov Models for Software Piracy Detection

Background

Metamorphic software
  o Metamorphic techniques (dead code, permutation, substitution)
HMM
  o Basic ideas and notation
  o The 3 problems and their solutions (discussed at a high level)
We’ve seen all of this before
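The first of the three HMM problems, scoring an observation sequence against a given model, is usually solved with the forward algorithm. The following is a minimal sketch of that step, not the authors' code. It assumes a discrete HMM whose state transition matrix `A`, observation matrix `B`, and initial distribution `pi` are plain nested lists, and that opcodes have already been mapped to integer symbol indices. The function name is hypothetical.

```python
import math

def forward_log_likelihood(A, B, pi, obs):
    """Return log P(obs | model) using the scaled forward algorithm.

    A[i][j]: probability of moving from state i to state j
    B[i][k]: probability of emitting symbol k while in state i
    pi[i]:   probability of starting in state i
    obs:     list of integer symbol indices (assumed encoding of opcodes)
    """
    if not obs:
        return 0.0
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob
```

To compare sequences of different lengths, one option (an assumption here, not stated on the slides) is to divide the result by `len(obs)` to get a log-likelihood per opcode.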

Page 9: Hidden Markov Models for Software Piracy Detection

Overview

Training and scoring
Train HMM on slightly morphed copies of given “base” software
  o Slight morphing to avoid overfitting
Score morphed copies and other files
  o Here, morphing serves to simulate modifications by attacker
Want to know how much morphing required before detection fails

Page 10: Hidden Markov Models for Software Piracy Detection

Metamorphic Generator

Built our own metamorphic generator
Morph based on extracted opcodes
  o Morphing consists of dead code insertion
  o Specify a dead code percentage and number of blocks to insert
Do not require that morphed code works
  o Makes detection more difficult, not easier
  o A worst-case scenario, detection-wise
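Dead code insertion driven by a percentage and a block count could be sketched as below. This is an illustrative assumption about how such a generator might work, not the authors' implementation. The function name, the choice to measure the percentage against the original sequence length, and the random placement of blocks are all hypothetical. The dead opcodes are drawn from a separate opcode list (`dead_pool`).

```python
import random

def insert_dead_code(opcodes, dead_pool, percent, num_blocks, rng=None):
    """Return a copy of `opcodes` with about `percent`% dead opcodes added.

    The dead-code budget is split across `num_blocks` contiguous blocks,
    each copied from `dead_pool` and inserted at a random position.
    The morphed sequence is not required to be a working program.
    """
    rng = rng or random.Random()
    total = round(len(opcodes) * percent / 100)
    if total == 0 or num_blocks <= 0:
        return list(opcodes)
    num_blocks = min(num_blocks, total)
    # Split the budget as evenly as possible across the blocks.
    base, extra = divmod(total, num_blocks)
    sizes = [base + (1 if b < extra else 0) for b in range(num_blocks)]
    # Insert from the highest position down so earlier inserts
    # do not shift the positions that are still pending.
    positions = sorted(
        (rng.randint(0, len(opcodes)) for _ in sizes), reverse=True
    )
    morphed = list(opcodes)
    for pos, size in zip(positions, sizes):
        start = rng.randint(0, max(0, len(dead_pool) - size))
        # A block is shorter than `size` if the pool is too small.
        morphed[pos:pos] = dead_pool[start:start + size]
    return morphed
```

With a 100-opcode input, `percent=10`, and `num_blocks=2`, this inserts two blocks of five dead opcodes each, leaving the original opcodes in their original order.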

Page 11: Hidden Markov Models for Software Piracy Detection

Training

Given a base executable file…
Extract its opcode sequence
Generate 100 slightly morphed copies
  o Each morphed 10%, using dead code extracted from random “normal” file
Train HMM on morphed copies
  o Using 5-fold cross validation
  o Note: We train one model for each “fold”
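The 5-fold split of the 100 morphed copies can be sketched as follows. The helper name is hypothetical, and the HMM training step itself is left out. The sketch only shows how each fold holds out a different fifth of the copies, so that 80 copies are used for training and 20 for scoring in each fold.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; each item lands in exactly one test set."""
    fold_size, remainder = divmod(len(items), k)
    start = 0
    for fold in range(k):
        # Spread any remainder over the first few folds.
        end = start + fold_size + (1 if fold < remainder else 0)
        test = items[start:end]
        train = items[:start] + items[end:]
        yield train, test
        start = end


# Stand-ins for the 100 morphed copies of one base file.
morphed_copies = [f"copy_{i:03d}" for i in range(100)]
splits = list(k_fold_splits(morphed_copies))
```

One model would then be trained on the `train` part of each pair, which gives five HMMs per base file.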

Page 12: Hidden Markov Models for Software Piracy Detection

Training

Illustration of training process
  o Slightly morphed copies of base program

Page 13: Hidden Markov Models for Software Piracy Detection

Determine Threshold

For each of 5 folds
  o Train HMM
  o Score 20 morphed files (match set) and 15 normal (nomatch set)
Determine threshold based on scores
  o Threshold is highest score of normal file
  o Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
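The threshold rule can be sketched in a few lines. This assumes that a higher score means a closer match to the base file (as with a log-likelihood) and that a file is flagged only when its score is strictly above the threshold. The function names and the sample scores are made up for illustration.

```python
def set_threshold(nomatch_scores):
    """Threshold is the highest score of any normal (nomatch) file."""
    return max(nomatch_scores)

def is_match(score, threshold):
    """Flag a file only if it scores strictly above the threshold."""
    return score > threshold

def rates(match_scores, nomatch_scores, threshold):
    """Return (true positive rate, false positive rate) for one fold."""
    tp = sum(1 for s in match_scores if is_match(s, threshold))
    fp = sum(1 for s in nomatch_scores if is_match(s, threshold))
    return tp / len(match_scores), fp / len(nomatch_scores)


# Illustrative scores only, not results from the experiments.
nomatch = [-5.2, -4.8, -6.1]
match = [-2.0, -2.3, -4.9]
threshold = set_threshold(nomatch)
tpr, fpr = rates(match, nomatch, threshold)
```

Because no normal file can score strictly above the maximum normal score, the false positive rate is 0 by construction for that fold.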

Page 14: Hidden Markov Models for Software Piracy Detection

Setting a Threshold

Process used to set threshold

Page 15: Hidden Markov Models for Software Piracy Detection

Experiments

Want to determine robustness
For each base file tested…
Train to obtain HMM and threshold
Morph base file at various percentages
  o Using various morphing strategies
  o Refer to this morphing as tampering
Score each tampered copy
  o Classify, based on threshold
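One way to summarize such an experiment is the fraction of tampered copies still detected at each tampering percentage. The sketch below assumes scores are grouped by percentage in a dictionary and that detection means scoring strictly above the threshold. The function name and sample numbers are hypothetical.

```python
def detection_rates(scores_by_percent, threshold):
    """Map each tamper percentage to the fraction of copies still detected."""
    return {
        percent: sum(1 for s in scores if s > threshold) / len(scores)
        for percent, scores in sorted(scores_by_percent.items())
    }


# Illustrative scores only, not results from the experiments.
example = {10: [-2.0, -2.5], 50: [-3.0, -5.0]}
summary = detection_rates(example, threshold=-4.0)
```

The percentage at which the detection rate drops off is then a measure of how much tampering the approach tolerates.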

Page 16: Hidden Markov Models for Software Piracy Detection

Experiments

Scoring tampered files

Page 17: Hidden Markov Models for Software Piracy Detection

Experiment Details

For each base file
  o 6 models
  o 10 tamper percentages for each
  o 100 files each
  o So, 6000 scores!

Page 18: Hidden Markov Models for Software Piracy Detection

Experiment Details

Tested 10 base files, each data point
  o So 60,000 scores computed…

Page 19: Hidden Markov Models for Software Piracy Detection

Experiment Details

Repeated entire experiment 6 times
  o Using different number of blocks in training phase
  o Training made little difference on scores
  o So, here we only give results where 1 block used in training phase
In total 360,000 scores computed
  o And 360 “models” generated
  o That is, 1800 HMMs (one per fold)
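The totals quoted on the experiment slides follow directly from the stated counts. The arithmetic is spelled out below. The constant names are ours; the numbers are the ones given on the slides.

```python
MODELS_PER_BASE_FILE = 6    # models per base file
TAMPER_LEVELS = 10          # tamper percentages per model
FILES_PER_LEVEL = 100       # tampered files per percentage
BASE_FILES = 10             # base files tested
REPETITIONS = 6             # runs with different training block counts
FOLDS = 5                   # one HMM per fold

scores_per_base_file = MODELS_PER_BASE_FILE * TAMPER_LEVELS * FILES_PER_LEVEL
scores_per_run = scores_per_base_file * BASE_FILES
total_scores = scores_per_run * REPETITIONS

total_models = MODELS_PER_BASE_FILE * BASE_FILES * REPETITIONS
total_hmms = total_models * FOLDS

print(scores_per_base_file, scores_per_run, total_scores)  # 6000 60000 360000
print(total_models, total_hmms)                            # 360 1800
```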

Page 20: Hidden Markov Models for Software Piracy Detection

Results: Bar Graph

Page 21: Hidden Markov Models for Software Piracy Detection

Results: 3-d Plot

Page 22: Hidden Markov Models for Software Piracy Detection

Conclusions

Results look very promising
  o Robust: high degree of morphing required before base file undetected
  o Practical: only requires exe, no special effort when developing
  o Applies to any exe, at any time
Overall, strong software “birthmark” strategy with practical implications

Page 23: Hidden Markov Models for Software Piracy Detection

Future Work

Statistical analysis somewhat weak
  o Results may be stronger than it appears
Many other scores/combinations of scores can be tested
  o Results can only get better
Consider other morphing techniques
  o And other file types (e.g., bytecode)
  o And mitigations for 1-block morphing

Page 24: Hidden Markov Models for Software Piracy Detection

References

S. Kazi and M. Stamp, Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective, 22:140-149, 2013