PHMMs for Metamorphic Detection
Mark Stamp
Viruses
Viruses and worms are types of malware. Various definitions are used; for our purposes, “virus” is used generically.
How can we detect malware? Signature detection is used most often.
In its simplest form, signature detection searches for a string of bits found in the malware.
Signatures could also include wildcards, heuristics, etc.
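As a minimal sketch of the simplest form of signature detection, the Python below scans a byte string for a signature in which `None` acts as a one-byte wildcard. The function `find_signature` and the sample bytes are hypothetical; real scanners typically use far more efficient multi-pattern matching.

```python
from typing import Optional, Sequence


def find_signature(data: bytes, signature: Sequence[Optional[int]]) -> int:
    """Return the first offset where `signature` matches `data`, or -1.

    Each signature entry is a byte value (0-255) or None, which matches any byte.
    """
    n, m = len(data), len(signature)
    for start in range(n - m + 1):
        if all(s is None or data[start + i] == s for i, s in enumerate(signature)):
            return start
    return -1


# Hypothetical sample: the signature 8B ?? 5D matches the bytes 8B EC 5D at offset 2.
sample = bytes([0x90, 0x55, 0x8B, 0xEC, 0x5D, 0xC3])
print(find_signature(sample, [0x8B, None, 0x5D]))  # 2
print(find_signature(sample, [0x8B, 0x45]))        # -1
```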
Metamorphic Viruses
Metamorphic viruses change “shape”: for each instance, the internal structure changes, but the function stays the same.
If the change is sufficient, signature detection fails.
In principle, metamorphic malware is among the most difficult to detect. But not many metamorphic viruses have been seen in the wild. Why not???
Metamorphic Detection
How can we detect metamorphic malware?
Previous research: HMMs are effective. Train a model on opcodes extracted from viruses of a metamorphic “family”, then determine a threshold score.
Then, to score an unknown executable, extract its opcodes and score them against the model.
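One way to make the scoring step concrete: the sketch below computes the log-likelihood of an opcode sequence under a trained HMM with the scaled forward algorithm, divides it by the sequence length, and compares the result with a threshold. The toy model parameters, the opcode-to-integer mapping, and `THRESHOLD` are hypothetical placeholders, and per-opcode normalization is an assumption (it keeps scores comparable across files of different lengths). Training the model (e.g., by Baum-Welch) is not shown.

```python
import math
from typing import List


def log_likelihood(obs: List[int], pi: List[float],
                   A: List[List[float]], B: List[List[float]]) -> float:
    """Return log P(obs | model) using the scaled forward algorithm.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from hidden state i to hidden state j
    B[i][k] : probability that hidden state i emits symbol k
    """
    N = len(pi)
    log_prob = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(N)]
        else:
            alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][o]
                     for i in range(N)]
        c = sum(alpha)                   # P(o_t | o_0..o_{t-1}); must be > 0
        alpha = [a / c for a in alpha]   # rescale to avoid underflow
        log_prob += math.log(c)
    return log_prob


def score(obs: List[int], pi, A, B) -> float:
    """Log-likelihood per opcode, so files of different lengths are comparable."""
    return log_likelihood(obs, pi, A, B) / len(obs)


# Hypothetical toy model: 2 hidden states, opcode symbols 0 = "mov", 1 = "push".
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
THRESHOLD = -1.0  # hypothetical; in practice set from family and normal-file scores

unknown = [0, 0, 1, 0]
print("family" if score(unknown, pi, A, B) > THRESHOLD else "not family")
```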
Profile HMM
A standard HMM does not take positional information into account.
A profile HMM is analogous to defining an HMM at each position in a sequence, so positional information is taken into account.
So, a PHMM uses more information, which might yield stronger models.
PHMMs
Will a PHMM outperform an HMM?
Possible advantage of a PHMM: it uses more information, since position within the sequence is taken into account.
Possible disadvantages of a PHMM: it is more complex and more costly to compute, and it might overfit the data. “More” is not always “better”.
The Plan
1. Extract opcodes from metamorphic family viruses
2. Pairwise align opcode sequences
3. Generate multiple sequence alignment (MSA) from pairwise alignments
4. Generate PHMM from MSA
5. Determine threshold, error rates
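Step 1 can be sketched as follows, assuming the executable has already been disassembled into an objdump-style text listing (tab-separated address, raw bytes, and instruction fields). The listing format and the function name `extract_opcodes` are assumptions; other disassemblers produce different layouts.

```python
from typing import List


def extract_opcodes(listing: str) -> List[str]:
    """Pull the mnemonic from each instruction line of a disassembly listing.

    Assumes objdump-style lines: "addr:<TAB>hex bytes<TAB>mnemonic operands".
    Lines without a third tab-separated field (section headers, labels,
    byte-continuation lines) are skipped.
    """
    opcodes = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        instruction = fields[2].strip()
        if instruction:
            opcodes.append(instruction.split()[0])  # keep the mnemonic only
    return opcodes


listing = (
    "00401000 <start>:\n"
    "  401000:\t55                   \tpush   %ebp\n"
    "  401001:\t89 e5                \tmov    %esp,%ebp\n"
    "  401003:\t31 c0                \txor    %eax,%eax\n"
    "  401005:\tc3                   \tret    \n"
)
print(extract_opcodes(listing))  # ['push', 'mov', 'xor', 'ret']
```

Only the mnemonic is kept, so operands (registers, addresses) are discarded; an instruction prefix such as `rep` would be returned in place of the mnemonic it modifies.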
Metamorphic Techniques
Morphing is usually applied at the asm level. Many techniques can be used, such as…
Equivalent code substitution: register swap, or different code with the same function
Garbage code/dead code insertion
Code reordering: subroutine reordering, or arbitrary reordering using jumps
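To see why these techniques defeat simple signatures, here is a toy sketch of two of them (equivalent code substitution and garbage code insertion) applied to a short instruction list. The equivalence table, the garbage instructions, and the helper names `morph` and `contains` are hypothetical and far simpler than what a real construction kit does.

```python
import random
from typing import Dict, List

# Hypothetical 32-bit x86 equivalences (effects on flags ignored); illustrative only,
# not taken from any real construction kit.
EQUIVALENTS: Dict[str, List[str]] = {
    "mov eax, 0": ["xor eax, eax", "sub eax, eax"],
    "add eax, 1": ["inc eax"],
}
# Dead code: in 32-bit mode these change no registers, memory, or flags.
GARBAGE = ["nop", "mov ebx, ebx", "xchg ecx, ecx"]


def morph(code: List[str], rng: random.Random, p_garbage: float = 0.3) -> List[str]:
    """Apply equivalent code substitution and garbage insertion to a code listing."""
    out: List[str] = []
    for ins in code:
        out.append(rng.choice([ins] + EQUIVALENTS.get(ins, [])))  # substitution
        if rng.random() < p_garbage:
            out.append(rng.choice(GARBAGE))                      # garbage insertion
    return out


def contains(code: List[str], sig: List[str]) -> bool:
    """Naive signature check: does sig appear as a contiguous run in code?"""
    return any(code[i:i + len(sig)] == sig for i in range(len(code) - len(sig) + 1))


original = ["push ebp", "mov ebp, esp", "mov eax, 0", "add eax, 1", "pop ebp", "ret"]
signature = original[2:5]  # "signature" taken from the original code
rng = random.Random(2009)
for _ in range(3):
    variant = morph(original, rng)
    print(contains(variant, signature), variant)
```

Depending on the random choices, a variant may or may not still contain the signature; the more aggressive the morphing, the less likely a match.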
Metamorphic Techniques
Opaque predicates: a “conditional” that isn’t
By combining several techniques, virus writers can achieve the desired effect: metamorphism sufficient to break signature detection, while the function of the code remains unchanged.
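A classic illustration of an opaque predicate, sketched in Python with hypothetical function names: the product of two consecutive integers is always even, so the test below always succeeds, although it appears to depend on the input.

```python
def real_work(x: int) -> int:
    """Stand-in for the code's actual function."""
    return 2 * x + 1


def obfuscated(x: int) -> int:
    # Opaque predicate: x * (x + 1) is a product of consecutive integers,
    # hence always even, so this "conditional" always takes the first branch.
    if (x * (x + 1)) % 2 == 0:
        return real_work(x)
    # Dead branch: never executed, but not obviously so to a simple scanner.
    return -1


assert all(obfuscated(x) == real_work(x) for x in range(-1000, 1000))
```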
Metamorphic Example
[Figure: original code, morphed version 1, and morphed version 2]
Metamorphic Viruses
[Table: real-world metamorphic viruses]
Virus Construction Kits
With construction kits, anyone can easily build (metamorphic) malware.
The first 2 kits listed are not very metamorphic, but NGVCK is highly metamorphic.
So, we consider NGVCK here.
AV Techniques
Signature detection is the most popular technique.
So, of course, virus writers want to evade signature detection.
Metamorphism can provide a strong defense against signature detection.
HMMs
See previous presentation
PHMMs
See previous presentation
PHMMs
PHMMs are designed to deal with biological sequences
The goal is to find evidence that sequences are related by mutation and selection.
The basic processes usually considered are:
Substitution: a subsequence is replaced
Insertion: a subsequence is inserted
Deletion: a subsequence is removed
PHMMs and Computer Viruses
The same basic processes (substitution, insertion, and deletion) can occur in metamorphic viruses.
But we also have to deal with permutation, that is, re-ordering of the sequence; metamorphics may do lots of permuting.
Permutation can be viewed as a series of insertions and deletions, but then “close” sequences might be “far” apart.
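The last point can be made concrete with Levenshtein edit distance, which counts the minimum number of substitutions, insertions, and deletions. In this sketch (hypothetical opcode blocks), swapping two blocks keeps exactly the same opcodes, yet the distance equals that of a same-length sequence sharing no opcodes at all.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # delete a[i-1]
                         cur[j - 1] + 1,       # insert b[j-1]
                         prev[j - 1] + cost)   # substitute or match
        prev = cur
    return prev[len(b)]


block1 = ["push", "mov", "sub", "call", "add"]
block2 = ["xor", "inc", "cmp", "jne", "ret"]
original = block1 + block2
permuted = block2 + block1           # same opcodes, blocks swapped
unrelated = ["nop"] * len(original)  # shares no opcodes with original

print(edit_distance(original, permuted))   # 10
print(edit_distance(original, unrelated))  # 10
```

By this measure the permuted copy looks no closer to the original than unrelated code does, which is exactly what makes permutation problematic.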
Permutation and Alignment
Permutations are problematic for alignment…
How can we deal with this? Maybe we can pre-process the sequences.
But this adds complexity and cost. More about this later.
Test Data
Virus construction kits from VX Heavens
We generated the following viruses: 10 VCL32 viruses, 30 PS-MPC viruses, and 200 NGVCK viruses.
Also, 40 cygwin utilities serve as “normal” files.
NGVCK Pairwise Alignment
Align two NGVCK opcode sequences
This looks reasonable
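The slides do not specify the alignment algorithm or its scoring, so the following is only a sketch of one standard option: Needleman-Wunsch global alignment with a linear gap penalty. The scores `MATCH`, `MISMATCH`, and `GAP` and the example opcode sequences are hypothetical; a real system might instead use an opcode substitution matrix and affine gap penalties.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scoring parameters


def sub_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def align(a: List[str], b: List[str]) -> Tuple[int, List[str], List[str]]:
    """Needleman-Wunsch global alignment; gaps are shown as "-"."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + GAP,   # gap in b
                          F[i][j - 1] + GAP)   # gap in a
    # Trace back from F[n][m] to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], out_a[::-1], out_b[::-1]


x = ["push", "mov", "sub", "call", "add", "ret"]
y = ["push", "mov", "nop", "sub", "call", "ret"]
score, ax, ay = align(x, y)
print(score)
print(" ".join(f"{s:>5}" for s in ax))
print(" ".join(f"{s:>5}" for s in ay))
```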
Gap Percentages
Recall that, with a PHMM, the more gaps in the MSA, the weaker the model.
[Table: gap percentages in MSAs for metamorphic viruses]
But note that the VCL32 MSA is based on 5 files, the PS-MPC MSA on 10, and the NGVCK MSA on 20 files.
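Gap percentage is not formally defined on these slides; a natural reading, assumed here, is the percentage of all MSA cells that hold a gap. A minimal sketch, with a hypothetical three-row MSA:

```python
from typing import List


def gap_percentage(msa: List[List[str]]) -> float:
    """Percentage of MSA cells that are gaps ("-"); rows must have equal length."""
    if not msa:
        raise ValueError("empty MSA")
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("MSA rows must have equal length")
    gaps = sum(row.count("-") for row in msa)
    return 100.0 * gaps / (len(msa) * width)


msa = [
    ["push", "mov", "-",   "sub", "call", "add", "ret"],
    ["push", "mov", "nop", "sub", "call", "-",   "ret"],
    ["push", "-",   "-",   "sub", "call", "add", "ret"],
]
print(f"{gap_percentage(msa):.1f}%")  # 19.0%
```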
VCL32
Using five VCL32 viruses, we generate pairwise alignments, then generate an MSA, then generate a PHMM.
The PHMM has 1820 states, so we can’t show the whole model here.
Instead, the next slides give 3 states: 126, 127, and 128.
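Generating a PHMM from an MSA starts by deciding which columns get match states. A common heuristic (discussed in Durbin et al.) makes a column a match state when fewer than half of the sequences have a gap there. Whether exactly this rule was used for the VCL32 model is not stated, so treat the sketch below, with its hypothetical MSA and `match_columns` helper, as an assumption.

```python
from typing import List


def match_columns(msa: List[List[str]], max_gap_fraction: float = 0.5) -> List[int]:
    """Return indices of MSA columns to be modeled by match states.

    Assumed heuristic: a column becomes a match state if fewer than
    max_gap_fraction of the aligned sequences have a gap ("-") there;
    the remaining columns are treated as insertions.
    """
    n_rows = len(msa)
    return [col for col in range(len(msa[0]))
            if sum(row[col] == "-" for row in msa) / n_rows < max_gap_fraction]


msa = [
    ["push", "mov", "-",   "sub", "call", "add", "ret"],
    ["push", "mov", "nop", "sub", "call", "-",   "ret"],
    ["push", "-",   "-",   "sub", "call", "add", "ret"],
]
print(match_columns(msa))  # [0, 1, 3, 4, 5, 6]
```

Each selected column then receives a match state, with insert and delete states around it; the gappy column 2 would be absorbed by an insert state.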
VCL32 Transition Probabilities
State transition probabilities (the A matrix) for states 126, 127, 128
VCL32 Emission Probabilities
Emission probabilities (the E matrix) for states 126, 127, 128
Emissions occur only for match and insert states.
The “add-one” rule was used here.
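The add-one rule is a pseudocount: one is added to every count before normalizing, so an opcode never seen in a column still gets nonzero probability. A minimal sketch for match-state emissions, assuming the MSA, alphabet, and choice of match columns shown (all hypothetical); insert-state emissions could be estimated the same way.

```python
from collections import Counter
from typing import Dict, List


def match_emissions(msa: List[List[str]], match_cols: List[int],
                    alphabet: List[str]) -> List[Dict[str, float]]:
    """Add-one estimates e_k(x) = (n_k(x) + 1) / (n_k + |alphabet|) per match column."""
    emissions = []
    for col in match_cols:
        counts = Counter(row[col] for row in msa if row[col] != "-")  # gaps are not emissions
        total = sum(counts.values())
        emissions.append({x: (counts[x] + 1) / (total + len(alphabet)) for x in alphabet})
    return emissions


msa = [
    ["push", "mov", "-",   "sub", "call", "add", "ret"],
    ["push", "mov", "nop", "sub", "call", "-",   "ret"],
    ["push", "-",   "-",   "sub", "call", "add", "ret"],
]
alphabet = sorted({op for row in msa for op in row if op != "-"})
match_cols = [0, 1, 3, 4, 5, 6]  # hypothetical choice of match columns
E = match_emissions(msa, match_cols, alphabet)
print(round(E[1]["mov"], 3), round(E[1]["ret"], 3))  # 0.333 0.111
```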
Results
Typical PHMM results for VCL32
We can set a threshold that gives 100% detection. It doesn’t get any better than that!
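Choosing the threshold can be sketched as follows, assuming family viruses should score higher than normal files. If the lowest family score exceeds the highest normal score, any threshold between them gives 100% detection with no false positives. The scores and the midpoint rule in `separating_threshold` are hypothetical.

```python
from typing import List, Optional, Tuple


def error_rates(family: List[float], normal: List[float],
                threshold: float) -> Tuple[float, float]:
    """(false negative rate, false positive rate); score > threshold means "family"."""
    fn = sum(s <= threshold for s in family) / len(family)
    fp = sum(s > threshold for s in normal) / len(normal)
    return fn, fp


def separating_threshold(family: List[float], normal: List[float]) -> Optional[float]:
    """Midpoint between lowest family and highest normal score, or None if they overlap."""
    lo, hi = min(family), max(normal)
    return (lo + hi) / 2 if lo > hi else None


family = [-1.9, -2.1, -2.0, -1.8]  # hypothetical per-opcode scores
normal = [-4.5, -3.9, -5.2]
t = separating_threshold(family, normal)
print(t, error_rates(family, normal, t))
```

When the score ranges overlap, no threshold avoids errors, and the choice becomes a trade-off between false negatives and false positives.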
Results
Typical PS-MPC results using PHMM
Again, perfect detection
Results
But VCL32 and PS-MPC are easy cases: they are not very metamorphic and are probably detectable using signatures.
In contrast, NGVCK is highly metamorphic.
So, NGVCK is the important test. See the next slides.
Results
Typical results for NGVCK
Note that normal files score higher than NGVCK!
This is bad!
Pre-Processing
For NGVCK, is there any hope? We can try pre-processing.
The goal is to undo some of the effect of permutation.
We were able to reduce the gap percentage in the MSA: before pre-processing, it was 88.3%; after, it is 44.9%. A big improvement, but is it big enough?
Results
NGVCK with pre-processing: much better, but not good enough.
The error rate is still significant.
Conclusions
HMMs were developed in the 1960s. They are a standard machine learning technique with many applications.
PHMMs are relatively recent and were developed for biological applications. Here, we have a novel application of PHMMs.
We obtained 100% detection for some examples… …but poor detection for others.
Possible Improvements
Improved pre-processing, to better account for permutation
Local alignment, for example, aligning subroutines
Baum-Welch re-estimation of the PHMM obtained from the MSA
Other???
Last Word
It is very trendy to apply biological analogies to information security.
On the one hand, the results here provide evidence supporting the trend of looking to biological analogies.
On the other hand, the results here are a “cautionary tale against applying biological analogies too literally”.
References
S. Attaluri, S. McGhee and M. Stamp, Profile hidden Markov models for metamorphic virus detection, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169
R. Durbin, et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998