metamorphic malware analysis and detection

Post on 08-Jun-2015


DESCRIPTION

Abstract: Modern malware that is metamorphic or polymorphic in nature mutates its code through code obfuscation and encryption to thwart detection, so conventional signature-based scanners fail to detect it. To address the problem of detecting known variants of metamorphic malware, we propose a method based on bioinformatics techniques that are used effectively for protein and DNA matching. Instead of exact signature matching, more sophisticated signature(s) are extracted using multiple sequence alignment (MSA). The results show that the proposed method identifies malware variants with minimal false alarms and misses. The detection rate achieved with the proposed method is also better than that of the commercial antivirus products used in the study.

Status: This work has been accepted by the 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).

Link: http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=6207739&url=http://ieeexplore.ieee.org/iel5/6203543/6207707/06207739.pdf?arnumber=6207739

E-mail: grijesh.mnit@gmail.com

TRANSCRIPT

Bioinformatics Techniques for Metamorphic Malware Analysis and Detection

Malaviya National Institute of Technology, Jaipur

Supervisors:

Dr. M. S. Gaur

Dr. V. Laxmi

By:

Grijesh Chauhan

(2009PCP116)

Outline

• Malware & Metamorphic Malware
• Motivation
• Objective
• Bioinformatics Techniques
• MOMENTUM
• Dataset
• Result & Analysis
• References

Malware

• Malware is software written with the intention to infect and replicate.
• Threats
  - Loss of data
  - Degraded computer system performance
  - Identity threat
• Two broad categories
  - Metamorphic: the virus body changes on each replication
  - Polymorphic: encrypts the malicious payload to avoid detection

Metamorphic Malware [1/2]

• Metamorphic malware variants have similar functionality but different structure and signature.
• This is similar to genetic diversity in biology.

[Figure: a metamorphic engine producing Variant-1, Variant-2 and Variant-3 with reordered code]

Metamorphic Malware [2/2]

• Metamorphic malware automatically re-codes itself each time it propagates or is distributed.
• Conventional signature-based scanners are ineffective at detecting variants of the same malware.
• Sophisticated signature(s) are required to detect metamorphic variants of malware.

Motivation

• Variants of metamorphic malware are generated by a small embedded metamorphic engine to defeat detection [2].
• Only a limited number of instructions are used to generate variants, so that functionality is preserved.
• Like DNA/protein sequences, metamorphic malware mutates from generation to generation while inheriting functionality and some structural similarity from the ancestral malware.

Objective

• To devise a method for detecting metamorphic malware and its variants.
• To extract abstract signature(s) using bioinformatics sequence alignment.
  - The base code is preserved across generations and obfuscated using junk code, equivalent instructions, etc.
• To identify unseen malware samples using the best representative signatures (group/single) of a family.

Sequence Alignment [1/2]

• Sequence alignment is a way of arranging DNA/protein sequences to identify regions of similarity and so infer functional, structural or evolutionary relationships.
• Alignment methods
  - Global alignment: align sequences end to end.
  - Local alignment: align a substring of one sequence with a substring of the other.
  - Multiple Sequence Alignment (MSA): align more than two sequences.

Sequence Alignment [2/2]

• Global alignment

  L G P S S K Q T G K G S - S R I D N
  L N - I T K S A G K G A I M R L D A

• Local alignment

  - - - - - - T G - G - - - - - - -
  - - - - - - A G K G - - - - - - -

• Alignment parameters
  - Match
  - Mismatch
  - Gap

[Figure label: Point of Mutation]
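The two pairwise methods can be sketched on mnemonic sequences as follows. This is a minimal illustration and not the tool used in the work: it uses the textbook Needleman-Wunsch (global) and Smith-Waterman (local) recurrences, and the scoring values (match = 1, mismatch = -1, gap = -1) and the function names are assumptions chosen for the example.

```python
GAP = "-"

def _pair_score(x, y, match, mismatch):
    # Score for aligning two mnemonics against each other.
    return match if x == y else mismatch

def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two mnemonic lists.

    Returns (score, aligned_a, aligned_b); gaps are written as "-".
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]

def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: score of the best-matching pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,  # a local alignment may restart anywhere
                prev[j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            best = max(best, cur[j])
        prev = cur
    return best

# Hypothetical mnemonic sequences, for illustration only.
score, row_a, row_b = global_align(["push", "mov", "call"], ["push", "call"])
print(score, row_a, row_b)
print(local_score(["nop", "nop", "mov", "call", "jmp"], ["mov", "call", "jmp", "ret"]))
```

Global alignment penalises every inserted junk instruction, while the local score depends only on the best shared region, which is why the two behave differently on obfuscated code.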

Multiple Sequence Alignment

• MSA is an extension of pairwise alignment to more than two sequences.
• It is used to identify conserved regions across a group of sequences.

M1    M2    M3    M4    M5
add   add   add   -     add
-     push  push  push  push
mov   mov   mov   mov   mov
-     call  jmp   jz    jmp
jmp   jmp   mov   mov   mov

• Mi: i-th malware instance

Implementation of MSA

• MSA is implemented using the progressive technique (ClustalW [9]).
• Progressive MSA follows three steps:
  - Determine the similarity between each pair of sequences by pairwise alignment.
  - Construct a guide tree (phylogenetic tree) to represent the evolutionary relationship.
  - Build the MSA by aligning the most closely related groups first and the most distant groups last, following the guide tree.

Phylogenetic Tree

• A phylogenetic tree depicts the evolutionary relationship among the sequences.
• It is used to form groups of similar viruses.
• It guides the MSA progressively, so that closer groups are aligned first.

[Figure: phylogenetic tree with leaves A, B, D, F and E, corresponding to ((E,(A,B)),(D,F))]
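A guide tree of this shape can be built from pairwise distances by repeatedly merging the two closest groups. The sketch below uses simple average-linkage clustering as a stand-in; ClustalW itself builds its guide tree with neighbour joining, so this is a swapped-in, simpler technique. The function name and the distance values are hypothetical and were chosen only so that the result has the same shape as the tree on the slide.

```python
from itertools import combinations

def build_guide_tree(names, dist):
    """Average-linkage clustering (a simplification, not ClustalW's method).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree as nested tuples.
    """
    # Each cluster is (subtree, tuple_of_leaves_in_that_subtree).
    clusters = [(name, (name,)) for name in names]

    def avg_dist(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters (i < j is guaranteed by combinations).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters[i] = merged  # keep the merged group at the earlier position
        del clusters[j]
    return clusters[0][0]

# Hypothetical distances: A-B closest, then D-F, then E to A/B; all others far apart.
names = ["E", "A", "B", "D", "F"]
close = {("A", "B"): 1.0, ("D", "F"): 2.0, ("E", "A"): 3.0, ("E", "B"): 3.0}
dist = {frozenset(p): close.get(p, 8.0) for p in combinations(names, 2)}

tree = build_guide_tree(names, dist)
print(tree)
```

The merge order (A with B, then D with F, then E with the A/B group) is also the order in which a progressive MSA would align the groups.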

Similarity Measurement

• Alignment score: the sum of the scores specified for each aligned pair of mnemonics. The higher the score, the more similar the sequences.
• Distance (d): calculated using the following formulas. The higher the distance, the more dissimilar the sequences.

\[ N_d = \frac{\#\text{mismatch}}{\#\text{mismatch} + \#\text{match}} \]

\[ L_d = (\#\text{mismatch} + \#\text{match} + \#\text{gap}) \]

• Nd is the normalized distance, Ld is the Levenshtein distance.
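As an illustration of the normalized distance, the sketch below counts matches, mismatches and gaps between two rows of an alignment and computes Nd from them. The helper names are assumptions; the gap count is returned as well because the Ld formula on the slide also involves it. The example rows are the M2 and M3 columns of the MSA example shown earlier.

```python
def alignment_counts(row_a, row_b, gap="-"):
    """Count (match, mismatch, gap) columns of two equally long aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    match = mismatch = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            match += 1
        else:
            mismatch += 1
    return match, mismatch, gaps

def normalized_distance(row_a, row_b):
    """Nd = #mismatch / (#mismatch + #match); gap columns are ignored."""
    match, mismatch, _ = alignment_counts(row_a, row_b)
    total = match + mismatch
    return mismatch / total if total else 0.0

m2 = ["add", "push", "mov", "call", "jmp"]
m3 = ["add", "push", "mov", "jmp", "mov"]
print(alignment_counts(m2, m3), normalized_distance(m2, m3))
```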

Identification of Base Malware

• The base malware of a family is the sample most similar to all the others, i.e. the one with the highest sum of pairwise alignment scores (sum of pairs, SoP [3]).

Score matrix:

      M1    M2    M3    M4    SoP
M1    -     7     -2    1     6
M2    7     -     -3    0     4
M3    -2    -3    -     1     -4
M4    1     0     1     -     2

M1 is the base malware.

• Mi: i-th malware instance
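The selection of the base malware can be reproduced with a few lines. The sketch below takes the pairwise scores from the score matrix on this slide; the dictionary layout and function names are assumptions made for the example.

```python
# Pairwise alignment scores copied from the score matrix on the slide.
scores = {
    "M1": {"M2": 7, "M3": -2, "M4": 1},
    "M2": {"M1": 7, "M3": -3, "M4": 0},
    "M3": {"M1": -2, "M2": -3, "M4": 1},
    "M4": {"M1": 1, "M2": 0, "M3": 1},
}

def sum_of_pairs(score_matrix):
    """Sum each sample's pairwise scores against all other samples."""
    return {name: sum(row.values()) for name, row in score_matrix.items()}

def base_malware(score_matrix):
    """Return the sample with the highest sum-of-pairs score."""
    sop = sum_of_pairs(score_matrix)
    return max(sop, key=sop.get)

print(sum_of_pairs(scores))
print(base_malware(scores))
```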

Implementation Method

• MetamOrphic Malware ExploratioN Technique Using MSA (MOMENTUM) demonstrates the applicability of bioinformatics techniques to metamorphic malware analysis and detection.
• The two phases of MOMENTUM are:
  - Analysis of metamorphism in tools/real malware
  - Signature modelling and testing

MOMENTUM [1/2]

[Figure: flow diagram for metamorphism analysis. Boxes: Metamorphic Families (Virus Tools and Real Malware); Intra-Family Pairwise Alignment; Distance Matrix; Base File; Alignments of Two Files; Inter-Family Pairwise Alignment. Decision points: Metamorphic?; Families Overlap?; Obfuscation?]

MOMENTUM [2/2]

[Figure: signature modelling and testing. Labels: Divide data set in two parts; Training Set; Testing Set; Extract Group Signature; Single Signature; Threshold (two boxes); Testing with single and group signatures; Scan Logs]

MSA Signature

• An MSA signature (single signature) is the sequence of mnemonics preserved in the alignment.
• A mnemonic that appears in more than 50% of the entries in a row is included in the MSA signature.

M1    M2    M3    M4    M5    MSA Sign  Mt
push  push  -     -     push  push      push
-     -     jump  jump  jump  jump      jump
mov   mov   -     lea   xor             lea
call  call  call  call  call  call      call
push  mov   mov   -     mov   mov       push

• Mi: i-th malware instance; Mt: test sample
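The 50% rule can be sketched as a majority vote over each row of the alignment. The rows below are the M1 to M5 entries from the table on this slide; the function name is an assumption, and the fraction is taken over all entries in the row (gaps included), which is one reading of the rule.

```python
from collections import Counter

def msa_signature(rows, gap="-"):
    """Keep, per aligned row, the mnemonic found in more than half of the entries."""
    signature = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        if not counts:
            continue  # row made only of gaps contributes nothing
        mnemonic, count = counts.most_common(1)[0]
        if count > len(row) / 2:
            signature.append(mnemonic)
    return signature

# Rows of the alignment (M1..M5) as shown on the slide.
alignment = [
    ["push", "push", "-", "-", "push"],
    ["-", "-", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["push", "mov", "mov", "-", "mov"],
]

print(msa_signature(alignment))
```

The third row contributes nothing because its most frequent mnemonic, mov, appears in only two of five entries.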

Group Signature

• A group signature is extracted from the single signatures of the subgroups.
• Subgroups are formed using the evolutionary relationship.
• A single signature is extracted for each subgroup, and the signatures are combined in the form of wildcards.

Sign1  Sign2  Sign3  Sign4  Sign5  Group Sign    Mt
push   push   -      -      push   push          push
jz     jz     jump   jump   jump   jump|jz       jz
mov    mov    -      lea    xor    mov|lea|xor   lea
call   call   call   call   call   call          call
-      mov    mov    -      push   mov|push      push

• Signi: signature of the i-th subgroup in a family; Mt: test sample
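Combining subgroup signatures into wildcards can be sketched as collecting, for each aligned row, every mnemonic that occurs in any subgroup signature. The rows below come from the table on this slide. The position-by-position matching of the test sample is a simplifying assumption made for the example (the slides do not spell out the matching procedure), and the function names are hypothetical.

```python
from collections import Counter

def group_signature(rows, gap="-"):
    """Per aligned row, list the alternatives seen across subgroup signatures.

    Alternatives are ordered most frequent first; ties keep first-seen order.
    """
    return [
        [m for m, _ in Counter(x for x in row if x != gap).most_common()]
        for row in rows
    ]

def render(signature):
    """Write each position as a wildcard such as 'jump|jz'."""
    return ["|".join(alternatives) for alternatives in signature]

def matches(sample, signature):
    """Assumed matching rule: every position of the sample is an allowed alternative."""
    return len(sample) == len(signature) and all(
        m in alternatives for m, alternatives in zip(sample, signature)
    )

# Rows Sign1..Sign5 as shown on the slide.
subgroup_rows = [
    ["push", "push", "-", "-", "push"],
    ["jz", "jz", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["-", "mov", "mov", "-", "push"],
]
test_sample = ["push", "jz", "lea", "call", "push"]  # column Mt on the slide

signature = group_signature(subgroup_rows)
print(render(signature))
print(matches(test_sample, signature))
```

Unlike the single MSA signature, the wildcard keeps the less frequent alternatives (jz, lea, xor, push), so the points of mutation remain part of the signature.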

Threshold

[Figure: score axis starting at 0 with the points Bmin, Bmax, Mmin and Mmax marked in that order; benign samples lie between Bmin and Bmax, malware samples between Mmin and Mmax]

Where:
• Bmin: benign sample with minimum score
• Bmax: benign sample with maximum score
• Mmin: malware sample with minimum score
• Mmax: malware sample with maximum score

\[ \text{Threshold} = \frac{B_{max} + M_{min}}{2}, \qquad \text{Threshold} > B_{max} \]
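The threshold rule can be written directly from the formula. In the sketch below the score lists are made-up numbers for illustration, not results from the study, and the function name is an assumption. The check enforces the side condition Threshold > Bmax, which holds only when Mmin is greater than Bmax.

```python
def detection_threshold(benign_scores, malware_scores):
    """Midpoint between the highest benign score and the lowest malware score."""
    b_max = max(benign_scores)
    m_min = min(malware_scores)
    if m_min <= b_max:
        # The midpoint would not exceed Bmax, violating the condition on the slide.
        raise ValueError("benign and malware score ranges overlap")
    return (b_max + m_min) / 2

# Hypothetical scores, for illustration only.
benign = [2, 5, 9]
malware = [15, 20, 30]

threshold = detection_threshold(benign, malware)
flagged = [score > threshold for score in benign + malware]
print(threshold, flagged)
```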

Dataset [1/2]

Dataset description:

Type       Source                     #Family   #Instances
Synthetic  NGVCK, PSMPC, G2, MPCGEN   46        1051
Real       User Agencies, VxHeavens   52 + 1*   1209
Benign     System32, Cygwin etc.      1         150

• * consists of unknown viruses (in the test set).
• The dataset is divided equally into a training set and a testing set.

Dataset [2/2]

• All samples are in Portable Executable (PE) format.
• Samples are unpacked using
  - a dynamic unpacker (EtherUnpack [7])
  - a signature-based unpacker (GUnPacker [10])
• Malware families are created from the combined scan results of 14 antivirus products.
• Benign samples are also scanned.

Result for Intra Family

[Figure: bar chart of average distance (y-axis, 0 to 0.3) for NGVCK, PSMPC, G2 and MPCGEN, with the series Global, Local and Levenshtein]

• Non-zero values indicate the presence of metamorphism in the synthetic data.
• Levenshtein distance is high because of junk code insertion.
• In spite of high global distances, local distances are low for most samples. This indicates the presence of similar regions in the code.
• The average distance lies between 0 and 1.

Result for Inter Family

[Figure: bar chart of average distance (y-axis, 0 to 0.7) for NGVCK, PSMPC, G2, MPCGEN and VX Heavens, with the series Global, Local and Levenshtein]

• The distance is less than the intra-family distance. This indicates that most of the malware shares some base code.
• Levenshtein distance is higher because of changes in functionality.
• The average distance lies between 0 and 1.

Comparative Analysis

Virus type   Replacements/Alignment   Avg. SoD   Obfuscation
NGVCK        47                       1.03       Average, Simple
G2           3                        1.45       Low, Simple
MPCGEN       31                       0.61       Average, Simple
PSMPC        1                        1.35       Low, Weak
Vx-Heavens   122                      8.3        Large, Complex

• Viruses generated using the tools belong to the same family.
• Families of real malware are distinct.
• In PSMPC, loop and jump instructions contribute to obfuscation; this increases the distance between samples.
• NGVCK viruses overlap with real malware (Savior).
• SoD: sum of distances of a family from all other families

Detection Results

[Figure: bar chart of evaluation metrics (y-axis, 0 to 1) showing TPR and FPR for the MSA single signature and the group signature]

• 95.5% of malware is detected with the MSA signature; detection with the group signature is 72.4%.
• 53% of benign samples are falsely detected as malware with the MSA signature, because the mnemonics used for mutation in malware are lost from it.
• The group signature preserves the points of mutation, which are absent in benign samples.
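For reference, the two metrics on the chart are the true positive rate (detected malware over all malware) and the false positive rate (benign samples flagged over all benign samples). The sketch below computes both from labels and predictions; the toy labels are invented for illustration and are unrelated to the reported percentages.

```python
def detection_rates(labels, predictions):
    """Return (TPR, FPR); labels and predictions are 1 for malware, 0 for benign."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Toy data: four malware samples followed by four benign samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
print(detection_rates(labels, predictions))
```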

MOMENTUM with Antiviruses

[Figure: bar chart of detection rate (y-axis, 0 to 90) comparing MOMENTUM with antivirus products]

• MOMENTUM (group signature) is found to be comparable to the best antivirus products.
• Of the 35 malware samples that the antivirus products did not detect, MOMENTUM detected 20.

Scope for Improvement

• Instead of using the same mismatch score throughout, compute a weighted score for each pair of mnemonics using the frequency of mismatches.
• In the alignment, the operand part can be considered to verify the actual changes (replacement/gap).
• This can reveal how the morpher preserves functionality.

List of Publications

[1] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Detecting Malicious Files using Non-Signature based Methods", (to appear) Oxford Computer Journal.

[2] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Malware Detection using Non-Signature based Method", In Proceedings of IEEE International Conference on Network Communication and Computer (ICNCC 2011), pp-427-43, DOI: 978-1-4244-9551-1/11.

References

[1] E. Karim, A. Walenstein, A. Lakhotia, "Malware Phylogeny using Permutation of code", In Proceedings of EICAR 2005, pp 167-174.

[2] M. R. Chouchane and A. Lakhotia, "Using engine signature to detect metamorphic malware", In Proceedings of the 4th ACM workshop on Recurring malcode, WORM '06, 2006, pp 73-78.

[3] Mona Singh, "Multiple Sequence Alignment", Lecture Notes: www.cs.princeton.edu/~mona/Lecture/msa1.pdf (Last viewed on 14-6-2011)

[4] Mona Singh, "Phylogenetics", Lecture Notes: www.cs.princeton.edu/~mona/Lecture/msa1.pdf (Last viewed on 14-6-2011)

[5] T. Smith and M. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, pp 195-197, 1981.

[6] Mark Stamp, Wing Wong, "Hunting for metamorphic engines", Journal in Computer Virology, 2(3):211-229.

[7] Ether for Malware Unpacking: http://ether.gtisc.gatech.edu/malware.html (Last viewed on 14-6-2011)

[8] Jian Li, Jun Xu, Ming Xu, HengiLi Zhao, Ning Zheng, "Malware Obfuscation Measuring via Evolutionary Similarity", In Proceedings of IEEE Int. Conference on Future Information Network 2009.

[9] Larkin MA et al, "Clustal W and Clustal X version 2.0", Bioinformatics, 23, 2947-2948, 2007.

[10] GUnPacker: http://www.woodmann.com/collaborative/tools/index.php/GUnPacker (Last viewed on 14-6-2011)

Thanks!
