Metamorphic Malware Analysis and Detection

Bioinformatics Techniques for Metamorphic Malware Analysis and Detection. Malaviya National Institute of Technology, Jaipur. Supervisors: Dr. M. S. Gaur, Dr. V. Laxmi. By: Grijesh Chauhan (2009PCP116)

Upload: grijesh-chauhan

Post on 08-Jun-2015


DESCRIPTION

ABSTRACT: Modern malware that is metamorphic or polymorphic in nature mutates its code by employing code obfuscation and encryption methods to thwart detection. As a result, conventional signature-based scanners fail to detect such malware. To address the problem of detecting known variants of metamorphic malware, we propose a method based on bioinformatics techniques that are used effectively for protein and DNA matching. Instead of exact signature matching, more sophisticated signatures are extracted using multiple sequence alignment (MSA). The results show that the proposed method is capable of identifying malware variants with minimal false alarms and misses. The detection rate achieved with the proposed method is also better than that of the commercial antivirus products used in the study.

Status: This work has been accepted by the 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).

Link: http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=6207739&url=http://ieeexplore.ieee.org/iel5/6203543/6207707/06207739.pdf?arnumber=6207739

TRANSCRIPT

Page 1: Metamorphic Malware Analysis and Detection

Bioinformatics Techniques for Metamorphic Malware Analysis and Detection

Malaviya National Institute of Technology, Jaipur

Supervisors:

Dr. M. S. Gaur

Dr. V. Laxmi

By:

Grijesh Chauhan

(2009PCP116)

Page 2: Metamorphic Malware Analysis and Detection

Outline

• Malware & Metamorphic Malware
• Motivation
• Objective
• Bioinformatics Techniques
• MOMENTUM
• Dataset
• Result & Analysis
• References

Page 3: Metamorphic Malware Analysis and Detection

Malware

• Malware is software designed to infect systems and replicate.
• Threats
  • Loss of data
  • Degraded computer system performance
  • Identity theft
• Two broad categories
  • Metamorphic: the virus body changes on each replication
  • Polymorphic: encrypts the malicious payload to avoid detection

Page 4: Metamorphic Malware Analysis and Detection

Metamorphic Malware [1/2]

• Metamorphic malware variants have similar functionality but different structure and signature.
• This is similar to genetic diversity in biology.

[Diagram: a metamorphic engine producing Variant-1, Variant-2 and Variant-3 with reordered code]

Page 5: Metamorphic Malware Analysis and Detection

Metamorphic Malware [2/2]

• Metamorphic malware automatically re-codes itself each time it propagates or is distributed.
• Conventional signature-based scanners are ineffective at detecting variants of the same malware.
• Sophisticated signature(s) are required to detect metamorphic variants of malware.

Page 6: Metamorphic Malware Analysis and Detection

Motivation

• Variants of metamorphic malware are generated using a small embedded metamorphic engine to defeat detection [2].
• A limited number of instructions is used to generate variants so as to preserve functionality.
• Metamorphic malware, like DNA/protein sequences, mutates from generation to generation; variants inherit functionality and some structural similarity from the ancestral malware.

Page 7: Metamorphic Malware Analysis and Detection

Objective

• To devise a method for the detection of metamorphic malware and its variants.
• To extract abstract signature(s) using bioinformatics sequence alignment.
  • The base code is preserved across generations and obfuscated using junk code, equivalent instructions, etc.
• To identify unseen malware samples using the best representative signatures (group/single) of a family.

Page 8: Metamorphic Malware Analysis and Detection

Sequence Alignment [1/2]

• Sequence alignment is a way of arranging DNA/protein sequences to identify regions of similarity and so infer functional, structural or evolutionary relationships.
• Alignment methods
  • Global alignment: align sequences end to end.
  • Local alignment: align a substring of one sequence with a substring of the other.
  • Multiple sequence alignment (MSA): align more than two sequences.

Page 9: Metamorphic Malware Analysis and Detection

Sequence Alignment [2/2]

• Global alignment
    L G P S S K Q T G K G S - S R I D N
    L N - I T K S A G K G A I M R L D A
• Local alignment
    - - - - - - T G - G - - - - - - -
    - - - - - - A G K G - - - - - - -
• Alignment parameters
  • Match
  • Mismatch
  • Gap

(Figure label: Point of Mutation)
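The match, mismatch and gap parameters drive a dynamic-programming alignment. The following is a minimal Python sketch of global (Needleman-Wunsch style) alignment over mnemonic sequences. The function name `global_align`, the scores +1/-1/-1 and the sample sequences are illustrative assumptions, not values taken from the slides.

```python
GAP = "-"

def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two mnemonic lists; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] opposite b[j-1] (assumed flat scoring).
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # match or mismatch
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]

total, row_a, row_b = global_align(["add", "push", "mov", "jmp"],
                                   ["add", "mov", "jmp"])
print(total)   # 2
print(row_a)   # ['add', 'push', 'mov', 'jmp']
print(row_b)   # ['add', '-', 'mov', 'jmp']
```

Here the inserted `push` shows up as a gap in the second sequence, which is how junk-code insertion would appear in an alignment of two variants.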

Page 10: Metamorphic Malware Analysis and Detection

Multiple Sequence Alignment

• MSA is an extension of pairwise alignment to more than two sequences.
• It is used to identify conserved regions across a group of sequences.

M1     M2     M3     M4     M5
add    add    add    -      add
-      push   push   push   push
mov    mov    mov    mov    mov
-      call   jmp    jz     jmp
jmp    jmp    mov    mov    mov

• Mi: i-th malware instance

Page 11: Metamorphic Malware Analysis and Detection

Implementation of MSA

• MSA is implemented using the progressive technique (ClustalW [9]).
• Progressive MSA follows three steps:
  • Determine the similarity between each pair of sequences by pairwise alignment.
  • Construct a guide tree (phylogenetic tree) to represent the evolutionary relationship.
  • Build the MSA by aligning the most closely related groups first and the most distant groups last, following the guide tree.

Page 12: Metamorphic Malware Analysis and Detection

Phylogenetic Tree

• A phylogenetic tree depicts the evolutionary relationship among the sequences.
  • It is used to form groups of similar viruses.
  • It guides the MSA progressively, so that closer groups are aligned first.

[Figure: tree over sequences A, B, D, E and F with topology ( (E,(A,B)), (D,F) )]
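A guide tree of this shape can be produced by repeatedly merging the two closest groups. The sketch below uses average-linkage clustering as a simple stand-in; it is not the tree-building method of ClustalW, and the function name `guide_tree` and all distance values are assumptions chosen so that the result reproduces the topology shown on the slide.

```python
from itertools import combinations

def guide_tree(labels, dist):
    """Build a nested-tuple tree by merging the closest clusters first.

    dist maps frozenset({x, y}) to the pairwise distance between x and y.
    """
    # Each cluster is (subtree, list of member labels).
    clusters = [(label, [label]) for label in labels]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters[i] = merged
        del clusters[j]
    return clusters[0][0]

# Hypothetical distances: A/B and D/F are close pairs, E is nearest to A and B.
labels = ["E", "A", "B", "D", "F"]
dist = {frozenset(p): 0.8 for p in combinations(labels, 2)}
dist[frozenset(("A", "B"))] = 0.1
dist[frozenset(("D", "F"))] = 0.2
dist[frozenset(("E", "A"))] = 0.3
dist[frozenset(("E", "B"))] = 0.3

tree = guide_tree(labels, dist)
print(tree)  # (('E', ('A', 'B')), ('D', 'F'))
```

A progressive aligner would then walk this tree from the leaves upward, aligning (A,B) and (D,F) before joining the larger groups.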

Page 13: Metamorphic Malware Analysis and Detection

Similarity Measurement

• Alignment score: the sum of the scores specified for each aligned pair of mnemonics. The higher the score, the more similar the sequences.
• Distance (d): calculated using the following formulas. The higher the distance, the more dissimilar the sequences.

$$N_d = \frac{\#\,\text{mismatch}}{\#\,\text{mismatch} + \#\,\text{match}}$$

[Formula for L_d: its surviving terms are #mismatch, #match and #gap; the exact form is not recoverable from the transcript]

• Nd is the normalized distance; Ld is the Levenshtein distance.
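As a worked illustration of the normalized distance, the sketch below counts match, mismatch and gap columns in a pairwise alignment and computes Nd as #mismatch / (#mismatch + #match), as read from the slide. The function names are hypothetical, and the two rows are the M2 and M3 columns of the MSA example earlier in the deck. Ld is not sketched because its formula is not legible in the transcript.

```python
def column_counts(row_a, row_b, gap="-"):
    """Count (match, mismatch, gap) columns of two equal-length aligned rows."""
    match = mismatch = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            match += 1
        else:
            mismatch += 1
    return match, mismatch, gaps

def normalized_distance(row_a, row_b):
    """Nd = #mismatch / (#mismatch + #match); gap columns are ignored."""
    match, mismatch, _ = column_counts(row_a, row_b)
    total = match + mismatch
    return mismatch / total if total else 0.0

m2 = ["add", "push", "mov", "call", "jmp"]
m3 = ["add", "push", "mov", "jmp", "mov"]
print(column_counts(m2, m3))        # (3, 2, 0)
print(normalized_distance(m2, m3))  # 0.4
```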

Page 14: Metamorphic Malware Analysis and Detection

Identification of Base Malware

• The base malware of a family is the sample most similar to all the others, i.e. the one with the highest sum of pairwise alignment scores (SoP [3]).

Score matrix:

      M1    M2    M3    M4    SoP
M1    -     7     -2    1     6
M2    7     -     -3    0     4
M3    -2    -3    -     1     -4
M4    1     0     1     -     2

M1 is the base malware.

• Mi: i-th malware instance
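The selection of the base malware can be reproduced directly from the score matrix on this slide. The following Python sketch sums each row of pairwise scores and picks the maximum; the function name `base_malware` and the use of `None` for the diagonal are illustrative choices.

```python
def base_malware(names, scores):
    """Return (base, sop): the sample with the highest sum of pairwise scores."""
    sop = {
        name: sum(s for s in row if s is not None)  # skip the diagonal
        for name, row in zip(names, scores)
    }
    base = max(sop, key=sop.get)
    return base, sop

names = ["M1", "M2", "M3", "M4"]
scores = [
    [None, 7, -2, 1],
    [7, None, -3, 0],
    [-2, -3, None, 1],
    [1, 0, 1, None],
]

base, sop = base_malware(names, scores)
print(sop)   # {'M1': 6, 'M2': 4, 'M3': -4, 'M4': 2}
print(base)  # M1
```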

Page 15: Metamorphic Malware Analysis and Detection

Implementation Method

• MetamOrphic Malware ExploratioN Technique Using MSA (MOMENTUM) demonstrates the applicability of bioinformatics techniques to metamorphic malware analysis and detection.
• The two phases of MOMENTUM are:
  • Analysis of metamorphism in tools/real malware
  • Signature modelling and testing

Page 16: Metamorphic Malware Analysis and Detection

MOMENTUM [1/2]

[Flow diagram for metamorphism analysis. Boxes: Metamorphic Families (Virus Tools and Real Malware); Intra-Family Pairwise Alignment; Distance Matrix; Base File; Alignments of Two Files; Inter-Family Pairwise Alignment. Decisions: Metamorphic?; Families Overlap?; Obfuscation?]

Page 17: Metamorphic Malware Analysis and Detection

MOMENTUM [2/2]

[Diagram of signature modelling and testing. Elements: Divide data set in two parts (Training Set, Testing Set); Extract Group Signature; Single Signature; Threshold (shown twice); Testing with single and group signatures; Scan Logs]

Page 18: Metamorphic Malware Analysis and Detection

MSA Signature

• An MSA signature (single signature) is the sequence of mnemonics preserved in the alignment.
• A mnemonic that appears in more than 50% of the entries of a row is included in the MSA signature.

M1     M2     M3     M4     M5     MSA Sign    Mt
push   push   -      -      push   push        push
-      -      jump   jump   jump   jump        jump
mov    mov    -      lea    xor                lea
call   call   call   call   call   call        call
push   mov    mov    -      mov    mov         push

• Mi: i-th malware instance; Mt: test sample
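The 50% rule can be sketched as a per-row majority vote over the aligned mnemonics. The Python below is a minimal illustration using the rows of the slide's table; the function name `msa_signature` is hypothetical, and counting gaps in the denominator is an assumption about how "more than 50% in a row" is meant.

```python
from collections import Counter

def msa_signature(rows, gap="-"):
    """Keep, for each aligned row, the mnemonic present in more than half of it."""
    signature = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        if not counts:
            continue  # row contains only gaps
        mnemonic, count = counts.most_common(1)[0]
        if count > len(row) / 2:  # gaps count toward the row length (assumption)
            signature.append(mnemonic)
    return signature

rows = [
    ["push", "push", "-", "-", "push"],
    ["-", "-", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["push", "mov", "mov", "-", "mov"],
]
print(msa_signature(rows))  # ['push', 'jump', 'call', 'mov']
```

The third row contributes nothing because `mov` appears in only two of five entries, which matches the empty signature cell for that row on the slide.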

Page 19: Metamorphic Malware Analysis and Detection

Group Signature

• A group signature is built from the single signatures of the subgroups.
  • Subgroups are formed using the evolutionary relationship.
  • A single signature is extracted for each subgroup, and the signatures are combined in the form of a wildcard.

Sign1  Sign2  Sign3  Sign4  Sign5  Group Sign     Mt
push   push   -      -      push   push           push
jz     jz     jump   jump   jump   jump|jz        jz
mov    mov    -      lea    xor    mov|lea|xor    lea
call   call   call   call   call   call           call
-      mov    mov    -      push   mov|push       push

• Signi: signature of the i-th subgroup in a family; Mt: test sample
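Combining subgroup signatures into wildcard alternatives can be sketched as collecting the distinct mnemonics of each row. In the Python below, the function names are hypothetical, ordering alternatives by frequency is an assumption that happens to reproduce the slide's table, and the position-by-position `matches` check is a simplification: the slides score test samples against a threshold rather than requiring an exact match.

```python
from collections import Counter

def group_signature(rows, gap="-"):
    """For each row, list the distinct mnemonics, most frequent first."""
    group = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        group.append([m for m, _ in counts.most_common()])
    return group

def matches(group, sample):
    """True if every mnemonic of the sample is an allowed alternative."""
    return len(group) == len(sample) and all(
        m in alternatives for alternatives, m in zip(group, sample)
    )

rows = [
    ["push", "push", "-", "-", "push"],
    ["jz", "jz", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["-", "mov", "mov", "-", "push"],
]
group = group_signature(rows)
print(["|".join(alts) for alts in group])
# ['push', 'jump|jz', 'mov|lea|xor', 'call', 'mov|push']
print(matches(group, ["push", "jz", "lea", "call", "push"]))  # True
```

Unlike the single MSA signature, the third row keeps `mov|lea|xor`, so the point of mutation remains part of the signature.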

Page 20: Metamorphic Malware Analysis and Detection

Threshold

[Figure: score axis starting at 0, showing the benign score range (Bmin to Bmax) and the malware score range (Mmin to Mmax), with the threshold placed between Bmax and Mmin]

Where:
Bmin: benign sample with minimum score
Bmax: benign sample with maximum score
Mmin: malware sample with minimum score
Mmax: malware sample with maximum score

Threshold = (Bmax + Mmin) / 2, with Threshold > Bmax
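The threshold rule is simple arithmetic on the training scores. The Python sketch below uses made-up score lists; the function names and the decision to flag a sample when its score is strictly above the threshold are assumptions, since the slide only states the midpoint formula and the condition Threshold > Bmax.

```python
def pick_threshold(benign_scores, malware_scores):
    """Midpoint between the highest benign and the lowest malware score."""
    b_max = max(benign_scores)
    m_min = min(malware_scores)
    threshold = (b_max + m_min) / 2
    if threshold <= b_max:
        # The slide requires Threshold > Bmax; overlapping ranges violate it.
        raise ValueError("benign and malware score ranges overlap")
    return threshold

def is_malware(score, threshold):
    """Flag a sample whose signature score exceeds the threshold (assumption)."""
    return score > threshold

threshold = pick_threshold([1, 3, 4], [8, 10, 12])
print(threshold)                  # 6.0
print(is_malware(9, threshold))   # True
print(is_malware(5, threshold))   # False
```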

Page 21: Metamorphic Malware Analysis and Detection

Dataset [1/2]

Dataset description:

Type        Source                        #Family    #Instances
Synthetic   NGVCK, PSMPC, G2, MPCGEN      46         1051
Real        User Agencies, VxHeavens      52 + 1*    1209
Benign      System32, Cygwin etc.         1          150

• * consists of unknown viruses (in the test set).
• The dataset is divided equally into training and testing sets.

Page 22: Metamorphic Malware Analysis and Detection

Dataset [2/2]

• All samples are in Portable Executable (PE) format.
• Samples are unpacked using:
  • a dynamic unpacker (EtherUnpack [7])
  • a signature-based unpacker (GUnPacker [10])
• Malware families are created from the combined scan results of 14 antivirus products.
• Benign samples are also scanned.

Page 23: Metamorphic Malware Analysis and Detection

Result for Intra Family

[Bar chart: average distance (y-axis, 0 to 0.3) for NGVCK, PSMPC, G2 and MPCGEN, with series Global, Local and Levenshtein]

• Non-zero values indicate the presence of metamorphism in the synthetic data.
• Levenshtein distance is high due to junk code insertion.
• In spite of high global distances, local distances are low in most of the samples. This indicates the presence of similar regions in the code.
• Average distance lies between 0 and 1.

Page 24: Metamorphic Malware Analysis and Detection

Result for Inter Family

[Bar chart: average distance (y-axis, 0 to 0.7) for NGVCK, PSMPC, G2, MPCGEN and VX Heavens, with series Global, Local and Levenshtein]

• Distance is less than the intra-family distance. This indicates that most of the malware share some base code.
• Levenshtein distance is higher because of changes in functionality.
• Average distance lies between 0 and 1.

Page 25: Metamorphic Malware Analysis and Detection

Comparative Analysis

Virus Type    Replacements/Alignment    Avg. SoD    Obfuscation
NGVCK         47                        1.03        Average, Simple
G2            3                         1.45        Low, Simple
MPCGEN        31                        0.61        Average, Simple
PSMPC         1                         1.35        Low, Weak
Vx-Heavens    122                       8.3         Large, Complex

• Viruses generated using tools belong to the same family.
• Families of real malware are distinct.
• In PSMPC, loop and jump instructions contribute to obfuscation; this increases the distance between samples.
• NGVCK viruses overlap with real malware (Savior).
• SoD: sum of the distances of a family to all other families.

Page 26: Metamorphic Malware Analysis and Detection

Detection Results

[Bar chart: evaluation metrics TPR and FPR (y-axis, 0 to 1) for the MSA single signature and the group signature]

• 95.5% of malware is detected with the MSA signature; detection with the group signature is 72.4%.
• 53% of benign samples are falsely detected as malware with the MSA signature, because the mnemonics used for mutation in malware are lost from the signature.
• The group signature preserves points of mutation that are absent in benign samples.

Page 27: Metamorphic Malware Analysis and Detection

MOMENTUM with Antiviruses

[Bar chart: detection rate (y-axis, 0 to 90) of MOMENTUM compared with antivirus products]

• MOMENTUM (group signature) is found to be comparable to the best antivirus products.
• Of the 35 malware samples undetected by the antivirus products, MOMENTUM detected 20.

Page 28: Metamorphic Malware Analysis and Detection

Scope for Improvement

• Instead of using the same mismatch score throughout, compute a weighted score for each pair of mnemonics using the frequency of mismatches.
• In the alignment, the operand part can be considered to verify actual changes (replacement/gap).
  • This can reveal how the morpher preserves functionality.

Page 29: Metamorphic Malware Analysis and Detection

List of Publications

[1] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Detecting Malicious Files using Non-Signature based Methods", (to appear) Oxford Computer Journal.

[2] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Malware Detection using Non-Signature based Method", In Proceedings of IEEE International Conference on Network Communication and Computer (ICNCC 2011), pp. 427-43, DOI: 978-1-4244-9551-1/11.

Page 30: Metamorphic Malware Analysis and Detection

References

[1] E. Karim, A. Walenstein, A. Lakhotia, "Malware Phylogeny using Permutation of code", In Proceedings of EICAR 2005, pp. 167-174.

[2] M. R. Chouchane and A. Lakhotia, "Using engine signature to detect metamorphic malware", In Proceedings of the 4th ACM Workshop on Recurring Malcode (WORM '06), 2006, pp. 73-78.

[3] Mona Singh, "Multiple Sequence Alignment", Lecture Notes: www.cs.princeton.edu/~mona/Lecture/msa1.pdf (last viewed on 14-6-2011).

[4] Mona Singh, "Phylogenetics", Lecture Notes: www.cs.princeton.edu/~mona/Lecture/msa1.pdf (last viewed on 14-6-2011).

[5] T. Smith and M. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, pp. 195-197, 1981.

[6] Mark Stamp, Wing Wong, "Hunting for metamorphic engines", Journal in Computer Virology, 2(3):211-229.

Page 31: Metamorphic Malware Analysis and Detection

References

[7] Ether for Malware Unpacking: http://ether.gtisc.gatech.edu/malware.html (last viewed on 14-6-2011).

[8] Jian Li, Jun Xu, Ming Xu, HengiLi Zhao, Ning Zheng, "Malware Obfuscation Measuring via Evolutionary Similarity", In Proceedings of IEEE Int. Conference on Future Information Network 2009.

[9] Larkin MA et al., "Clustal W and Clustal X version 2.0", Bioinformatics, 23, 2947-2948, 2007.

[10] GUnPacker: http://www.woodmann.com/collaborative/tools/index.php/GUnPacker (last viewed on 14-6-2011).

Page 32: Metamorphic Malware Analysis and Detection

Thanks!
