Simseer.com
Malware Similarity and Clustering Made Easy
Silvio Cesare <[email protected]>
Introduction
• Simseer.com is a set of web services that analyse malware using program structure as a signature.
• Why?
• AV String signatures not very robust.
• Can’t detect ‘approximate’ matches.
• Hard to generate signature for an entire family.
• Program structure improves signature-based methods.
Who am I?
•Ph.D. Student at Deakin University.
•Presented at Ruxcon, Black Hat, AusCERT, etc.
•Published in academia.
•Book author
•Recently relocated to Canberra.
Outline
1. Introduction
2. Simseer.com’s Malware Services
3. Supporting Infrastructure
4. Other Services
5. Conclusion
Signatures
•In my other presentations.
•Signature is based on ‘set of control flow graphs’
Signature Extraction
•Transform ‘set of control flow graphs’ into a ‘feature vector’
•Decompilation + N-Grams
[Figure: control flow graph with nodes L_0 to L_7; edges labelled "true"]
Decompiled procedure:
proc(){
  L_0: while (v1 || v2) {
  L_1:   if (v3) {
  L_2:   } else {
  L_4:   }
  L_5: }
  L_7: return;
}
Signature string: W|IEH}R
4-grams: W|IE, |IEH, IEH}, EH}R
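The N-gram step can be sketched in a few lines of Python. This is an illustration only, not Simseer's code: the function names `ngrams` and `feature_vector` are hypothetical, and the default n=4 is an assumption chosen because the example signature W|IEH}R above splits into 4-character pieces.

```python
from collections import Counter


def ngrams(signature, n=4):
    """Return every contiguous substring of length n, in order."""
    return [signature[i:i + n] for i in range(len(signature) - n + 1)]


def feature_vector(signature, n=4):
    """Map each n-gram to the number of times it occurs in the signature.

    The sparse Counter stands in for a 'feature vector'; the real
    service may use a different representation (assumption).
    """
    return Counter(ngrams(signature, n))
```

Calling `ngrams("W|IEH}R")` yields the four substrings W|IE, |IEH, IEH} and EH}R, and `feature_vector` counts how often each one occurs.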
Simseer
•Begin demo...
•A revamp of my existing http://www.FooCodeChu.com service.
•Submit an archive of malware samples.
•Results
  ▫A similarity matrix comparing samples.
  ▫An evolutionary tree showing relationships.
Submission Page
Results
Simseer
•Demo complete...
•Use ‘distance between vectors’ to show similarity.
•Visualize using phylogenetics software.
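A pairwise comparison of feature vectors might look like the sketch below. The slides do not say which distance Simseer uses here, so Euclidean distance is an assumption, as are the names `euclidean` and `distance_matrix` and the dict-based sparse vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors (dicts)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))


def distance_matrix(samples):
    """Pairwise distances between all samples.

    samples maps a sample name to its feature vector. Returns the
    sorted names and a square matrix indexed in that order.
    """
    names = sorted(samples)
    matrix = [[euclidean(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix
```

A matrix like this (small distance meaning high similarity) is the kind of input that tree-building phylogenetics tools typically accept.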
SimseerCluster
• Begin demo...
• A new service.
• Submit an archive of malware samples.
• Define the number of clusters.
• Results
  ▫ Samples grouped into clusters.
  ▫ Cross checking samples with AV.
  ▫ Identification of families.
Submission Page
Results
SimseerCluster
•Demo complete...
•Use ‘similarity matrix’ and ‘cosine similarity’.
•Pass to ‘cluster analysis software’ – The Weka Machine Learning Toolkit.
•Use Hierarchical clustering.
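The clustering idea can be illustrated without Weka. The sketch below is a pure-Python stand-in, not the service's pipeline: it computes cosine similarity between sparse vectors and merges the most similar clusters until the requested number remains. Average linkage and the names `cosine_similarity` and `hierarchical_clusters` are assumptions; the slides do not state which linkage is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hierarchical_clusters(samples, k):
    """Agglomerative clustering (average linkage), stopped at k clusters.

    samples maps a sample name to its feature vector; returns a list
    of clusters, each a list of sample names.
    """
    clusters = [[name] for name in sorted(samples)]

    def linkage(c1, c2):
        # Average pairwise similarity between members of two clusters.
        total = sum(cosine_similarity(samples[x], samples[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        # Find the most similar pair of clusters and merge them.
        i, j = max(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The parameter `k` plays the role of the user-defined number of clusters on the submission page.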
SimseerSearch
• Begin demo...
• A new service.
• Submit a malware sample.
• Specify threshold of similarity.
• Results
  ▫ All samples in database similar to query.
  ▫ An AV report.
  ▫ Heuristics to detect obfuscations (packing).
Submission Page
Results
SimseerSearch
•Demo complete...
•Use ‘nearest neighbour similarity search’ based on ‘Euclidean distance’.
•Packer detection based on entropy analysis.
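Entropy-based packer detection usually relies on the observation that compressed or encrypted data looks close to random. A minimal sketch follows; the names `shannon_entropy` and `looks_packed` and the 7.0 bits-per-byte threshold are assumptions, not values taken from Simseer.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic: flag data whose entropy exceeds the assumed threshold."""
    return shannon_entropy(data) > threshold
```

In practice the check would likely be run per section of the executable rather than over the whole file, since a packed payload can sit next to low-entropy headers.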
[Figure: nearest neighbour search with query q, malware sample p, distance d(p,q) and radius r; queries are labelled malicious or benign]
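The search itself can be sketched as a range query: return every stored sample whose Euclidean distance d(p,q) to the query q is at most a radius r. The code below is a linear scan for illustration only; the names `similar_samples` and `classify` are hypothetical, and reading the "malicious" and "benign" labels as "at least one hit" versus "no hits" is an assumption about the figure.

```python
import math


def euclidean(p, q):
    """Euclidean distance d(p, q) between two sparse vectors (dicts)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0) - q.get(k, 0)) ** 2 for k in keys))


def similar_samples(query, database, radius):
    """Return (name, distance) for stored samples within radius, nearest first.

    database maps a known malware sample name to its feature vector.
    """
    hits = []
    for name, vector in database.items():
        d = euclidean(vector, query)
        if d <= radius:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


def classify(query, database, radius):
    """Label a query malicious if any known malware lies within radius."""
    return "malicious" if similar_samples(query, database, radius) else "benign"
```

A production service would presumably replace the linear scan with an index built for similarity search, but the result set is the same.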
Supporting Infrastructure
Other Services
•Other services on the same infrastructure
  ▫Clonewise
  ▫Bugwise
Clonewise – Detecting embedded libraries.
Bugwise on real Debian Linux binaries
Future Work
•Integrate Cuckoo sandbox
  ▫Unpacking with Volatility.
  ▫Non-EXE formats (PDF, DOC, etc.).
  ▫API call classification (non-signature-based).
Conclusion
•Free services.
•Control flow better than traditional string signatures.
•Try it!
•http://www.simseer.com