from nlp to mlpcomsec.spb.ru/mmm-acns10/doc/mmm-acns2010/session4/02_te… · • nlp: • extract...

23
From NLP to MLP Peter Teufl/Udo Payer/Günther Lackner Graz University of Technology, Austria

Upload: others

Post on 22-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

From NLPto MLP

Peter Teufl/Udo Payer/Günther LacknerGraz University of Technology, Austria

Page 2: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

What to expect...

• Knowledge mining in various areas:

• Event correlation, RDF data analysis, WIFI privacy, malware analysis

• ... and e-Participation

• Common model based on ML and AI

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea

Page 3: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

e-Participation

• Natural Language Processing (NLP)

• Analysis of text

• Clustering

• Relations

• Semantic aware search

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea

Page 4: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Idea...

• In previous work, analysis of polymorphic shellcodes

• Neural networks for the detection of shellcode decryption loops

• Code vs. natural language?

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea

Page 5: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

NLP• Natural Language Processing

• Various techniques are available

• Sentence analysis

• POS tagging

• Word sense disambiguation

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisIntro Idea MLP vs.

NLP

Page 6: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

MLP• Machine Language Processing

• In this talk: focus on assembler, but applicable to any other language

• Assuming that machine language has similar properties as real language

• Semantics, grammar, (word/instruction) disambiguation

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisIntro Idea MLP vs.

NLP

Page 7: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

MLP• Malware: Signatures? Understanding?

• Fuzzy analysis

• Group code with similar behavior

• Semantic search for similar code

• Semantic relations within code

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisIntro Idea MLP vs.

NLP

Page 8: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Goal

• Can we map NLP to MLP???

• NLP research highly developed, mature frameworks, techniques

• Problems in language analysis easier to spot, since we know our language very well

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisIntro Idea MLP vs.

NLP

MLP vs. NLP

Page 9: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Layers Lexical Parser

POS Tagging

POS Filter

Lemmatization

Semantic Network Generation

Activation Pattern Generation

Unsupervised Analysis Semantic Search

NLP

Semantic Relations

External Knowledge

Emulator/Disassembler

POC Tagging

POC Filter

Lemmatization

MLP

NLP/MLP Processing

Semantic Processing

Analysis

Supervised Analysis

LemmatizationPOS(C) Tagging

Lexical Parser

Activation Patterns AnalysisIntro Idea MLP vs.

NLP

Page 10: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Lexical parser

• NLP:

• Extract sentences, analyze terms, find relations (e.g. Stanford parser)

• This beautiful_A city_N is_V called_V St. Petersburg_SN.

LemmatizationPOS(C) Tagging

Activation Patterns AnalysisMLP vs.

NLPIntro Idea Lexical Parser

Page 11: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Lexical parser

• MLP:

• instruction sequences (mov, sub, xor...)

• relations between typical instructions (modifying the same register, variable)

• e.g. interrupt preparation, loops e.g.

LemmatizationPOS(C) Tagging

Activation Patterns AnalysisMLP vs.

NLPIntro Idea Lexical Parser

Page 12: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

POS tagging, filter

• NLP:

• This beautiful_A city_N is_V called_V St. Petersburg_SN.

• Tagging terms

• Filtering (stop words etc.): beautiful, city, St. Petersburg

LemmatizationLexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea POS(C) Tagging

Page 13: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

POC tagging, filter

• MLP:

• jmp, call, jz... (branch type)

• add, sub... (arithmetic)

• xor... (logic)

LemmatizationLexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea POS(C) Tagging

Page 14: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Lemmatization• NLP:

• bought => buy

• cars => car

• MLP:

• drop parameters (mov ax,4)

• group instructions (arithmetic, logic)

POS(C) Tagging

Lexical Parser

Activation Patterns AnalysisMLP vs.

NLPIntro Idea Lemmatization

Page 15: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Activation Patterns

• Based on semantic networks, spreading activation, machine learning

• Allows us to analyze arbitrary combination of features (symbolic, real values)

• Patterns are the basis for a wide range of analysis methods

UDP

TCP

TELNET

DNS

Error Rate cluster 1e.g. low error rate

Error Rate cluster 2e.g. high error rate

Byte occurence cluster 1 (e.g. typical for plain text

traffic)

Byte occurence cluster 2

e.g. typical for encrypted traffic)

SSH

RGNG map for error rates

RGNG map for byte histograms

Symbolic values for layer 7 protocols

Symbolic values for layer 4 protocols

RGNG Map

RGNG Map

Direct integration of the features

Direct integration of the features

Activation Patterns

POS(C) Tagging

Lexical Parser AnalysisMLP vs.

NLPIntro Idea Lemmatization

Page 16: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Analysis

• Unsupervised learning (clustering)

• Supervised learning (classifiers)

• Semantic search

• Semantic relations

• Anomaly detection

• Feature relevanceActivation Patterns

POS(C) Tagging

Lexical Parser AnalysisMLP vs.

NLPIntro Idea Lemmatization

Page 17: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Applications

e-ParticipationText analysis

Twitter Mining

User Tracking in WIFI Networks

Event Correlation

RDF Analysis

Malware Analysis

Activation Patterns

POS(C) Tagging

Lexical Parser AnalysisMLP vs.

NLPIntro Idea Lemmatization

Page 18: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

MLP Example

• Metasploit shellcodes

• Decoder loops of various decryption engines

• Clustering, semantic search and relations

POS(C) Tagging

Lexical Parser

Activation Patterns

MLP vs. NLPIntro Idea Lemmatization Analysis

Page 19: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Clustering

• NLP:

• Group similar documents, extract key concepts from clusters, gain quick overview

• MLP:

• Group similar code (instruction sequences, functions, decryption loops)

POS(C) Tagging

Lexical Parser

Activation Patterns

MLP vs. NLPIntro Idea Lemmatization Analysis

Page 20: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Semantic Search• NLP:

• St. Petersburg is beautiful. The city was founded in 1703.

• MLP:

• Search for similar concepts (decryption loops)

POS(C) Tagging

Lexical Parser

Activation Patterns

MLP vs. NLPIntro Idea Lemmatization Analysis

Page 21: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Semantic Relations

• NLP

• MLP

POS(C) Tagging

Lexical Parser

Activation Patterns

MLP vs. NLPIntro Idea Lemmatization Analysis

Page 22: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Thoughts...

• Same basic principles for NLP and MLP

• Use the existing knowledge, bring it to another domain

• Apply it to arbitrary language

• Understand malware/programs???

Page 23: From NLP to MLPcomsec.spb.ru/mmm-acns10/doc/MMM-ACNS2010/Session4/02_Te… · • NLP: • Extract sentences, analyze terms, find relations (e.g. Stanford parser) • This beautiful_A

Thank you for your attention!