Improving Cyber-Security through Predictive Analysis

21 November 2005
Peter Frometa, SPSS
Copyright 2003, SPSS Inc.


Page 2: Applications of Data Mining (DM) and Text Mining (TM) in Security

• Identify Cyber Threats
• Discover Potentially Suspicious Inbound Cargo
• Identify Insider Threats
• Discover Terrorist Organization Affiliation
• Predict Risk Potential for Threats to Safety
• Detect Smuggling and Drug Trafficking Associations
• Detect Money Laundering Behavior
• Detect Fraud
• Predict Non-Compliance of Foreign Visitors
• Ensure Maritime Domain Awareness
• Identify True Identity of Individuals
• Better Allocate Security Resources

Page 3: Data Mining Defined

What is Data Mining?

• A hot buzzword for a class of techniques that find patterns or relationships that have not previously been detected
• A user-centric, interactive process which leverages analysis technologies and computing power
• Not reliant on an existing database
• A relatively easy task that requires knowledge of the organization’s problem and subject matter expertise

Page 4: Text Mining Defined

What is Text Mining?

• A user-centric process which leverages analysis technologies and computing power to access valuable information within unstructured text data sources
• Driven by Natural Language Processing and linguistics-based algorithms; not a search engine
• No real value unless used in conjunction with Data Mining
• Ultimately eliminates the need to manually read unstructured data sources

Page 5: Beyond Historical Reporting

REACTIVE: Historical Reporting

• What are the top source/destination countries?
• What are the top attacked products?
• What are the top offending ISPs?
• Which IPs have led to the most events?
• What ports are scanned the most?
• What are the IDS event counts for the past day, week, month?

PROACTIVE: Predictive Analytics

• What activity within the unidentified data of the top source/destination countries is likely to be malicious?
• Of the activity identified as malicious, which is likely to be the worst?
• What are the characteristics/vulnerabilities of products likely to be attacked?
• Which activity did not get flagged but is likely to be malicious?
• Which flagged items are most likely to be false positives?
• What can we expect the IDS event counts to be next week, month, etc.?
• What activity is likely to be associated with worm emergence?
• Which isolated events are likely to originate from the same source?
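The difference between the reactive and proactive questions above can be sketched in a few lines of Python. The first function answers a historical-reporting question (which ports were scanned most?); the second gives a naive estimate of the next period's IDS event count by extending a least-squares trend line. The function names, the sample data, and the choice of a linear trend are illustrative assumptions, not something the slides specify.

```python
from collections import Counter


def top_ports(dports, n=3):
    """Reactive report: the n most frequently seen destination ports."""
    return Counter(dports).most_common(n)


def forecast_next(counts):
    """Naive proactive estimate: extend a least-squares line one step ahead."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The next period has index n (periods are numbered 0 .. n-1).
    return intercept + slope * n


# Hypothetical data, invented for illustration only.
scanned = [22, 80, 22, 443, 22, 80]
weekly_ids_events = [100, 120, 140, 160]

print(top_ports(scanned, 2))
print(forecast_next(weekly_ids_events))
```

A real deployment would replace the trend line with a proper forecasting or classification model; the point here is only that the proactive question asks for an estimate about data not yet seen.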

Page 6: NetFlow Data

sIP – source IP address (IP address of the host that sent the IP packet)
dIP – destination IP address
sPort – port used by the source IP address
dPort – port used by the destination IP address
pro – IP protocol (primarily TCP/UDP in this data)
packets – packet count
bytes – byte count
flags – Transmission Control Protocol (TCP) header flags
sTime – start time of flow (GMT)
dur – duration of flow (eTime - sTime)
eTime – end time of flow (GMT)
sr – name or ID of the sensor that picked up the data
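As a rough illustration, the flow fields listed above can be modeled as a small Python record type. The field names follow the slide; the Python types, the textual form of `flags`, and the sample values are assumptions made for this sketch, and `dur` is derived from the two timestamps rather than stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FlowRecord:
    sIP: str          # source IP address
    dIP: str          # destination IP address
    sPort: int        # port used by the source
    dPort: int        # port used by the destination
    pro: int          # IP protocol number (6 = TCP, 17 = UDP)
    packets: int      # packet count
    bytes: int        # byte count
    flags: str        # TCP header flags (string form is an assumption)
    sTime: datetime   # start time of flow (GMT)
    eTime: datetime   # end time of flow (GMT)
    sr: str           # name or ID of the sensor

    @property
    def dur(self) -> timedelta:
        """Duration of the flow, computed as eTime - sTime."""
        return self.eTime - self.sTime


# Hypothetical record using documentation-reserved IP addresses.
flow = FlowRecord(
    sIP="192.0.2.10", dIP="198.51.100.7", sPort=51515, dPort=443,
    pro=6, packets=12, bytes=4096, flags="SA",
    sTime=datetime(2005, 11, 21, 12, 0, 0, tzinfo=timezone.utc),
    eTime=datetime(2005, 11, 21, 12, 0, 30, tzinfo=timezone.utc),
    sr="sensor-1",
)
print(flow.dur.total_seconds())
```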

Page 7: Where Does Predictive Analytics Fit?

Level 1 Analysis
• Stop an intrusion event
• Log data
• Report
Based on:
• NetFlow data (source and destination IP address, protocol, packets, bytes, flags…)

Level 2 Analysis
• Associate multiple similar intrusions
• Classify organized intrusions
• Identify state-sponsored cyber terror
Based on:
• NetFlow data
• Response data
• Correlated incidents, historical data
• Keystroke logging
• other…

Page 8: Data Preparation/Manipulation

• Merge
• Compare
• Re-Classify
• Match cases to existing lists of valid IPs
• Calculate time between access attempts from same IP
• Calculate average bytes passed to same IP
• plus other data prep steps depending on desired analysis
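Three of the preparation steps above (matching against a list of valid IPs, time between attempts from the same IP, and average bytes passed to the same IP) can be sketched in plain Python. This is a minimal illustration, not the tooling used in the presentation: flows are assumed to be dictionaries keyed by the NetFlow field names, `sTime` is assumed to be in seconds, and "same IP" is read as the source IP for gaps and the destination IP for byte averages.

```python
from collections import defaultdict


def match_valid(flows, valid_ips):
    """Split flows by whether the source IP is on the list of valid IPs."""
    known = [f for f in flows if f["sIP"] in valid_ips]
    unknown = [f for f in flows if f["sIP"] not in valid_ips]
    return known, unknown


def gaps_between_attempts(flows):
    """Seconds between consecutive access attempts from each source IP."""
    times = defaultdict(list)
    for f in flows:
        times[f["sIP"]].append(f["sTime"])
    gaps = {}
    for ip, ts in times.items():
        ts.sort()
        gaps[ip] = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return gaps


def average_bytes_to(flows):
    """Average byte count per flow for each destination IP."""
    totals = defaultdict(lambda: [0, 0])  # dIP -> [byte total, flow count]
    for f in flows:
        totals[f["dIP"]][0] += f["bytes"]
        totals[f["dIP"]][1] += 1
    return {ip: total / count for ip, (total, count) in totals.items()}


# Hypothetical flows, invented for illustration only.
flows = [
    {"sIP": "192.0.2.10", "dIP": "198.51.100.7", "sTime": 0, "bytes": 100},
    {"sIP": "192.0.2.10", "dIP": "198.51.100.7", "sTime": 5, "bytes": 300},
    {"sIP": "203.0.113.9", "dIP": "198.51.100.7", "sTime": 7, "bytes": 200},
    {"sIP": "192.0.2.10", "dIP": "198.51.100.8", "sTime": 65, "bytes": 50},
]
known, unknown = match_valid(flows, {"192.0.2.10"})
gaps = gaps_between_attempts(flows)
averages = average_bytes_to(flows)
print(len(known), len(unknown), gaps, averages)
```

Derived fields such as these typically become the inputs to the modeling steps that follow.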

Pages 9 to 11: [Diagram: Data Mining techniques: Associate, Predict, Cluster]

Page 12

Given
• Event and Alarm Data
• Reported Incidents
• Response Data
• Correlated Events

If Event 1 takes place, followed by Event 2, followed by Event 3, followed by an Intrusion:
• What event is likely to happen next?
• Can we be notified before it happens?
• Can we prevent future events earlier in the cycle?
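The "Event 1, then Event 2, then Event 3, then an intrusion" chain above amounts to predicting the next event from observed sequences. One simple way to sketch that idea is to count which event follows which in historical sequences and report the most frequent follower. This first-order counting approach, the function names, and the event labels are assumptions for illustration; the slides do not say which sequence algorithm was used.

```python
from collections import Counter, defaultdict


def learn_transitions(sequences):
    """Count how often each event is immediately followed by each other event."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, event):
    """Return (most likely next event, estimated probability), or None if unseen."""
    followers = transitions.get(event)
    if not followers:
        return None
    nxt, count = followers.most_common(1)[0]
    return nxt, count / sum(followers.values())


# Hypothetical event histories, invented for illustration only.
history = [
    ["port_scan", "failed_login", "privilege_change", "intrusion"],
    ["port_scan", "failed_login", "privilege_change", "intrusion"],
    ["port_scan", "failed_login", "logout"],
]
model = learn_transitions(history)
print(predict_next(model, "failed_login"))
print(predict_next(model, "privilege_change"))
```

With counts like these, an alert could be raised as soon as an event with a high estimated probability of leading toward an intrusion is observed, which is the "notified before it happens" question on the slide.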

Page 13: Deployment

Predictive Analysis as:
• an ad-hoc analysis tool
• a background process

Page 14: Improving Insider Threat Detection through Predictive Analysis


Page 16

Selected documents are run through Clementine’s linguistics-based text extraction engine. Linguistics-based text mining finds meaning in text much as people do, by recognizing a variety of word forms as having similar meanings and by analyzing sentence structure to provide a framework for understanding the text. This approach offers the speed and cost-effectiveness of statistics-based systems, but with a far higher degree of accuracy and far less human intervention.


Page 18

Isolated points represent cases where the behavior or characteristics are not similar to any other cases. These anomalies may help us identify suspicious behavior.
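One simple way to make "isolated points" concrete is to flag any case whose nearest neighbor is farther away than some threshold. The sketch below assumes each case has already been reduced to a numeric feature vector; the threshold, the Euclidean distance measure, and the sample points are illustrative assumptions rather than the clustering method shown in the presentation.

```python
import math


def nearest_neighbor_distance(points):
    """Distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    distances = []
    for i, p in enumerate(points):
        distances.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return distances


def isolated_points(points, threshold):
    """Points whose nearest neighbor is farther away than the threshold."""
    return [
        p
        for p, d in zip(points, nearest_neighbor_distance(points))
        if d > threshold
    ]


# Hypothetical two-feature cases, invented for illustration only.
cases = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 9.0)]
print(isolated_points(cases, threshold=2.0))
```

Flagged cases are candidates for review, not confirmed incidents; as the slide says, anomalies only may indicate suspicious behavior.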


Page 21: Final Thoughts and Questions

• Visit www.spss.com
• Future Questions: Peter Frometa – [email protected]