On the Study of Anomaly-based Spam Filtering Using Spam as Representation of Normality - CCNC - RSW 2012

Carlos Laorden

Posted on 18-Dec-2014


DESCRIPTION

Presentation at CCNC's - Research Student Workshop 2012 of the paper: On the Study of Anomaly-based Spam Filtering Using Spam as Representation of Normality

TRANSCRIPT

Page 1: On the Study of Anomaly-based Spam Filtering Using Spam as Representation of Normality - CCNC - RSW 2012

Carlos Laorden

Page 2:

WHAT YOU GOT, THEN?

SPAM, EGG, SPAM, SPAM, BACON AND SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND SPAM.

ANYTHING WITHOUT SPAM?

I DON’T LIKE SPAM!!

UGH!

Page 3:

Meet the real SPiced hAM

Page 4:

Monty Python’s Flying Circus

Page 5:

Something that repeats and repeats until it becomes annoying

Page 6:

It is a real problem for Information Security

Page 7:

Billions of daily losses in productivity

Page 8:

Infected computers

Page 9:

Stolen credentials

Page 10:
Page 11:

We must fight

Page 12:

Anti-spam methods

Pre-sending: new protocols, increase sending costs, increase risks for spammers

Post-sending: e-mail sender, e-mail content

Page 13:

Usually supervised approaches

Page 14:

Significant labelling work is needed

Page 15:

Significant labelling work is needed

Page 16:

But, is this possible?

Page 17:

I mean, is this possible...

Page 18:
Page 19:

YES

Page 20:

Anomaly Detection

Page 21:

[Figure: SpamAssassin and Ling Spam messages ("this word has no interest") mapped to documents D1 to D11 in a term space with axes t1, t2, t3]
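The figure on this slide suggests a vector space representation of the e-mails. A minimal sketch of that idea follows, assuming a hypothetical stop-word list and plain term-frequency weights; the slides do not state which stop words or which weighting scheme were used, so `STOP_WORDS`, `tokenize`, `build_vocabulary` and `to_vector` are illustrative names only.

```python
from collections import Counter

# Hypothetical stop-word list; the slides do not say which one was used.
STOP_WORDS = {"this", "has", "no", "the", "a", "is", "of"}


def tokenize(text):
    """Lowercase the text, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def build_vocabulary(documents):
    """Return the sorted list of distinct terms (t1, t2, ...) in the corpus."""
    return sorted({term for doc in documents for term in tokenize(doc)})


def to_vector(text, vocabulary):
    """Map one e-mail to a term-frequency vector over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Toy corpus, invented for illustration (not taken from either dataset).
docs = [
    "this word has no interest",
    "cheap pills cheap offer",
    "meeting agenda offer",
]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Each e-mail becomes one point (D1, D2, ...) whose coordinates are the counts of the terms it contains, which is what makes distance measures between e-mails meaningful.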

Page 22:

Anomaly detection

[Figure: the distance d from an unknown e-mail (?) is compared with a threshold: d > threshold?]

Page 23:

Manhattan distance

Euclidean distance
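The two measures named on this slide have standard definitions, sketched below for e-mails represented as equal-length numeric vectors. The function names and the length check are illustrative choices, not taken from the slides.

```python
import math


def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For example, between the points (0, 0) and (3, 4) the Manhattan distance is 7 and the Euclidean distance is 5.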

Page 24:

Anomaly detection

[Figure: distances d between an unknown e-mail (?) and the known e-mails]

Page 25:

Minimum distance

Maximum distance

Mean distance
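An unknown e-mail has one distance to every e-mail in the known set, so those distances have to be reduced to a single value. A minimal sketch of the three rules listed on this slide, assuming the distances have already been computed; `COMBINERS` and `combined_distance` are hypothetical names.

```python
from statistics import mean

# The three combination rules named on the slide.
COMBINERS = {"minimum": min, "maximum": max, "mean": mean}


def combined_distance(distances, rule):
    """Reduce the distances to all known e-mails to one value using the given rule."""
    if not distances:
        raise ValueError("at least one distance is required")
    return COMBINERS[rule](distances)


# Example with made-up distances to three known e-mails.
example = [1.0, 2.0, 6.0]
summary = {rule: combined_distance(example, rule) for rule in COMBINERS}
```

The minimum rule looks only at the closest known e-mail, the maximum rule at the farthest one, and the mean rule at the set as a whole.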

Page 26:

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

Page 27:

10 different thresholds

Page 28:

Anomaly detection

[Figure: the distance d is compared with the threshold: d < threshold or d > threshold]
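The decision on this slide can be sketched as a simple comparison: an e-mail is treated as anomalous when its combined distance to the normality set exceeds the threshold. How the 10 thresholds were chosen is not stated in the slides, so the evenly spaced values produced by the hypothetical `candidate_thresholds` helper are an assumption made for illustration.

```python
def is_anomalous(distance, threshold):
    """Flag an e-mail as an anomaly when it lies farther than the threshold."""
    return distance > threshold


def candidate_thresholds(low, high, count=10):
    """Return `count` evenly spaced thresholds from low to high (assumed scheme)."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


# Example: evaluate one made-up distance against every candidate threshold.
thresholds = candidate_thresholds(0.0, 9.0)
decisions = [is_anomalous(4.5, t) for t in thresholds]
```

Sweeping the threshold trades detection of anomalies against false alarms, which is why several values are evaluated rather than one.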

Page 29:
Page 30:

Minimum distance

Maximum distance

Mean distance

Manhattan distance

Euclidean distance

10 thresholds

Page 31:

What do we get?

Page 32:

Detects more than 93% of junk emails

Less than 5% of legitimate emails misclassified

Page 33:

Detects more than 91% of junk emails

An improvable 8% of legitimate emails misclassified

Page 34:

Suitable for coping with the volume of unclassified spam e-mails

Page 35:
Page 36:

More?

Page 37:

Minimum distance

Maximum distance

Mean distance

Page 38:

Minimum distance

Maximum distance

Mean distance

Page 39:

[Figure: distances d between an unknown e-mail (?) and the known e-mails]

Page 40:

You have new e-mail?

Legitimate? Really?

Page 41:

What is the anomaly?

Page 42:

Anomaly

Normality

Page 43:

Results

Page 44:

SpamAssassin

          Manhattan                      Euclidean
          Prec.    Rec.     F-Meas.     Prec.    Rec.     F-Meas.
Mean      3.83%    100%     7.37%       3.83%    100%     7.37%
Maximum   3.98%    67.92%   7.53%       5.23%    35.63%   9.13%
Minimum   16.48%   12.50%   14.22%      58.73%   15.42%   24.42%

Page 45:

Ling Spam

          Manhattan                      Euclidean
          Prec.    Rec.     F-Meas.     Prec.    Rec.     F-Meas.
Mean      8.37%    100%     15.45%      8.37%    100%     15.45%
Maximum   8.37%    100%     15.45%      20.75%   56.59%   30.37%
Minimum   56.88%   23.10%   32.86%      71.58%   40.51%   51.74%
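The F-measure columns in the results tables are consistent with the usual harmonic mean of precision and recall. A short sketch for checking a row, assuming that standard definition; the function names are illustrative, and values are passed as fractions rather than percentages.

```python
def precision(true_positives, false_positives):
    """Fraction of flagged e-mails that were flagged correctly."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives, false_negatives):
    """Fraction of target e-mails that were actually flagged."""
    targets = true_positives + false_negatives
    return true_positives / targets if targets else 0.0


def f_measure(prec, rec):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)


# SpamAssassin, minimum distance, Manhattan: Prec. 16.48%, Rec. 12.50%.
check = f_measure(0.1648, 0.1250)
```

Here `check` comes out at roughly 0.142, in line with the 14.22% reported for that row; small differences in other rows can arise because the published precision and recall are already rounded.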

Page 46:

Anomaly

Normality

Page 47: