using machine learning in security - icsisadia/talks/sadia-intel.pdf · machine learning overview....

Using Machine Learning in Security Sadia Afroz University of California, Berkeley


Post on 11-Mar-2018


Using Machine Learning in Security

Sadia Afroz

University of California, Berkeley


My Research

• Stylometry

• Malware Analysis

• Internet Freedom

Impact

• ACM SIGSAC Dissertation Award, 2014 (runner-up)

• Outstanding research in Privacy (PET) Award, 2013

• Best paper award, PETS 2012

• Free software: JStylo, Anonymouth, Doppelgänger Finder

• My work on stylometry has been used by the FBI

• My work on malware analysis is being deployed at McAfee


This Talk

• Application of Machine Learning in Security

• Machine Learning under Attack

• Machine Learning in Noisy Environment


Application of Machine Learning: A case study on Stylometry

• Can we link different identities using writing style?


Underground Forums

Understand the adversaries:

• How does information flow in a cybercriminal forum?

• How is cybercrime carried out?

Examples of forum activity:

• Buying malware

• Selling email/password

• Buying crypter

Writing style analysis: Stylometry


Regional differences: Couch vs Sofa

Similar meaning but different words: Although vs Though

Machine Learning Overview

Training documents:

• Cormac McCarthy: What's the bravest thing you ever did? He spat in the road a bloody phlegm. Getting up this morning, he said.

• Ernest Hemingway: He no longer dreamed of storms, nor of women, nor of great occurrences, nor of great fish, nor fights, nor contests of strength, nor of his wife.

Extract features:

• Freq of function words

• Freq of punctuations

Model

Test document:

Just remember that the things you put into your head are there forever, he said. You might want to think about that.

Who wrote this? Model: Cormac McCarthy

Important Concepts

• Feature extraction

• Ground truth
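The two concepts above can be sketched in a few lines of Python. This is only an illustrative toy, not the setup used in the talk: the word and punctuation lists, the nearest-centroid model, and every function name (`extract_features`, `train`, `predict`) are assumptions made for the example. Ground truth is the author label attached to each training document.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set; real systems use far more features.
FUNCTION_WORDS = ["the", "of", "nor", "in", "a", "that", "you", "he", "to", "his"]
PUNCTUATION = [".", ",", "?", "!", ";", ":"]


def extract_features(text):
    """Relative frequencies of function words and punctuation marks."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    char_counts = Counter(text)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    word_feats = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    punct_feats = [char_counts[p] / n_chars for p in PUNCTUATION]
    return word_feats + punct_feats


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(labeled_docs):
    """Ground truth in, model out: one mean feature vector per author."""
    by_author = {}
    for author, text in labeled_docs:
        by_author.setdefault(author, []).append(extract_features(text))
    return {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in by_author.items()
    }


def predict(model, text):
    """Return the author whose mean vector is most similar to the text."""
    feats = extract_features(text)
    return max(model, key=lambda author: cosine(model[author], feats))


training = [
    ("Cormac McCarthy",
     "What's the bravest thing you ever did? He spat in the road a bloody "
     "phlegm. Getting up this morning, he said."),
    ("Ernest Hemingway",
     "He no longer dreamed of storms, nor of women, nor of great occurrences, "
     "nor of great fish, nor fights, nor contests of strength, nor of his wife."),
]
model = train(training)
guess = predict(model, "Just remember that the things you put into your head "
                       "are there forever, he said. You might want to think about that.")
```

With one short passage per author the guess is not reliable; the point is only the shape of the pipeline: labeled text, feature vectors, a model, and a prediction for an unlabeled document.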

Underground forums

4 leaked forums:

• Antichat (Russian): password cracking, 41k users

• BlackhatWorld (English): blackhat SEO, 8k users

• Carders (German): accounts, 8k users

• L33tCrew (German): accounts, 12k users

Analyzing writing style is challenging!

• Feature extraction is hard:

• Foreign language, slang, l33tsp3ak, bad spelling

1337 down? **Neh, die Lösung!** Ne klappt nit, denke mal eher das sie mal wieder DNS probleme haben


• No ground truth

Doppelgänger Finder: Cluster accounts that belong to a user

Author A Author B

P(A wrote B’s doc)

P(B wrote A’s doc)

Combined score: T = P(A wrote B’s doc) × P(B wrote A’s doc)

A and B are the same author if T > threshold
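The pairing rule on this slide can be written out directly. The sketch below assumes the two directional probabilities are already available in a nested dictionary `prob`, where `prob[x][y]` stands for P(x wrote y's doc). The dictionary layout, the function names, and the numbers are all made up for illustration.

```python
from itertools import combinations


def combined_score(p_a_wrote_b, p_b_wrote_a):
    """T = P(A wrote B's doc) * P(B wrote A's doc)."""
    return p_a_wrote_b * p_b_wrote_a


def find_doppelgangers(prob, threshold):
    """Return (a, b, T) for every pair whose combined score exceeds the threshold."""
    pairs = []
    for a, b in combinations(sorted(prob), 2):
        t = combined_score(prob[a][b], prob[b][a])
        if t > threshold:
            pairs.append((a, b, t))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up probabilities for three accounts: prob[x][y] = P(x wrote y's doc).
prob = {
    "A": {"B": 0.9, "C": 0.1},
    "B": {"A": 0.8, "C": 0.2},
    "C": {"A": 0.2, "B": 0.8},
}
pairs = find_doppelgangers(prob, threshold=0.5)
```

Because T is a product, a pair scores high only when both directions are high. In the toy data, C looks like B (0.8) but B does not look like C (0.2), so that pair gets T = 0.16 and falls below the threshold, while A and B pass with T of about 0.72.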


Doppelgänger Finder: Cluster accounts that belong to a user

The goal is to find the most probable author when the original author is not present.

• Set aside author A; train a model on the remaining authors B, C, D, E

• Ask the model who wrote A's documents (Ad): P(B wrote Ad), P(C wrote Ad), P(D wrote Ad), P(E wrote Ad)

Repeat for every author
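A minimal sketch of this hold-one-author-out loop is shown below. The slides do not say which classifier or features stand behind "Train" and "Model", so the code substitutes a toy: character-frequency features and a distance-based score normalized over the remaining candidates. `ALPHABET`, `features`, `closeness`, `authorship_probabilities`, the `scale` parameter, and the sample texts are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "etaoin .,!?"  # hypothetical toy feature set


def features(text):
    """Relative frequency of a few characters in the text."""
    counts = Counter(text.lower())
    total = max(len(text), 1)
    return [counts[ch] / total for ch in ALPHABET]


def closeness(u, v, scale=50.0):
    """Turn Euclidean distance into a positive score (closer means larger)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return math.exp(-scale * dist)


def authorship_probabilities(docs):
    """docs maps author -> text. Returns prob[x][y], an estimate of
    P(x wrote y's documents) computed with y removed from the candidates."""
    vecs = {author: features(text) for author, text in docs.items()}
    prob = {author: {} for author in docs}
    for held_out in docs:  # repeat for every author
        candidates = [a for a in docs if a != held_out]  # "train" without held_out
        scores = {c: closeness(vecs[c], vecs[held_out]) for c in candidates}
        total = sum(scores.values())
        for c in candidates:
            prob[c][held_out] = scores[c] / total
    return prob


docs = {
    "A": "Selling fresh accounts, pm me!!! Fast delivery, fair price.",
    "B": "selling accounts, pm me!!! fast, fair, trusted.",
    "C": "Does anyone know whether the server is down again? I cannot log in.",
    "D": "The tool works. Tested it yesterday. No problems so far.",
    "E": "Need a crypter, will pay well. Contact via ICQ only.",
}
prob = authorship_probabilities(docs)
```

Each held-out author ends up with a probability distribution over the other authors only, which is what makes the question "who else writes like this account?" answerable.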


Data with ground truth

• 100 English blogs

• Written by 50 authors, 2 blogs per author

• Collected by crawling Google+ profiles of the authors (Narayanan et al. 2012)


Probability scores of true pairs > false pairs

No true pair after this

Best threshold: True pair = 48, False pair= 5
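As a rough worked example, the counts at the best threshold can be turned into precision and recall. This assumes that each of the 50 authors contributes exactly one true pair (their two blogs) and that "True pair = 48, False pair = 5" counts the pairs scoring above the threshold; neither assumption is stated explicitly on the slides.

```python
from math import comb

n_blogs = 100
n_true_pairs = 50            # assumption: 50 authors x 2 blogs = one true pair each
true_found, false_found = 48, 5

candidate_pairs = comb(n_blogs, 2)                     # all blog pairs that could be scored
precision = true_found / (true_found + false_found)    # 48 / 53
recall = true_found / n_true_pairs                     # 48 / 50

print(f"{candidate_pairs} candidate pairs, "
      f"precision={precision:.3f}, recall={recall:.3f}")
# prints "4950 candidate pairs, precision=0.906, recall=0.960"
```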


Score on Carders

Our chosen threshold: 21 pairs

Manual analysis criteria

• To verify, we looked at:

• Similar or not?: username, ICQ, signature, contact information, account information, topics

• Do they talk with each other?

• Others:

• Do they acknowledge their other accounts?

• Do they have common properties with some other users?

• Were they banned from the forum?

Combined probability score on Carders

• Usernames: Kan**, deb**: same ICQ

• Usernames: per**, Smi**: acknowledge, same ICQ, sell weed

• Usernames: Pri**, Lou**: same ICQ, topics

• Usernames: Mr.**, Fle**: talk with each other

• Usernames: puT**, pol**: nothing matches

Use of duplicate accounts

• Sockpuppet:

• Raise fake demand for products

I want to sell x

OMG!!! I’ll buy them all

• Accounts for sale:

• Normal accounts: 10 €, 2nd level: 25 €, 3rd level: 50 €


Summary: Doppelgänger Finder

• Code: https://github.com/sheetal57/doppelganger-finder

• Paper: Doppelgänger Finder: Taking Stylometry To The Underground. IEEE S&P 2014

• Future:

• DARPA Memex Challenge


Machine Learning under Attack: How to Evade Stylometry

• Write less

• Write differently


Evading Stylometry: write differently

• We studied two ways to change writing style:

• Imitation: imitate Cormac McCarthy

• Obfuscation: writing in a different way

• We collected regular and deceptive documents using Amazon Mechanical Turk

Accuracy in regular documents is high

[Plot: accuracy (0.00 to 1.00) vs. number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random]

Accuracy in obfuscated writing

[Plot: same axes and methods]

Accuracy in imitated writing

[Plot: same axes and methods]

Evading Stylometry

• Evading stylometry is possible

• Maintaining a consistent fake writing style is hard

Real World Example: A Gay Girl in Damascus

“Amina Arraf”

Fake picture (copied from Facebook)

The real “Amina”

Thomas MacMaster, a 40-year-old American male

I live in Damascus, Syria. It's a repressive police state. Most LGBT people are still deep in the closet or staying as invisible as possible. But I have set up a blog announcing my sexuality, with my name and my photo. Am I crazy? Maybe.

Can Thomas fool Machine Learning?

• Thomas (as himself)

• Thomas (as Amina)

• Random

Model

A Gay Girl in Damascus: Who wrote this?

54% posts

43% posts

Summary: Evading Stylometry

• Machine learning methods perform poorly under attack

• Need to understand the adversary's cost

Website

Where is Alice going?

Machine Learning in Noisy Environment: Website Fingerprinting Attack

How does it work?

Model

Extract features

Extract features

How does it work?

Model

Some page

What is this page?
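The two phases on these slides (extract features from traces of known sites to build a model, then ask the model "What is this page?" for an unknown trace) can be sketched roughly as follows. This is a minimal illustration, not the classifier used in the talk: the three features, the nearest-neighbour "model", the site names and the packet sizes are all assumptions made up for the example.

```python
import math

# A trace is a list of signed packet sizes:
# positive = outgoing, negative = incoming (hypothetical encoding).

def extract_features(trace):
    """Hypothetical feature vector: (packet count, outgoing bytes, incoming bytes)."""
    outgoing = sum(s for s in trace if s > 0)
    incoming = sum(-s for s in trace if s < 0)
    return (len(trace), outgoing, incoming)

def train(labelled_traces):
    """The 'model' is simply the stored feature vectors with their labels."""
    return [(extract_features(trace), label) for trace, label in labelled_traces]

def classify(model, trace):
    """Return the label of the training trace whose features are closest."""
    x = extract_features(trace)
    _, label = min(model, key=lambda item: math.dist(item[0], x))
    return label

# Made-up training traces for two made-up sites.
training = [
    ([600, -1500, -1500, -1500, 600, -1500], "site-a"),
    ([600, -1500, 600, -1500], "site-a"),
    ([600, -300, 600, -300, 600, -300, 600, -300], "site-b"),
    ([600, -300, 600, -300, 600, -300], "site-b"),
]
model = train(training)

unknown = [600, -1500, -1500, 600, -1500]
print(classify(model, unknown))
```

Real attacks use far richer features and classifiers; the point here is only the shape of the pipeline.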

Why is WF so important?

● Tor as the most advanced anonymity network

● Allows an adversary to discover the browsing history

● Series of successful attacks

● Low cost to the adversary

Number of top conference publications on WF (25)

How practical is this attack?

Visit sites

Collect packets

Train model

Test model
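The four steps above describe a lab-style evaluation loop. A rough sketch of that loop is shown below, under loud assumptions: the "visits" are synthetic numbers drawn around made-up per-site byte counts, the model is a simple nearest-centroid classifier, and none of the names or figures come from the talk.

```python
import random

def collect_traces(rng, profiles, visits_per_site):
    """Stand-in for 'visit sites, collect packets': draws synthetic
    (incoming_bytes, outgoing_bytes) features around a made-up site profile."""
    samples = []
    for site, (incoming, outgoing) in profiles.items():
        for _ in range(visits_per_site):
            features = (rng.gauss(incoming, incoming * 0.05),
                        rng.gauss(outgoing, outgoing * 0.05))
            samples.append((features, site))
    return samples

def train_model(samples):
    """Nearest-centroid model: the mean feature vector of each site."""
    sums = {}
    for (incoming, outgoing), site in samples:
        acc = sums.setdefault(site, [0.0, 0.0, 0])
        acc[0] += incoming
        acc[1] += outgoing
        acc[2] += 1
    return {site: (a / n, b / n) for site, (a, b, n) in sums.items()}

def predict(model, features):
    """Pick the site whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda site: (model[site][0] - features[0]) ** 2
                                       + (model[site][1] - features[1]) ** 2)

def accuracy(model, samples):
    correct = sum(predict(model, features) == site for features, site in samples)
    return correct / len(samples)

rng = random.Random(0)
# Hypothetical sites with hypothetical (incoming, outgoing) byte totals.
profiles = {"site-a": (500_000, 20_000),
            "site-b": (120_000, 15_000),
            "site-c": (2_000_000, 60_000)}
samples = collect_traces(rng, profiles, visits_per_site=20)

# Alternate visits between training and test sets so every site is in both.
train_set, test_set = samples[0::2], samples[1::2]
model = train_model(train_set)
test_accuracy = accuracy(model, test_set)
print(f"accuracy on held-out visits: {test_accuracy:.2%}")
```

Because training and test data here come from the same clean distribution, the measured accuracy says little about how the attack behaves in a noisy real-world setting.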

Website

Unrealistic assumptions

Adversary: e.g., replicability

Website fingerprinting attack with multi-tab browsing

[Bar chart: accuracy for Control, Test (0.5s), Test (3s) and Test (5s); values as extracted: 77.08%, 9.8%, 7.9%, 8.23%]

[Diagram: bandwidth (BW) over time for Tab 1 and Tab 2]
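One way to picture why multi-tab browsing hurts the attack: an observer sees the packets of both tabs interleaved on one connection, so features computed on the mixed trace no longer match those learned from single-page loads. The sketch below is an illustration only. The traces are invented, and treating the slide's 0.5s, 3s and 5s as the delay before the second tab starts loading is an assumption.

```python
def overlay(trace_a, trace_b, delay_ms):
    """Merge two traces of (timestamp_ms, signed_size) packets as seen on the
    wire when the second page starts loading delay_ms after the first."""
    shifted = [(t + delay_ms, size) for t, size in trace_b]
    return sorted(trace_a + shifted, key=lambda packet: packet[0])

def total_incoming(trace):
    """Total incoming bytes (incoming packets carry negative sizes)."""
    return sum(-size for _, size in trace if size < 0)

# Made-up single-page traces for two tabs.
tab1 = [(0, 600), (200, -1500), (400, -1500), (1000, -1500)]
tab2 = [(0, 600), (100, -300), (300, -300)]

mixed = overlay(tab1, tab2, delay_ms=500)

print("tab 1 alone:", len(tab1), "packets,", total_incoming(tab1), "incoming bytes")
print("mixed trace:", len(mixed), "packets,", total_incoming(mixed), "incoming bytes")
```

Even this tiny example changes both the packet count and the byte totals that a classifier trained on clean single-tab traces would rely on.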

Website fingerprinting attack with different Tor versions

● Coexisting Tor Browser Bundle (TBB) versions

● Versions: 2.4.7, 3.5 and 3.5.2.1 (changes in RP, etc.)

[Bar chart: accuracy for Control (3.5.2.1), Test (2.4.7) and Test (3.5); values as extracted: 79.58%, 66.75%, 6.51%]

Website

Unrealistic assumptions

Web: e.g., staleness

Website Staleness

[Plot: accuracy (%) over time (days)]

Accuracy drops below 50% after 9 days.

Theoretical Accuracy vs. Practical Accuracy

A Critical Evaluation of Website Fingerprinting Attacks. ACM CCS 2014

Importance of the Result

• Need for the right evaluation

• Focus on important problems

Summary

• Application of Machine Learning in Security

• Machine Learning under Attack

• Machine Learning in Noisy Environment

Future Work

• Machine Learning in Security: Adversarially robust features

• Machine Learning for analyzing cybercriminal networks