Transcript
Page 1: A Framework for Detection and Measurement of Phishing Attacks

A Framework for Detection and Measurementof Phishing AttacksReporter: Li, Fong Ruei

National Taiwan University of Science and Technology

Slide 1 (of 35)

Page 2: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ReferenceWorkshop On Rapid Malcode Proceedings of the 2007 ACM workshop on Recurring malcode Alexandria, Virginia, USA

SESSION: Threats Pages: 1 - 8  Year of Publication: 2007 ISBN:978-1-59593-886-2

Slide 2 (of 35)

Page 3: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

OutlineIntroductionPhishing URL TypesModeling Phishing URLs

Feature AnalysisTraining With Features

Analysis and FindingsConclusion

Slide 3 (of 35)

Page 4: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

INTRODUCTIONPhishing is form of identity theft

social engineering techniques sophisticated attack vectors

To harvest financial information from unsuspecting consumers.

Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page.

Slide 4 (of 35)

Page 5: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

PHISHING URL TYPESWe examined a black list of phishing URLs maintained by Google

This black list is used to provide phishing protection in Firefox

Slide 5 (of 35)

Page 6: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

PHISHING URL TYPESThe prominent obfuscation techniques are:

Type I: Obfuscating the Host with an IP address

Type II: Obfuscating the Host with another Domain

Type III: Obfuscating with large host namesType IV: Domain unknown or misspelled

Slide 6 (of 35)

Page 7: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

PHISHING URL TYPES

Slide 7 (of 35)

Page 8: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSUsing logistic regression classifierFor training the model training black list and white list as followsWe use 1245 URLs from this list as our training

black listWe used a list of the top 1000 most popular

URLs as the basis of our training white list set

Slide 8 (of 35)

Page 9: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSFeature Analysis

We categorize our features into four groups: Page Based Domain Based Type Based Word Based

Slide 9 (of 35)

Page 10: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSPage Based :

a numeric value on a scale of [0,1] relative importance of a page within a set of web

pages

Slide 10 (of 35)

Page 11: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSPage Based :

Slide 11 (of 35)

Page Rank distribution for the white list and black list URLs hostname

Page 12: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSDomain Based

This category contains only one feature: whether or not the URL’s domain name can be found in

the White Domain Table.

Slide 12 (of 35)

Page 13: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

Slide 13 (of 35)

MODELING PHISHING URLSDomain Based

51.2% of the white list URLs were present in the table

0.2% of the black list URLs were found in this table.

Page 14: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSType Based

Type I URL Almost all non-phishing (white list) URLs in our training

data do not contain host obfuscation A significant portion of the phishing URLs are host

obfuscated with an IP address.

Type II URL portion of the black list URLs are Type II URLs.

Slide 14 (of 35)

Page 15: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSType Based

Slide 15 (of 35)

Distribution of Type I and Type II URLs in the training data

Page 16: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSType Based

Type III URL we determine the number of characters present after an

organization in the hostname

Slide 16 (of 35)

Page 17: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSType Basednon-phishing URL

http://by124fd.bay124.hotmail.msn.com/cgi-bin/getmsg 0 characters after msn.com & before the path separator the maximum number noticed in a white list URL are 14

charactersType III phishing URLs

7.34 characters (on average) after the target before the path separator

a maximum of 63 characters

Slide 17 (of 35)

Page 18: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSWord Based Features

Phishing URLs are found to contain several suggestive word tokens

login and signin are very often found in a phishing URLWe discarded all tokens with length < 5

containe several common URL parts such as http://, and www.

We discarded organization name tokens We further removed query parameters

Slide 18 (of 35)

Page 19: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLS

Slide 19 (of 35)

Distribution of these features in our training set

Page 20: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSTraining With Features

Our labeled data consisted of 2508 URLs 1245 were phishing URLs 1263 were benign URLs Phishing URLs were placed under the positive (true) class non-phishing ones were under the negative (false) class

66% of URLs were used for training and the remaining 34% were used as the test set

Slide 20 (of 35)

Page 21: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSTo indicate the relative strength of each feature in identifying a Phishing URL we report the corresponding odds ratios, ecoefficient

Slide 21 (of 35)

Page 22: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLS

Slide 22 (of 35)

Page 23: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

MODELING PHISHING URLSEvaluation Result

We evaluated the trained model on the 34% test set split.

We performed our evaluation over multiple runs with randomized partitioning.

This evaluation gave us an average accuracy of 97.31% with True Positive Rate of 95.8 % False Positive Rate of 1.2%.

Slide 23 (of 35)

Page 24: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSWe collected several million URLs from August 20th to August 31 2006

The data consisted of two main components , unique URLs which are visited each dayconsecutive look up requests to these URLs

Slide 24 (of 35)

Page 25: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Phishing URLs per day.The average number of phishing URLs which have been visited from Google’s toolbar in a day.

we find that on average there are 777 URL phishing attacks in a day5073 viewers to a phishing page

Slide 25 (of 35)

Page 26: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Phishing URLs per day.

Slide 26 (of 35)

the distribution of phishing attacks on each day of our study.

Page 27: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Phishing URLs per day.

Slide 27 (of 35)

Page 28: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Phishing URLs per day.

Slide 28 (of 35)

Page 29: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Potential Phishing Victims per day.Determine how many users interact with a phishing page

A user that has any interaction at a site classified as phishing is regarded as a potential phishing victim.

Slide 29 (of 35)

Page 30: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Potential Phishing Victims per day.Based on the number of users who view phishing pages in a day, we further can infer Potential Success Rate of a phisher as follows:

Slide 30 (of 35)

Page 31: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSAverage Potential Phishing Victims per day.

Slide 31 (of 35)

the distribution of phishing attacks on each day of our study.

Page 32: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

ANALYSIS AND FINDINGSDistribution of Phishing by Organization

Slide 32 (of 35)

Page 33: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

Slide 33 (of 35)

ANALYSIS AND FINDINGSGeographical Distribution of Phishing.

To determine country that hosts a particular phishing URL, we used Google’s IP to Geo-Location infrastructure.

Page 34: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

Anti-Phishing Tools

Slide 34 (of 35)

Page 35: A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

CONCLUSIONWe use our features in a logistic regression classifier that achieves a very high accuracy.

One of the major contributions of this work is a large scale measurement study conducted on Google Toolbar URLs

On average we found around 777 unique phishing pages per day and on average 8.24% of the number users who view phishing pages are potential phishing victims

Slide 35 (of 35)


Top Related