A Framework for Detection and Measurement of Phishing Attacks

Post on 03-Jan-2016

DESCRIPTION

A Framework for Detection and Measurement of Phishing Attacks. Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology. Reference: Workshop On Rapid Malcode, Proceedings of the 2007 ACM workshop on Recurring malcode, Alexandria, Virginia, USA. Session: Threats.

TRANSCRIPT

  • A Framework for Detection and Measurement of Phishing Attacks
    Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology

    Slide 1 (of 35)

    Machine Learning and Bioinformatics Laboratory

    Reference
      • Workshop On Rapid Malcode: Proceedings of the 2007 ACM workshop on Recurring malcode
      • Alexandria, Virginia, USA
      • Session: Threats
      • Pages: 1 - 8
      • Year of Publication: 2007
      • ISBN: 978-1-59593-886-2

    Slide 2 (of 35)

    Outline
      • Introduction
      • Phishing URL Types
      • Modeling Phishing URLs
          • Feature Analysis
          • Training With Features
      • Analysis and Findings
      • Conclusion

    Slide 3 (of 35)

    INTRODUCTION
      • Phishing is a form of identity theft
          • Social engineering techniques
          • Sophisticated attack vectors
          • To harvest financial information from unsuspecting consumers
      • Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page.

    Slide 4 (of 35)

    PHISHING URL TYPES
      • We examined a black list of phishing URLs maintained by Google
      • This black list is used to provide phishing protection in Firefox

    Slide 5 (of 35)

    PHISHING URL TYPES
    The prominent obfuscation techniques are:
      • Type I: Obfuscating the Host with an IP address
      • Type II: Obfuscating the Host with another Domain
      • Type III: Obfuscating with large host names
      • Type IV: Domain unknown or misspelled

    Slide 6 (of 35)
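
A Type I check can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `host_is_ip_address` is hypothetical, and the check only recognizes dotted-decimal IPv4 and standard IPv6 literals.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True when the URL's host is a literal IP address (Type I)."""
    host = urlsplit(url).hostname  # hostname drops the port and IPv6 brackets
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a plain IPv4/IPv6 literal, so treat it as a named host.
        return False
    return True
```

Hosts written as a single hexadecimal or decimal integer are not detected by this sketch and would need extra parsing.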

    PHISHING URL TYPES

    Slide 7 (of 35)

    MODELING PHISHING URLS
      • We use a logistic regression classifier
      • The model is trained on a black list and a white list, as follows:
          • We use 1245 URLs from the Google black list as our training black list
          • We used a list of the top 1000 most popular URLs as the basis of our training white list set

    Slide 8 (of 35)

    MODELING PHISHING URLS
    Feature Analysis
    We categorize our features into four groups:
      • Page Based
      • Domain Based
      • Type Based
      • Word Based

    Slide 9 (of 35)

    MODELING PHISHING URLS
    Page Based:
      • A numeric value on a scale of [0,1]
      • Relative importance of a page within a set of web pages

    Slide 10 (of 35)

    MODELING PHISHING URLS
    Page Based:
      • Figure: Page Rank distribution for the host names of the white list and black list URLs

    Slide 11 (of 35)

    MODELING PHISHING URLS
    Domain Based
      • This category contains only one feature: whether or not the URL's domain name can be found in the White Domain Table.

    Slide 12 (of 35)
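
The domain-based feature amounts to a table lookup. The sketch below assumes the White Domain Table is available as a plain set of domain names; the function name `in_white_domain_table` and the suffix-matching rule are illustrative assumptions, not details taken from the slides.

```python
from urllib.parse import urlsplit


def in_white_domain_table(url: str, white_domains: set) -> bool:
    """Return True when the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the listed domain itself or any host ending in ".<domain>",
    # so "example.com.evil.test" does not match "example.com".
    return any(host == d or host.endswith("." + d) for d in white_domains)
```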

    MODELING PHISHING URLS
    Domain Based
      • 51.2% of the white list URLs were present in the table
      • 0.2% of the black list URLs were found in this table

    Slide 13 (of 35)

    MODELING PHISHING URLS
    Type Based
      • Type I URL
          • Almost none of the non-phishing (white list) URLs in our training data contain host obfuscation
          • A significant portion of the phishing URLs are host obfuscated with an IP address
      • Type II URL
          • A portion of the black list URLs are Type II URLs

    Slide 14 (of 35)

    MODELING PHISHING URLS
    Type Based
      • Figure: Distribution of Type I and Type II URLs in the training data

    Slide 15 (of 35)

    MODELING PHISHING URLS
    Type Based
      • Type III URL
          • We determine the number of characters present after an organization in the hostname

    Slide 16 (of 35)

    MODELING PHISHING URLS
    Type Based
      • Non-phishing URL: http://by124fd.bay124.hotmail.msn.com/cgi-bin/getmsg
          • 0 characters after msn.com and before the path separator
          • The maximum number observed in a white list URL is 14 characters
      • Type III phishing URLs
          • 7.34 characters (on average) after the target and before the path separator
          • A maximum of 63 characters

    Slide 17 (of 35)
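
The Type III measurement can be sketched as follows. This is a rough illustration: the function name `chars_after_organization` is hypothetical, and it assumes the organization's domain (for example msn.com) is already known and appears literally in the host name.

```python
from urllib.parse import urlsplit
from typing import Optional


def chars_after_organization(url: str, organization_domain: str) -> Optional[int]:
    """Count host-name characters that follow the organization's domain.

    Returns None when the organization does not appear in the host name.
    """
    host = (urlsplit(url).hostname or "").lower()
    index = host.find(organization_domain.lower())
    if index == -1:
        return None
    # Everything between the end of the organization and the path separator.
    return len(host) - (index + len(organization_domain))
```

For the white list URL on the slide this returns 0; for a made-up host such as msn.com.example-login.net it returns 18.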

    MODELING PHISHING URLS
    Word Based Features
      • Phishing URLs are found to contain several suggestive word tokens
      • login and signin are very often found in a phishing URL
      • We discarded all tokens with length < 5; these include several common URL parts such as http:// and www.
      • We discarded organization name tokens
      • We further removed query parameters

    Slide 18 (of 35)
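
The token filtering described above might look like the sketch below. The splitting rule (any run of non-alphanumeric characters) and the names `word_tokens` and `organization_names` are assumptions made for illustration; the slides do not say how URLs were tokenized.

```python
import re
from urllib.parse import urlsplit


def word_tokens(url, organization_names=frozenset(), min_length=5):
    """Extract candidate word tokens from a URL, ignoring its query string."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; the query and fragment are left out.
    text = " ".join([parts.scheme, parts.netloc, parts.path]).lower()
    tokens = re.split(r"[^a-z0-9]+", text)
    return [
        token
        for token in tokens
        # Drop short tokens (http, www, com, ...) and organization names.
        if len(token) >= min_length and token not in organization_names
    ]
```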

    MODELING PHISHING URLS
      • Figure: Distribution of these features in our training set

    Slide 19 (of 35)

    MODELING PHISHING URLS
    Training With Features
      • Our labeled data consisted of 2508 URLs
          • 1245 were phishing URLs
          • 1263 were benign URLs
      • Phishing URLs were placed under the positive (true) class
      • Non-phishing ones were under the negative (false) class
      • 66% of URLs were used for training and the remaining 34% were used as the test set

    Slide 20 (of 35)
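
A randomized 66%/34% partition of the labeled URLs can be sketched as below. The function name `split_train_test`, the seed handling, and the rounding down of the training size are illustrative choices; the slides do not describe how the partitioning was implemented.

```python
import random


def split_train_test(items, train_fraction=0.66, seed=0):
    """Shuffle a copy of the labeled items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so a run is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder labels only: 1 marks the positive (phishing) class, 0 the negative class.
labeled = [("phish-%d" % i, 1) for i in range(1245)]
labeled += [("benign-%d" % i, 0) for i in range(1263)]
train, test = split_train_test(labeled)
```

Calling the function with different seeds gives the kind of repeated random partitioning used for averaging results over multiple runs.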

    MODELING PHISHING URLS
      • To indicate the relative strength of each feature in identifying a phishing URL, we report the corresponding odds ratios, e^(coefficient)

    Slide 21 (of 35)
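
In a logistic regression model, raising one feature by 1 while holding the others fixed multiplies the odds of the positive class by e raised to that feature's coefficient. The sketch below demonstrates this relationship; the function names are hypothetical and any coefficients passed in are made-up values, not results from the paper.

```python
import math


def phishing_probability(features, coefficients, intercept):
    """Logistic model: P(phishing) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def odds(probability):
    """Convert a probability into odds p / (1 - p)."""
    return probability / (1.0 - probability)


def odds_ratio(coefficient):
    """Odds ratio reported for a feature: e raised to its coefficient."""
    return math.exp(coefficient)
```

An odds ratio above 1 means the feature pushes a URL toward the phishing class; a value below 1 pushes it toward the benign class.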

    MODELING PHISHING URLS

    Slide 22 (of 35)

    MODELING PHISHING URLS
    Evaluation Result
      • We evaluated the trained model on the 34% test set split.
      • We performed our evaluation over multiple runs with randomized partitioning.
      • This evaluation gave us an average accuracy of 97.31%, with a True Positive Rate of 95.8% and a False Positive Rate of 1.2%.

    Slide 23 (of 35)
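
The three reported measures follow from the confusion matrix of a test run. The sketch below uses the standard definitions; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, true positive rate, and false positive rate.

    tp/fn count phishing URLs classified correctly/incorrectly;
    tn/fp count benign URLs classified correctly/incorrectly.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    true_positive_rate = tp / (tp + fn)   # share of phishing URLs caught
    false_positive_rate = fp / (fp + tn)  # share of benign URLs flagged
    return accuracy, true_positive_rate, false_positive_rate
```

Averaging these values over several randomized partitions gives figures comparable in kind to those on the slide.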

    ANALYSIS AND FINDINGS
      • We collected several million URLs from August 20th to August 31st, 2006
      • The data consisted of two main components:
          • Unique URLs which are visited each day
          • Consecutive lookup requests to these URLs

    Slide 24 (of 35)

    ANALYSIS AND FINDINGS
    Average Phishing URLs per day
      • The average number of phishing URLs which have been visited from Google's toolbar in a day
      • We find that on average there are:
          • 777 URL phishing attacks in a day
          • 5073 viewers to a phishing page

    Slide 25 (of 35)

    ANALYSIS AND FINDINGS
    Average Phishing URLs per day
      • Figure: The distribution of phishing attacks on each day of our study

    Slide 26 (of 35)

    ANALYSIS AND FINDINGS
    Average Phishing URLs per day

    Slide 27 (of 35)

    ANALYSIS AND FINDINGS
    Average Phishing URLs per day

    Slide 28 (of 35)

    ANALYSIS AND FINDINGS
    Average Potential Phishing Victims per day
      • Determine how many users interact with a phishing page
      • A user that has any interaction at a site classified as phishing is regarded as a potential phishing victim.

    Slide 29 (of 35)

    ANALYSIS AND FINDINGS
    Average Potential Phishing Victims per day
      • Based on the number of users who view phishing pages in a day, we can further infer the Potential Success Rate of a phisher as follows:

    Slide 30 (of 35)
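
One reading of the Potential Success Rate that is consistent with the conclusion slide (potential victims as a share of the users who view phishing pages) is sketched below. The exact formula is an assumption, and the function name `potential_success_rate` is hypothetical.

```python
def potential_success_rate(potential_victims: int, viewers: int) -> float:
    """Fraction of phishing-page viewers who interacted with the page.

    Assumed definition: potential victims divided by viewers, per day.
    """
    if viewers == 0:
        return 0.0  # no viewers means no measurable success
    return potential_victims / viewers
```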

    ANALYSIS AND FINDINGS
    Average Potential Phishing Victims per day
      • Figure: The distribution of phishing attacks on each day of our study

    Slide 31 (of 35)

    ANALYSIS AND FINDINGS
    Distribution of Phishing by Organization

    Slide 32 (of 35)

    ANALYSIS AND FINDINGS
    Geographical Distribution of Phishing
      • To determine the country that hosts a particular phishing URL, we used Google's IP to Geo-Location infrastructure.

    Slide 33 (of 35)

    Anti-Phishing Tools

    Slide 34 (of 35)

    CONCLUSION
      • We use our features in a logistic regression classifier that achieves a very high accuracy.
      • One of the major contributions of this work is a large scale measurement study conducted on Google Toolbar URLs
      • On average we found around 777 unique phishing pages per day, and on average 8.24% of the users who view phishing pages are potential phishing victims

    Slide 35 (of 35)
