Exploration of gaps in Bitly's spam detection and relevant countermeasures

Neha Gupta
Advisor: Dr. Ponnurangam Kumaraguru

M.Tech Thesis Defense
23-April-2014

Thesis Committee
Mr. Sachin Gaur, MixORG
Dr. Vinayak Naik, IIIT-Delhi
Dr. PK (Chair), IIIT-Delhi

Achievements
Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better. Poster at Security and Privacy Symposium (SPS), IIT-K, 2014.

Presentation Outline
Research Motivation and Aim
Related Work
Research Contribution
Methodology
Experiments and Analysis
Malicious Bitly Link Detection
Conclusion
Future Work
Questions

What are URL shortening services?

Long URL → URL shortening service → Short URL
Example: https://www.youtube.com/watch?v=ukUL_I14GPw → http://bit.ly/1oL7gi5 (hash: 1oL7gi5)
Services: Bitly (the most popular) and others

Research Motivation and Aim

Bitly:
Shortens close to 80 million links each day
Marks 2-3 million as suspicious every week
Was Twitter’s default URL shortening service before 2011

Use of URL shortening services

Space gain (Twitter’s 140 character limit)

More manageable
Prevent line breaks
Easy dissemination of content

Provides useful analytics (e.g. click data)

Online Social Media (OSM) connection

Complex link obfuscation

Tweet with the long URL:
Dr. ABC (@abc), 10:44 PM Tue Aug 30, 2011
Please like this picture http://3.bp.blogspot.com/_s5emCsFnEdE/TKUVi2BopBI/AAAAAAAADl8/lffvi7khF7g/s1600/googl.png

versus

Tweet with the short URL:
Dr. ABC (@abc), 10:44 PM Tue Aug 30, 2011
Please like this picture bit.ly/1hAVVaE

Abuse of URL shortening services - Attack scenario

One-level obfuscation
Long malicious URL → URL shortening service → Short malicious URL
Example: http://www.sexpixbox.com/kingnet/cute/index.html → http://bit.ly/QCMW2S

Multi-level obfuscation
Long malicious URL → popular URL shortening service → Short malicious URL → not so popular URL shortening service → …
Example: http://www.sexpixbox.com/kingnet/cute/index.html → http://bit.ly/QCMW2S → http://short.me/ABCD

Many short URL services detect and restrict long URLs at submission, but Bitly does not
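The multi-level obfuscation above can be illustrated with a minimal Python sketch that unwinds a chain of short links and counts the levels. The `REDIRECTS` table is a hypothetical stand-in for real HTTP redirects, and `expand_chain` and `obfuscation_levels` are illustrative names, not part of any Bitly tooling.

```python
from typing import Callable, List, Optional

# Hypothetical redirect table standing in for real HTTP 301/302 responses.
# It mirrors the example chain on the slide.
REDIRECTS = {
    "http://short.me/ABCD": "http://bit.ly/QCMW2S",
    "http://bit.ly/QCMW2S": "http://www.sexpixbox.com/kingnet/cute/index.html",
}


def expand_chain(
    url: str,
    resolve: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> List[str]:
    """Follow redirects until a URL has no further target, a loop is
    detected, or max_hops redirects have been followed."""
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:
        target = resolve(chain[-1])
        if target is None or target in seen:
            break
        chain.append(target)
        seen.add(target)
    return chain


def obfuscation_levels(chain: List[str]) -> int:
    """Number of shortening layers wrapped around the final URL."""
    return len(chain) - 1


chain = expand_chain("http://short.me/ABCD", REDIRECTS.get)
levels = obfuscation_levels(chain)
```

In practice `resolve` would issue an HTTP request and read the redirect target; it is injected here so the logic can be exercised without network access.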

Abuse of URL shortening services - Attack execution

[Figure: legitimate looking tweet → scam]

Major attacks
[Figure: examples from 2012, 2013, and 2014]

Bitly's Spam Detection Policies

[Figure: detection services used by Bitly, "+ more filters"; the quoted policy text is not recoverable]

Research Aim

A focused study on Bitly to:

characterize malicious URLs

examine Bitly’s security policies

identify Bitly specific features to detect spam



Related Work

2009
• Kandylas et al. Relative study of long and short Bitly URLs on Twitter

2010
• Benevenuto et al. + Grier et al. Identification of distinctive features to detect spammers on Twitter

2011
• Antoniades et al. Analysis of content, popularity, and impact of short URLs
• Chhabra et al. Overview of evolving phishing attacks through short URLs on Twitter
• Thomas et al. Classification of a long URL as malicious / benign in real time

2012
• Klien et al. Global usage pattern analysis of short URLs
• Aggarwal et al. Real time phishing detection on Twitter using Twitter and URL based features
• Lee et al. Real time suspicious URL detection technique on Twitter using conditional redirects

2013
• Maggi et al. Study of abuse of short URLs using 622 distinct shortening services

2014
• Nikiforakis et al. Study of ecosystem of ad-based URL shortening services

Related Work

2013
• Wang et al. Click Traffic Analysis of Short URL Spam on Twitter
Classification of a link as spam / non-spam using only click traffic based features
First study to highlight short URL based features in spam detection
Non inclusion of unclicked Bitly URLs

Research Gaps
• No dedicated study to identify ground security issues specific to a URL shortener
• Unexplored short URL based features to detect malicious content before it targets the audience

Research Contribution

Click traffic analysis and social network impact of malicious Bitly links

Detailed inspection to highlight weaknesses in Bitly’s spam detection techniques

Proposal to detect clicked / unclicked malicious Bitly links


Methodology - Acquiring Dataset

Phase 1: Link Dataset (763,160): Bitly global hash, long URL, #warnings
Phase 2 (Bitly API): Link Metric Dataset (413,119; 54.13%): link_info, link_expand, link_clicks, link_referring_domains, link_encoders
Phase 3 (Bitly API): Encoder/User Metric Dataset (12,344; 100%): link_encoder_info, link_encoder_link_history


Metadata Analysis – Content Creation

Link Dataset (763,160): Bitly global hash, long URL, #warnings
→ TLD / domains: 22,038
→ Whitelist check: 21,982 domains
→ Status check after 5 months: 18,966 non-existent domains

Results:
83.06% of suspicious domains were non-existent before / after 5 months
Total number of click requests made to these dead domains (only in October) was 9,937,250

Inference: These domains are created for the dedicated purpose of spamming and eventually die out after achieving a significant number of hits.

Experiments and Analysis

Network Analysis
Referrer Network
Connected Network

Network Analysis - Referrer Network

Referrer as Twitter: 37,903
→ Last <=200 tweets per profile (Twitter API)
→ Text + URL + Domain Jaccard similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
→ 17 Twitter profiles with variance <=0.00012
→ Manual annotation

Status check after 5 months (breakdown of the 37,903; category labels not recoverable): 4,336 (11.44%); 636 (1.68%); 5,444 (14.36%); 5,302 (13.99%); 22,185 (58.53%)
Other counts shown in the figure: 21,679; 788,759

[Figure: Hour of the day vs. minute of the hour for Twitter users (a) @dtitgp2 and (b) @fujisakikaoru]

Manual annotation of the 17 profiles:
3 profiles: pornographic content
1 profile: work from home scam
11 profiles: spam but <=3 tweets
2 profiles: suspended
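As a rough illustration of the Jaccard-based profile check, the sketch below scores every pair of tweets from one profile and summarizes the spread of the scores. It assumes whitespace tokenization of the tweet text only (the slide also uses URL and domain similarity), and it assumes that the reported variance is taken over these pairwise scores; both are assumptions, not details confirmed by the slides.

```python
from itertools import combinations
from statistics import pvariance
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|; two empty sets are treated as identical
    (a convention chosen for this sketch)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tokens(tweet: str) -> Set[str]:
    """Lower-cased whitespace tokens (an assumed, simplistic tokenizer)."""
    return set(tweet.lower().split())


def pairwise_similarities(tweets: List[str]) -> List[float]:
    """Jaccard similarity for every unordered pair of tweets."""
    return [jaccard(tokens(x), tokens(y)) for x, y in combinations(tweets, 2)]


# Hypothetical near-duplicate tweets of the kind a spam profile might post.
tweets = [
    "win a free phone http://bit.ly/aaa",
    "win a free phone http://bit.ly/bbb",
    "win a free phone http://bit.ly/ccc",
]
scores = pairwise_similarities(tweets)
spread = pvariance(scores)
```

A profile whose tweets are near-copies of each other gives uniformly high scores with a very small spread, which is the kind of pattern a low-variance threshold would single out.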

Network Analysis – Connected Network

5,375 users connected a Twitter / Facebook profile:
Twitter only: 3,415 (63.54%)
Facebook only: 951 (17.69%)
Both: 1,009 (18.77%)

Users can connect any number of Facebook / Twitter accounts

Why more Twitter than Facebook?
Bitly doesn't allow users to connect Facebook brand / fan pages for free

Multiple connections
507 malicious users connected multiple Twitter accounts
28 malicious users connected at least 10 Twitter accounts

[Figure: Connected OSM network of all encoders]

Bitly profiles
→ Link history: Bitly warning check
→ Connected Twitter accounts (<=200 tweets): inter Twitter profile Jaccard similarity
→ Bitly user name: inter Bitly profile Jaccard similarity
→ Manual annotation based on similarity scores
→ 3 malicious communities detected

Community 1: 2 Bitly users, 9 associated Twitter accounts each
All 18 Twitter accounts shared similar explicit pornographic content
Dormant on Bitly, active on Twitter

Counter measure 1: Bitly should impose a restriction on the number of OSM accounts a user can connect

Experiments and Analysis

Security Analysis
Efficiency Check
Promptness Check
Tractability Check

Security Analysis - Efficiency Check

(a) Malicious link identification
1. APWG: international consortium that brings together businesses affected by phishing attacks
2. VirusTotal: free checking of suspicious URLs using 52 different website / domain scanning engines and datasets
3. SURBL: domain / IP level blacklist created from the occurrence of websites in unsolicited messages

1. APWG
APWG live feed (request setup): 142,660 entries, 6 months
Direct lookup: 216
Reverse lookup: 2,656
Total: 2,872
Bitly warning check: 382 (13.30%)

Inference:
86.70% of these malicious links went undetected by Bitly in 6 months
Such a low detection rate is alarming, as APWG is a popular and trusted source for detecting phishing

Similar analysis against VirusTotal: 71.53% of links undetected
36.66% of domains blacklisted by SURBL but undetected by Bitly
Bitly claims to use SURBL (http://blog.bitly.com/post/138381844/spam-and-malware-protection)

Bitly is not even using the claimed detection services effectively

(b) Malicious user profile identification

$\text{Suspicion Factor}_{\text{encoder}} = \frac{\#\,\text{links redirecting to Bitly warning page}}{\#\,\text{total links collected}}$

Suspicion Factor measures the credibility of a Bitly profile
Computed for all 12,344 encoders in our dataset
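The Suspicion Factor defined above can be computed per encoder with a few lines of Python. This is a minimal sketch; the function names and the input counts are illustrative, and `highly_suspicious` encodes the rule used later in the promptness check (at least 100 shortened links and a Suspicion Factor of 1).

```python
def suspicion_factor(warned_links: int, total_links: int) -> float:
    """Fraction of an encoder's collected links that redirect to the
    Bitly warning page."""
    if total_links <= 0:
        raise ValueError("total_links must be positive")
    if not 0 <= warned_links <= total_links:
        raise ValueError("warned_links must lie between 0 and total_links")
    return warned_links / total_links


def highly_suspicious(warned_links: int, total_links: int, min_links: int = 100) -> bool:
    """At least min_links shortened links, all of them flagged."""
    return total_links >= min_links and suspicion_factor(warned_links, total_links) == 1.0


# Hypothetical encoders: (links flagged with a warning, total links collected)
half_flagged = suspicion_factor(5, 10)
prolific_spammer = highly_suspicious(120, 120)
small_spammer = highly_suspicious(50, 50)
```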



2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor = 1, i.e., they shortened only suspicious links

Counter measure 2: Bitly should take measures to detect and suspend such users. If not suspension, a credibility score can be added to each Bitly profile.

[Figure: our blog, a tweet to our blog, and Bitly’s response]

Security Analysis – Promptness Check

Highly suspicious profiles: user has shortened at least 100 links and Suspicion Factor is 1 (80 profiles)
Also collected their recent link history (after 1 January 2014)
4 of these 80 users were still active and propagating malicious content

User: bamsesang, Month lag: 24
User: iplayonlinegames, Month lag: 18

Ease of penetration for spammers, and delay in the suspicious user detection process that Bitly claims to follow

Security Analysis – Tractability Check

Popular malicious Bitly links: links with a large number of warning pages displayed (URLs with high overall impact)

Link Dataset (763,160): Bitly global hash, long URL, #warnings
→ Reverse sort based on number of warnings
→ Top 1000 links
→ Recent click history (Bitly API)

Result: 35.2% of the malicious links identified in October 2013 are still being actively clicked in 2014

Inference:
A by-passable Bitly warning page alone is not enough to curtail the dissemination of spam
No control over access to already detected malicious links can heavily encourage spammers to use Bitly

Counter measure 3: Bitly should not only show a warning page but also block visits to popular malicious Bitly links already detected

Security Analysis – Major Findings

Bitly is not using the claimed detection services effectively
Extreme delay in suspicious user identification (if at all)
A by-passable Bitly warning page alone is not enough to control the problem of spam

Malicious Bitly Link Detection - Data Collection and Labeling

Data Collection
Tweets from Twitter’s REST API (412,139)
→ Extract and expand bitly URLs (34,802): unlabeled-dataset

Data Labeling
Blacklist + Bitly warning check → Malicious / Benign: labeled-dataset
Blacklists used:
1. Google Safebrowsing: a repository of suspected phishing or malware pages maintained by Google Inc.
2. SURBL
3. PhishTank: a public crowdsourced database of phishing URLs
4. VirusTotal

Malicious Bitly Link Detection – Feature Selection

No. Feature name: Feature description

WHOIS specific
1. Domain age: difference between domain creation / update date and expiration date
2. Link creation domain creation difference: difference between domain creation date and bitly link creation date

Bitly specific, non-click based
3. Link creation hour: bitly link creation hour
4. Number of encoders: number of bitly users who encoded a particular link
5. Anonymous and API encoder ratio: ratio of encoders that are "anonymous" or from a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders

Bitly specific, click based
6. Link creation first click difference: difference in days between bitly link creation date and date of first click received
7. Referring domains - direct by total: ratio of referring domains from a direct source to the total number of referring domains

Malicious Bitly Link Detection - Experimental Setup

Machine Learning Algorithms
1) Naive Bayes
2) Decision Tree
3) Random Forest
Training on pre-labeled dataset
Predict labels of unseen data
Used the classifiers implemented in the Weka software package, an open source collection of machine learning classifiers for data mining tasks

Training and Testing Data
Training: 75%
Testing: 25%
10 fold cross validation
Performance evaluation on unlabeled data
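The 75% / 25% split and the 10 fold cross validation can be sketched in plain Python as below. The thesis used Weka for this step; the code is only an illustration of the partitioning, and running the folds over the training portion is an assumption.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def train_test_split(items: Sequence, test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle deterministically, then hold out test_fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(items: List, k: int = 10) -> Iterator[Tuple[List, List]]:
    """Yield (training, validation) pairs; each item is validated exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]


# 16,000 placeholder ids, matching the size of the mixed labeled dataset
# (12,000 training + 4,000 test links).
train, test = train_test_split(range(16000))
folds = list(k_folds(train, k=10))
```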

Malicious Bitly Link Detection – Evaluation Results

(a) Experiment 1
Mix dataset – Click and Non-click
All features

Split                  Malicious   Benign
Training data (75%)    5,926       6,074
Test data (25%)        2,074       1,926

Evaluation Metric        Naive Bayes   Decision Tree   Random Forest
Accuracy                 72.15%        78.37%          80.43%
Recall (malicious)       73.10%        82.40%          81.00%
Recall (benign)          71.10%        74.10%          79.90%
Precision (malicious)    73.10%        77.40%          81.20%
Precision (benign)       71.10%        79.60%          79.60%
F-measure (malicious)    73.10%        79.80%          81.10%
F-measure (benign)       71.10%        76.70%          79.70%

[Figure: confusion matrix layout (TP, FP, FN, TN)]
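The metrics in the table follow from the four confusion matrix cells (TP, FP, FN, TN), with "malicious" as the positive class. The sketch below shows the standard definitions; the counts passed in at the end are hypothetical and are not results from the thesis.

```python
from typing import Dict


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy plus precision, recall and F-measure for the positive class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F-measure is the harmonic mean of precision and recall.
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, for illustration only.
scores = evaluate(tp=80, fp=20, fn=10, tn=90)
```

The benign-class figures are obtained the same way after swapping the roles of the two classes, i.e. `evaluate(tp=tn, fp=fn, fn=fp, tn=tp)`.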


Malicious Bitly Link Detection - Evaluation Results

(b) Experiment 2
Mix dataset – Click and Non-click
WHOIS + Non-click based features

Split                  Malicious   Benign
Training data (75%)    5,926       6,074
Test data (25%)        2,074       1,926

Evaluation Metric        Naive Bayes   Decision Tree   Random Forest
Accuracy                 73.57%        81.93%          83.50%
Recall (benign)          73.80%        80.30%          82.80%
F-measure (benign)       72.90%        81.00%          82.80%

Malicious Bitly Link Detection – Evaluation Results

(c) Experiment 3
Only Non-click data
WHOIS + Non-click based features

Split                  Malicious   Benign
Training data (75%)    2,743       2,796
Test data (25%)        950         897

Evaluation Metric        Naive Bayes   Decision Tree   Random Forest
Accuracy                 80.02%        85.06%          86.41%
Recall (benign)          80.40%        80.80%          83.40%
F-measure (benign)       80.50%        84.80%          86.30%

Malicious Bitly Link Detection - Feature Ranks

Complete labeled-dataset
Rank   Feature
1      Type of referring domains
3      Domain age
4      Link creation hour
5      Type of encoders
6      Link creation-click lag
7      Number of encoders

Non-Click labeled-dataset
Rank   Feature
1      Link creation hour
3      Domain age
4      Type of encoders
5      Number of encoders

Ranked using Weka's InfoGainAttributeEval attribute evaluator, which evaluates the worth of an attribute by measuring the information gain with respect to the class
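Information gain, the quantity behind this ranking, is the class entropy minus the class entropy that remains once the attribute value is known. The sketch below computes it in plain Python for an attribute that is already discrete; it is not Weka's implementation, and the toy data is hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(class) - H(class | attribute) for a discrete attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: the first attribute separates the classes perfectly,
# the second carries no information about the class.
labels = ["malicious", "malicious", "benign", "benign"]
gain_informative = information_gain(["night", "night", "day", "day"], labels)
gain_uninformative = information_gain(["a", "b", "a", "b"], labels)
```

Ranking features then amounts to sorting them by this value in descending order.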



Malicious Bitly Link Detection - Result Summary

Increase in accuracy and F-measure on using only non-click based features

Not only efficient in detecting clicked malicious Bitly links, but can identify suspicious links even when no click is received

This solution can also capture the multi level obfuscation technique used by attackers


Counter measure 4: In addition to the blacklists and other spam detection filters, a Bitly-specific feature set can also be used to detect malicious content

Proposed Counter Measures - Summary

Impose a check on the number of OSM accounts connected by a single profile

Either directly delete the identified malicious links or introduce a credibility score for each profile to warn users (of upcoming risks)

Go beyond a warning page and block popular malicious links to prevent their persistence on the web

In addition to various blacklists and other filters, incorporate available Bitly analytics to better detect illegitimate content

Conclusion

Domains created for the dedicated purpose of spamming eventually die out after achieving a significant number of hits
Spammers exploit Bitly's policy of not imposing a cap on the number of connected OSM accounts
Existence of malicious communities which operate across Bitly and Twitter
Bitly is not using the claimed detection services effectively
Inability / extreme delay in detection of malicious accounts
By-passable warning page does not restrict the overall problem of spam
High classification accuracy for non-click dataset; capable of identifying suspicious Bitly links much before they target their audience

Future Work

Since characteristics of spammers change over time, do a detailed comparative analysis on a more exhaustive dataset

Broaden and generalize our feature set to detect spam from any short URL service

Develop a browser extension that can work in real time and classify any short link as malicious or benign

Acknowledgements

Dr. PK, IIIT-Delhi

Anupama Aggarwal, PhD, IIIT-Delhi

Brian David Eoff (senior data scientist), Mark Josephson (CEO), Bitly

CERC, IIIT-Delhi

Precog members, friends and family

References (I)

Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening Services. In proceedings of Web 2.0 Security and Privacy (W2SP) (2011).

Florian Klien, Markus Strohmaier. Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener Network. In proceedings of the 23rd ACM conference on Hypertext and social media (2012), Pages 83-88.

Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis. we.b: The web of short URLs. In proceedings of the 20th international conference on World wide web (2011), Pages 715-724.

Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna. Two Years of Short URLs Internet Measurement: Security Threats and Countermeasures. In proceedings of the 22nd international conference on World Wide Web (2013), Pages 861-872.

Sangho Lee and Jong Kim. WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream. NDSS 2012 (2012).

De Wang, Shamkant B. Navathe, Ling Liu, Danesh Irani, Acar Tamersoy, and Calton Pu. Click Traffic Analysis of Short URL Spam on Twitter. Collaborative Computing: Networking, Applications and Worksharing (Collaboratecom), 2013 9th International Conference (2013), Pages 250-259.

Aditi Gupta and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (PSOSM), In conjunction with WWW'12 (2012).

References (II)

Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL Spam Filtering Service. Security and Privacy (SP) IEEE Symposium (2011), Pages 447-462.

Hongyu Gao, Jun Hu, and Christo Wilson. Detecting and Characterizing Social Spam Campaigns. In proceedings of the 10th ACM SIGCOMM conference on Internet measurement (2010), Pages 35-47.

Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting Spammers on Twitter. In Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS) (2010).

Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic Realtime Phishing Detection on Twitter. In Seventh IEEE APWG eCrime researchers summit (eCRS) (2012). Master's thesis, IIIT-Delhi, http://precog.iiitd.edu.in/Publications les/Anupama_MTech_Thesis.pdf (2012).

Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: The Underground on 140 Characters or Less. In proceedings of the 17th ACM conference on Computer and communications security (2010), Pages 27-37.

Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$oCiaL: The Phishing Landscape through Short URLs. CEAS '11 Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (2011), Pages 92-101.

References (III)

Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques for Phishing Detection. In proceedings of eCrime researchers summit (2007), ACM, Pages 60-69.

Mashable. Warning: Bit.ly Is Eating Other URL Shorteners for Breakfast. http://mashable.com/2009/10/12/bitly-domination/, October 2009.

Symantec. Spam with .gov URLs. http://www.symantec.com/connect/blogs/spam-gov-urls, October 2012.

Symantec. Malicious Shortened URLS on Social Networking Sites. http://www.symantec.com/threatreport/topic.jsp?id=threat_activity_trends&aid=malicious_shortened_urls, 2010.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter (2009), vol. 11, no. 1, Pages 10-18.

Thank You!

Questions?
