Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari

Upload: webhostingguy

Post on 20-Jun-2015


Page 1: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Mishari Almishari

Page 2: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 3: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Introduction - Motivation

Traffic is important to web domains!
• No point in launching without incoming traffic

• Losing/gaining traffic means losing/gaining money

• One way to price ads is the pay-per-click model

Traffic diversion could be a serious threat to a domain

Page 4: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Introduction - Motivation

Typos may attract traffic
• Users are prone to making typos

• Users may forget about visiting the target domain
• Threat to the target domain!

Intentionally registering such typo domains is called typo-squatting

Page 5: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Introduction - Goal

To study how much traffic typo-squatters can get from target domains
• Are those domains attracting much traffic?

• There are many typo-squatting domains registered (Banerjee et al., 08)

• Search engine typo corrections and browser auto-completions!

• How much traffic are target domains losing?
• Is it a negligible ratio or a serious threat?
• Do users go back to target domains or get distracted?

Page 6: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Introduction - Contribution

Automatic and accurate identification of typo-squatting domains (Measurement Methodology)

Bound on how much traffic target domains are losing to typo-squatting domains (Measurement Results)

Page 7: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 8: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Background – Domain Parking

Domain Parking is the practice of showing a temporary page for an unused domain before launching it

Page 9: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Background - Domain Parking

Page 10: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Background – Domain Parking

Page 11: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Background – Domain Parking

Page 12: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Background – Domain Parking

Domain Parking Service
• Parks and hosts unused domains

• Monetizes the traffic by showing ads

Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)

Page 13: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 14: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology

Data Collection

Identifying Typo-Squatting Domains

Page 15: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology - Data Collection

[Diagram: USER QUERY traffic between UCI NET and the INTERNET, with the UCI Resolver and Our Machine]

Logged record fields: DATE TIME HASHED-IP DOMAIN TYPE CLASS
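A minimal sketch of how such records could be read and tallied per domain. It assumes one whitespace-separated record per line in the field order shown on the slide; the sample line, the `QueryRecord` type, and the function names are hypothetical and not taken from the original trace format.

```python
from collections import Counter
from typing import NamedTuple


class QueryRecord(NamedTuple):
    """One logged DNS query, using the six fields named on the slide."""
    date: str
    time: str
    hashed_ip: str
    domain: str
    qtype: str
    qclass: str


def parse_record(line):
    """Parse one whitespace-separated record (assumed layout)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return QueryRecord(*fields)


def hits_per_domain(lines):
    """Count queries per domain, ignoring case and a trailing dot."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        counts[record.domain.lower().rstrip(".")] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample = [
    "2008-01-15 10:32:07 a1b2c3 www.wamart.com A IN",
    "2008-01-15 10:32:19 a1b2c3 www.walmart.com A IN",
    "2008-01-15 11:02:44 d4e5f6 WWW.WALMART.COM. A IN",
]
print(hits_per_domain(sample))
```

Hashing the client IP keeps per-user grouping possible (for example, to see whether a user reaches the target domain after a typo) without storing the address itself.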

Page 16: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology – Identify Typo-squatting Domain

Identify Similar Domains
a. Single-error typo

• Single errors account for 90-95% of spelling/typo errors (Pollock et al., 83)

• www.walmart.com and www.wamart.com

b. gTLD substitution
• www.amazon.com and www.amazon.org
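The two similarity rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a "single error" is taken to mean one insertion, deletion, substitution, or adjacent transposition; domains are assumed to have the simple `www.<name>.<gTLD>` shape; and the function names are hypothetical.

```python
def split_domain(name):
    """Split 'www.walmart.com' into ('walmart', 'com'); assumes a simple name shape."""
    labels = name.lower().rstrip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels[:-1]), labels[-1]


def is_single_error(a, b):
    """True if a and b differ by exactly one insertion, deletion,
    substitution, or adjacent transposition."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # one substitution
        # two adjacent mismatches that are swapped copies of each other
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # deleting one character from the longer string must yield the shorter one
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def is_similar(target, candidate):
    """Rule (a): same gTLD, single-error name. Rule (b): same name, different gTLD."""
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name


print(is_similar("www.walmart.com", "www.wamart.com"))  # rule (a)
print(is_similar("www.amazon.com", "www.amazon.org"))   # rule (b)
```

A check like this only says that two names are close. It also accepts pairs such as www.walmart.com and www.walkmart.com, so similarity alone cannot establish squatting intent.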

Page 17: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology – Identify Typo-squatting Domains

But a similar domain is not enough!
• www.abc.com and www.abd.com
• www.walmart.com and www.walkmart.com
• www.usps.com and www.usps.org
• Random sample

• More than 54% are not typo-squatting

Need to identify hijacking intention

Page 18: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology – Identify Typo-squatting Domain

Identify Hijacking Indicator
• Parked Domain (Ads – listing) ~ 88%

• Forwarding to other domains ~ 8%

• Others: Inappropriate Content, …

Parked Domain as the indicator

Page 19: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology – Identify Typo-squatting Domain

Similar Domain + Parked Domain => Typo-Squatting Domain

Page 20: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology – Identify Typo-squatting Domain

How to identify a parked domain?
• Parked Domain Classifier

• 96%

• Presence of parking signatures
• Well-known parking signatures (domain names/URLs)
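A signature check of this kind could look like the sketch below. It assumes the URLs found on a page (links, scripts, frames) have already been extracted, and it uses a placeholder signature list: the hostnames in `PARKING_SIGNATURES` are hypothetical examples, not the signatures used in the study.

```python
from urllib.parse import urlparse

# Hypothetical signatures; real parking-service hostnames would go here.
PARKING_SIGNATURES = {"parking-service.example", "ads.parked.example"}


def host_matches(host, signature):
    """True if host equals the signature or is a subdomain of it."""
    return host == signature or host.endswith("." + signature)


def has_parking_signature(urls, signatures=PARKING_SIGNATURES):
    """Return True if any URL on the page points at a known parking host."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host_matches(host, s) for s in signatures):
            return True
    return False


page_urls = [
    "http://cdn.parking-service.example/landing.js",
    "http://www.example.org/about",
]
print(has_parking_signature(page_urls))
```

Matching on the hostname suffix, rather than on a raw substring of the URL, avoids false hits when a signature merely appears in a path or query string.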

Page 21: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Methodology - Summary

Identify Similar Domains

Identify Parked Domains

List of Typo-squatting Domains

Page 22: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 23: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Parked Domain Classifier

Build Data Set

Extract Core Features

Combine Into Classifier

Page 24: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Data Set

Data set consists of 2,800 domains

700 are parked domains

• Collected from the MS Strider website

2,100 are non-parked domains

• Collected from the fourteen Yahoo Directory top categories

Page 25: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Feature Selection

• Heuristically identify common features in parked domains

• Compute the distribution of those features for verification

• Common Link Ratio Max

Page 26: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Combining Features Into Classifier

Tried different classifier algorithms
• Decision Tree

• SVM

• K-Nearest Neighbor

• Random Forest: the best performance

Page 27: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 28: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

DATA Sets

DNS Traces
• Four months

• ~ 30 million domains (~ 2 billion hits) (~ 30,000 users)

Target Domain Set
• Alexa’s top 500 popular domains

• ~ 53,000,000 hits

Page 29: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo-Squatting Domains & Hits

1,332 typo-squatting domains

13,431 hits (~ 110 a day)

Is it large or small?

• 500 target domains

• 4-month period

• ~ 30,000 users

• Given a similar ratio, this may translate to a non-trivial number

• 30,000 users => 110 per day

• 300,000 users => 1,100 per day

• 3,000,000 users => 11,000 per day (x 365 = ~ 4,000,000 a year)
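The extrapolation is a linear scaling of the observed rate by user population: 13,431 hits over roughly four months is about 110 per day. The sketch below reproduces the arithmetic; the function name is hypothetical, and it assumes hits grow in proportion to the number of users, as the slide does.

```python
def extrapolate(users, base_users=30_000, base_hits_per_day=110):
    """Scale the observed daily typo-squatting hits linearly with user count.

    Returns (hits per day, hits per year).
    """
    per_day = base_hits_per_day * users // base_users
    return per_day, per_day * 365


for users in (30_000, 300_000, 3_000_000):
    per_day, per_year = extrapolate(users)
    print(f"{users:>9,} users: {per_day:>6,} hits/day, {per_year:>9,} hits/year")
```

For 3,000,000 users this gives 11,000 hits per day, or 4,015,000 per year, which the slide rounds to about 4,000,000.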

Page 30: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo-squatting Ratio

• 0.025% of total number of queries

• (89%, ≤ 1%) (70%, ≤ 0.1%) (57%, ≤ 0.01%)
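The overall ratio can be checked against the counts reported earlier in the deck. This sketch assumes the denominator is the roughly 53,000,000 hits to the Alexa top-500 target set and the numerator is the 13,431 typo-squatting hits; both are approximate figures from the slides, and the helper name is hypothetical.

```python
def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


typo_hits = 13_431          # hits on typo-squatting domains
target_hits = 53_000_000    # approximate hits on the target domain set

ratio = percent(typo_hits, target_hits)
print(f"{ratio:.3f}%")  # prints 0.025%
```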

Page 31: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

User Correction Ratio – Alexa-500

• 54% of typo-squatting queries are corrected

• ~ 51% of squatted target domains have most of their squat hits corrected

Page 32: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Potential Hit Loss

• Potential Hit Loss Ratio = 0.012%

• (92% , ≤ 1%) (78%, ≤ 0.1%) (64%, ≤ 0.01%)
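One reading that is consistent with the reported numbers (an assumption, not stated on the slides) is that the potential hit loss counts only the typo-squatting hits that users do not correct, relative to all hits on the target domains. With 54% of typo-squatting queries corrected, that gives roughly 0.012%:

```python
def potential_hit_loss_pct(typo_hits, corrected_fraction, target_hits):
    """Uncorrected typo-squatting hits as a percentage of target-domain hits.

    Assumed definition; the slides give only the resulting figure.
    """
    uncorrected = typo_hits * (1.0 - corrected_fraction)
    return 100.0 * uncorrected / target_hits


loss = potential_hit_loss_pct(13_431, 0.54, 53_000_000)
print(f"{loss:.3f}%")  # prints 0.012%
```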

Page 33: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Potential Money Loss

• ~75% do not point to target domains

• Referring Typo-Sqt Ratio = 0.008%

• (96%, ≤ 1%) (91%, ≤ 0.1%) (81%, ≤ 0.01%)

Page 34: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo-Squatting Distribution

• 19% of all typo-squatting hits

Page 35: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo Characterization

• Most typos are single errors (95% vs. 5%)

• Most gTLD substitutions are “com” to “org” (50%)

• Add – 37% are of non-adjacent keys

• Sub – 77% are of non-adjacent keys

• Sub – 13% of substitutions are “a” and “o”

• Spelling error

Page 36: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Typo-squatting Domains – TP60

• 15,499 hits

• 0.045% of total number of queries

• (76%, ≤ 1%) (60%, ≤ 0.5%)

Page 37: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 38: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Future Work

How much of the ads budget goes to squatters?

Enhance our identification technique

See if the results hold at other ISPs

Typo modeling for getting traffic back

Page 39: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 40: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Related Work

MS Strider Project [Wang et al., SRUTI 06]

McAfee Study [Keats, McAfee White Paper 07]

JAAL Project [Banerjee et al., Infocom 08]

Page 41: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Outline

Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion

Page 42: Typo-Squatting: a Nuisance or a Threat to Your Traffic?

Conclusion

Accurately and automatically identify typo-squatting domains

How much traffic goes to typo-squatters

Bound on how much traffic the target domain is losing to typo-squatting

• Inconsequential