Accurately Detect Parked Domain Typosquatting Attacks
Mishari Almishari and Xiaowei Yang
University of California, Irvine
Donald Bren School of Information and Computer Sciences
Computer Science Department
malmisha, [email protected]
Introduction
Typo-squatting is the act of registering domain names that are typographical errors of popular domain names (target domains) in order to hijack the traffic intended for those domains
Hijacking for malicious purposes
Hijacking for financial purposes
Goals & Contributions
Accurately identify typo-squatting domains
Measure the amount of traffic hijacked by squatters
Build a system that reduces the amount of traffic to such domains
Methodology
Identifying Typos
Use an edit distance of 1 as our typo definition
Less controversial as a typo definition
Users are more prone to make a single error than two or more
A study shows that 90-95% of spelling errors consist of a single mistake
Nevertheless, extending the typo definition is worth pursuing
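The edit-distance-1 definition can be illustrated with a minimal Python sketch that enumerates every candidate typo of a target domain. It is not the authors' implementation: the function name `one_edit_typos` and the `ALPHABET` constant are hypothetical, and the sketch assumes that swapping two adjacent characters counts as a single edit, since the typo categories reported later include swaps. It does not filter out strings that are not valid hostnames.

```python
import string

# Assumed alphabet: characters that may appear in a hostname, plus the dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."


def one_edit_typos(domain):
    """Return the set of strings exactly one edit away from `domain`."""
    typos = set()
    n = len(domain)
    for i in range(n):
        # Missing one character.
        typos.add(domain[:i] + domain[i + 1:])
        # Substituting one character.
        for c in ALPHABET:
            if c != domain[i]:
                typos.add(domain[:i] + c + domain[i + 1:])
    for i in range(n + 1):
        # Adding one character.
        for c in ALPHABET:
            typos.add(domain[:i] + c + domain[i:])
    for i in range(n - 1):
        # Swapping two adjacent characters (skipped when they are identical).
        if domain[i] != domain[i + 1]:
            typos.add(domain[:i] + domain[i + 1] + domain[i] + domain[i + 2:])
    typos.discard(domain)
    return typos
```

For example, `one_edit_typos("www.facebook.com")` contains `wwwfacebook.com` (missing dot) and `www.facebok.com` (missing character), but not the target itself.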
Methodology
Identifying hijacking attempts
Is being a typo domain enough?
No, 55% of typo domains are not squatting
What are the common hijacking indicators?
Parked Domain / Ads Listing (88.5%)
Offensive Adult Content (3.1%)
Domain For Sale (2.1%)
Forwarding To Another Domain (8.3%)
How to identify a parked domain?
Use a machine learning classifier (96%) (100%)
Experiment
Measure the amount of hijacked traffic
Eight months of UCI DNS traces
500 popular domains from the Alexa website
Steps
Pre-processing of DNS queries
Finding Typo Domains
Finding Typo Squatting Domains
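The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the pre-processing shown (lower-casing and stripping a trailing dot) is a guess at what normalization might involve, and `is_squatting` is a hypothetical callback standing in for the parked-domain classifier and the other hijacking indicators.

```python
from collections import Counter


def is_one_edit(a, b):
    """True if b differs from a by one deletion, insertion,
    substitution, or swap of two adjacent characters."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # substitution
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def typo_squatting_hits(queries, targets, is_squatting):
    """Map each typo-squatting domain to (target, number of hits)."""
    # Step 1: pre-process the queried names (assumed normalization).
    hits = Counter(q.strip().lower().rstrip(".") for q in queries)
    result = {}
    for name, count in hits.items():
        if name in targets:
            continue
        for target in targets:
            # Step 2: typo domain; step 3: squatting check (hypothetical callback).
            if is_one_edit(target, name) and is_squatting(name):
                result[name] = (target, count)
                break
    return result
```

Passing `targets` as a set keeps the membership test cheap. With 500 targets, comparing every distinct queried name against every target is affordable for a sketch, though a real trace would likely call for an index over pre-generated typos.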
Measurement Results
Typo-squatting Hits
Total of 23,989
Ranges from 1,675 to 3,621
Typo-squatting Domains
Total of 1,786 domains
Ranges from 347 to 530 domains
Measurement Results
Maximum Hits to Typo-squatting Domains
Could reach up to 649 hits for one domain in one month
Average Hijack Ratio
Low: 0.33% to 1%
Measurement Results
Maximum Hijack Ratio
From 82% to 100%
Most Squatted Domains
Most hijacked is www.facebook.com
Second most hijacked is www.youtube.com
Domains listed: www.facebook.com, www.youtube.com, www.myspace.com, www.wikipedia.org, www.google.com
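The slides do not define the hijack ratio. The Python sketch below assumes one plausible reading: for a target domain, the ratio is the share of its combined traffic (hits to the target plus hits to its typo-squatting domains) that lands on the typo-squatting domains. The function names and the input layout are hypothetical.

```python
def hijack_ratio(target_hits, squat_hits):
    """Assumed definition: squat_hits / (target_hits + squat_hits)."""
    total = target_hits + squat_hits
    return squat_hits / total if total else 0.0


def summarize(per_target):
    """per_target maps a target domain to (target_hits, squat_hits).

    Returns (average ratio, target with the highest ratio, all ratios).
    """
    ratios = {t: hijack_ratio(th, sh) for t, (th, sh) in per_target.items()}
    average = sum(ratios.values()) / len(ratios) if ratios else 0.0
    worst = max(ratios, key=ratios.get) if ratios else None
    return average, worst, ratios
```

Under this reading, a maximum ratio of 100% would correspond to a target whose only observed queries in a period went to its typo-squatting domains.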
Measurement Results
Typo Characterization
14% of Cat 1 is missing dot
66% of Cat 2 is from neighbor keys
26% of Cat 2 is the same as one before or after
42% is from neighbor keys

Typo Category                 Ratio
Missing One Character         32%
Adding One Character          33%
Substituting One Character    22%
Swapping Two Characters       13%
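A typo can be assigned to one of the four categories above by comparing it with its target. The Python sketch below is illustrative only: `classify_typo` and the category labels are hypothetical names, and the function also returns the character involved so that sub-cases such as a missing dot can be counted.

```python
def classify_typo(target, typo):
    """Return (category, character involved), or None if `typo` is not
    one edit away from `target`. The character is "" for swaps."""
    if len(typo) == len(target) - 1:
        for i in range(len(target)):
            if target[:i] + target[i + 1:] == typo:
                return ("missing", target[i])
    elif len(typo) == len(target) + 1:
        for i in range(len(typo)):
            if typo[:i] + typo[i + 1:] == target:
                return ("adding", typo[i])
    elif len(typo) == len(target):
        diff = [i for i in range(len(target)) if target[i] != typo[i]]
        if len(diff) == 1:
            return ("substituting", typo[diff[0]])
        if (len(diff) == 2 and diff[1] == diff[0] + 1
                and target[diff[0]] == typo[diff[1]]
                and target[diff[1]] == typo[diff[0]]):
            return ("swapping", "")
    return None
```

For instance, `classify_typo("www.facebook.com", "wwwfacebook.com")` yields `("missing", ".")`, the missing-dot case. When a deletion or insertion falls inside a run of repeated characters, the sketch reports the first position that matches.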
Comparison With Other Typo-correctors
Google & Yahoo typo-correction web services
15% (12%) missed by Google (Yahoo)
99.6% (98%) of what is missed are real parked domains
23% (31%) forward to the same target domain
System Implementation
Successfully integrated our methodology with the Mozilla Firefox browser
Second set, 94% <= 167 ms
Non-typo domains: 10 ms on average, 25 ms maximum
Classifier
Data set of 2,800 samples
700 are parked domains and 2,100 are general-purpose domains from the Yahoo Directory
Identify distinguishing features
Compute distributions for verification
Use the WEKA library to try different classification algorithms; Random Forest was the best
Conclusion
Defined and implemented an accurate identification methodology
Performed measurements that show typo-squatters are moderately successful
Integrated the methodology with the Firefox browser to detect typo-squatting domains on the fly