Accurately Detect Parked Domain Typosquatting Attacks
Mishari Almishari and Xiaowei Yang
University of California, Irvine
Donald Bren School of Information and Computer Sciences
Computer Science Department
malmisha, [email protected]
Introduction
Typo-squatting is the act of registering domain names that are typographical errors of popular domain names (target domains) in order to hijack the traffic intended for those domains
• Hijacking for malicious purposes
• Hijacking for financial purposes

Goals & Contributions
• Accurately identify typo-squatting domains
• Measure the amount of traffic hijacked by squatters
• Build a system that would reduce the amount of traffic to such domains

Methodology

Identifying Typos
• Use an edit distance of 1 as our typo definition
  • Less controversial as a typo definition
  • Users are more prone to make a single error than two or more
  • A study shows that 90-95% of spelling errors involve a single mistake
• Nevertheless, extending the typo definition is worth exploring.
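
A minimal sketch of the edit-distance-1 test, for illustration only. It assumes that swapping two adjacent characters counts as a single edit (the typo category table in these slides lists swaps, but the exact distance metric is not stated), so it uses the optimal string alignment variant of edit distance. The names `osa_distance`, `is_typo`, and `max_distance` are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertion, deletion, substitution, and swapping two adjacent
    characters each cost 1 (the swap rule is an assumption).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent swap
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_typo(target: str, candidate: str, max_distance: int = 1) -> bool:
    """True if candidate differs from target by 1..max_distance edits."""
    return 0 < osa_distance(target, candidate) <= max_distance
```

Raising `max_distance` to 2 is one way the typo definition could be extended, at the cost of many more candidate domains per target.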
Methodology

Identifying hijacking attempts
• Is being a typo domain enough?
  • No, 55% are not squatting
• What are the common hijacking indicators?
  • Parked Domain / Ads Listing (88.5%)
  • Offensive Adult Content (3.1%)
  • Domain For Sale (2.1%)
  • Forwarding To Another Domain (8.3%)
• How to identify a Parked Domain?
  • Use a Machine Learning Classifier (96%) (100%)
Experiment
Measure amount of hijacked traffic
• UCI DNS traces of 8 months
• 500 popular domains from the Alexa website
• Steps
  • Pre-processing of DNS queries
  • Finding Typo Domains
  • Finding Typo-Squatting Domains
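
The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the pre-processing shown (lowercasing, stripping the trailing dot) and the character set in `ALPHABET` are guesses, and all function names are hypothetical. The third step, deciding whether a typo domain is actually squatting, needs the parked-domain classifier and is not shown.

```python
from collections import Counter
import string

# Characters tried for insertions and substitutions (an assumption).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def normalize(qname: str) -> str:
    """Pre-process a queried name: trim, lowercase, drop the trailing dot."""
    return qname.strip().lower().rstrip(".")


def single_edit_variants(name: str) -> set:
    """All strings exactly one edit away from name."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return (deletes | inserts | substitutes | swaps) - {name}


def typo_hits(queries, targets):
    """Count queries that land on a one-edit variant of any target domain.

    Returns (hits per typo domain, mapping typo domain -> target domain).
    """
    targets = {normalize(t) for t in targets}
    typo_to_target = {}
    for target in targets:
        for variant in single_edit_variants(target):
            if variant not in targets:  # another target is not a typo
                typo_to_target[variant] = target
    hits = Counter()
    for q in queries:
        name = normalize(q)
        if name in typo_to_target:
            hits[name] += 1
    return hits, typo_to_target
```

For example, with the single target `www.facebook.com`, the queries `WWW.FACEBOK.COM` and `wwwfacebook.com` would each be counted as one hit on a typo domain, while `example.org` would be ignored.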
Measurement Results

Typo-squatting Hits
• Total of 23,989
• Ranges from 1,675 to 3,621

Typo-squatting Domains
• Total of 1,786 domains
• Ranges from 347 to 530 domains
Measurement Results
• Maximum Hits to Typo-squatting Domains
  • Could reach up to 649 hits for one domain in one month
• Average Hijack Ratio
  • Low
  • 0.33% to 1%

Measurement Results

Maximum Hijack Ratio
• From 82% to 100%

Most squatted Domains
• Most hijacked is www.facebook.com
• 2nd most hijacked is www.youtube.com
[Chart of target domains: www.facebook.com, www.youtube.com, www.myspace.com, www.wikipedia.org, www.google.com]
Measurement Results

Typo Characterization
• 14% of Cat 1 is missing dot
• 66% of Cat 2 is from neighbor keys
• 26% of Cat 2 is the same as one before or after
• 42% is from neighbor keys

Typo Category                 Ratio
Missing One Character         32%
Adding One Character          33%
Substituting One Character    22%
Swapping Two Characters       13%
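
As an illustration of how a single-edit typo could be assigned to one of the four categories in the table, here is a small sketch. The function name `typo_category` is hypothetical, and the slides do not say how the categorization was actually computed.

```python
def typo_category(target: str, typo: str):
    """Return the table category for a one-edit typo, or None otherwise."""
    if target == typo:
        return None
    lt, ly = len(target), len(typo)
    # Find the first position where the two strings differ.
    i = 0
    while i < min(lt, ly) and target[i] == typo[i]:
        i += 1
    if ly == lt - 1 and target[i + 1:] == typo[i:]:
        return "Missing One Character"
    if ly == lt + 1 and typo[i + 1:] == target[i:]:
        return "Adding One Character"
    if ly == lt:
        if target[i + 1:] == typo[i + 1:]:
            return "Substituting One Character"
        if (i + 1 < lt
                and target[i] == typo[i + 1]
                and target[i + 1] == typo[i]
                and target[i + 2:] == typo[i + 2:]):
            return "Swapping Two Characters"
    return None  # more than one edit apart
```

A missing dot, such as `wwwfacebook.com` for `www.facebook.com`, falls under "Missing One Character" in this sketch.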
Comparison With Other Typo-correctors
• Google & Yahoo typo-correction web services
• 15% (12%) missed by Google (Yahoo)
• 99.6% (98%) of what is missed are real parked domains
• 23% (31%) forward to the same target domain
System Implementation
• Successfully integrated our methodology with the Mozilla Firefox browser
• Second set, 94% <= 167 ms
• Non-typo domains, 10 ms on average and a maximum of 25 ms
Classifier
• Data set of 2,800 samples
  • 700 are parked domains and 2,100 are general-purpose domains from the Yahoo Directory
• Identify distinguishing features
• Compute distributions for verification
• Use the WEKA library to try different classification algorithms; Random Forest was the best
Conclusion
• Defined and implemented an accurate identification methodology
• Performed measurements that show typo-squatters are moderately successful
• Integrated the methodology with a Firefox browser to detect typo-squatting domains on the fly
